Re: Returned post for annou...@apache.org

2021-02-11 Thread Erik Huelsmann
On Thu, Feb 11, 2021, 14:36 Private List Moderation <
mod-priv...@gsuite.cloud.apache.org> wrote:

> On Thu, 11 Feb 2021 at 12:15, Branko Čibej  wrote:
>
>> On 11.02.2021 12:23, Stefan Sperling wrote:
>>
>> On Thu, Feb 11, 2021 at 11:02:32AM +, Private List Moderation wrote:
>>
>> Irrelevant.
>>
>> Given that this discussion doesn't seem to be going anywhere and the
>> same arguments from May 2020 are just being rehashed, I guess we will
>> simply stop using the announce@ mailing list.
>>
>>
>> I agree. This nitpicking bureaucratic mission creep has gone way over the
>> top. We have our own announce@svn.a.o list anyway; I expect anyone who's
>> really interested is subscribed to that.
>>
>> I find it kind of ironically funny that the same moderator(s) who feel
>> they're empowered to enforce release policy don't feel that the normal
>> escalation path (i.e., bug report to dev@) is worth taking.
>>
>>
> There was a problem with the download page at the time it was checked.
>

What does a problem with the download page have to do with spam prevention?
Why does that problem make this spam?


Please try to see it from the moderator's point of view.
>


I can only look at it from what I perceive to be the responsibility of a
moderator. And I am looking at it from that perspective.

Erik


Re: Returned post for annou...@apache.org

2021-02-10 Thread Erik Huelsmann
How can a link be more important than an announcement for a fix of an
*unauthenticated* remote DoS?

Same for the KEYS file???

Don't you think that's way out of proportion?


Erik.

On Wed, Feb 10, 2021 at 4:50 PM Private List Moderation
 wrote:
>
> I don't see how the missing links can be regarded as trivial.
> This obviously needs to be fixed before the announce can be accepted.
>
> At the same time, I asked for the KEYS file link to be standardised.
> There is already a KEYS file at the standard location - why not link to that 
> instead?
>
>
> On Wed, 10 Feb 2021 at 15:35, Stefan Sperling  wrote:
>>
>> Sebb, blocking our release announcements over trivialities like this
>> really is not a nice thing to do. Last time it happened in May 2020.
>> It was already discussed back then and raised with the announce@
>> moderation team.
>>
>> The Subversion PMC came to the conclusion that our handling of
>> the KEYS files is adequate for our purposes:
>> https://svn.haxx.se/dev/archive-2020-05/0156.shtml
>>
>> Please raise the issue on our dev@subversion.a.o list if it bothers you.
>> The moderation mechanism is supposed to prevent spam. Using it to enforce
>> release workflow policies amounts to misuse of your moderation privileges.
>>
>> Regards,
>> Stefan
>>
>> On Wed, Feb 10, 2021 at 03:20:41PM -, announce-ow...@apache.org wrote:
>> >
>> > Hi! This is the ezmlm program. I'm managing the
>> > annou...@apache.org mailing list.
>> >
>> > I'm working for my owner, who can be reached
>> > at announce-ow...@apache.org.
>> >
>> > I'm sorry, your message (enclosed) was not accepted by the moderator.
>> > If the moderator has made any comments, they are shown below.
>> >
>> > >  >
>> > Sorry, but the announce cannot be accepted.
>> > The linked download page does not contain links for the version in the
>> > email.
>> >
>> > Also, the standard name for the KEYS file is KEYS - no prefix, no suffix.
>> > Please correct the download page, check it, and submit a corrected announce
>> > mail.
>> >
>> > Thanks,
>> > Sebb.
>> > <  <
>> >
>>
>> > Date: Wed, 10 Feb 2021 14:37:00 +0100
>> > From: Stefan Sperling 
>> > To: annou...@subversion.apache.org, us...@subversion.apache.org,
>> >  dev@subversion.apache.org, annou...@apache.org
>> > Cc: secur...@apache.org, oss-secur...@lists.openwall.com,
>> >  bugt...@securityfocus.com
>> > Subject: [SECURITY][ANNOUNCE] Apache Subversion 1.10.7 released
>> > Message-ID: 
>> > Reply-To: us...@subversion.apache.org
>> > Content-Type: text/plain; charset=utf-8
>> >
>> > I'm happy to announce the release of Apache Subversion 1.10.7.
>> > Please choose the mirror closest to you by visiting:
>> >
>> > https://subversion.apache.org/download.cgi#supported-releases
>> >
>> > This is a stable bugfix and security release of the Apache Subversion
>> > open source version control system.
>> >
>> > THIS RELEASE CONTAINS AN IMPORTANT SECURITY FIX:
>> >
>> >   CVE-2020-17525
>> >   "Remote unauthenticated denial-of-service in Subversion mod_authz_svn"
>> >
>> > The full security advisory for CVE-2020-17525 is available at:
>> >   https://subversion.apache.org/security/CVE-2020-17525-advisory.txt
>> >
>> > A brief summary of this advisory follows:
>> >
>> >   Subversion's mod_authz_svn module will crash if the server is using
>> >   in-repository authz rules with the AuthzSVNReposRelativeAccessFile
>> >   option and a client sends a request for a non-existing repository URL.
>> >
>> >   This can lead to disruption for users of the service.
>> >
>> >   We recommend all users to upgrade to the 1.10.7 or 1.14.1 release
>> >   of the Subversion mod_dav_svn server.
>> >
>> >   As a workaround, the use of in-repository authz rules files with
>> >   the AuthzSVNReposRelativeAccessFile can be avoided by switching
>> >   to an alternative configuration which fetches an authz rules file
>> >   from the server's filesystem, rather than from an SVN repository.
>> >
>> >   This issue was reported by Thomas Åkesson.
>> >
>> > SHA-512 checksums are available at:
>> >
>> > https://www.apache.org/dist/subversion/subversion-1.10.7.tar.bz2.sha512
>> > https://www.apache.org/dist/subversion/subversion-1.10.7.tar.gz.sha512
>> > https://www.apache.org/dist/subversion/subversion-1.10.7.zip.sha512
>> >
>> > PGP Signatures are available at:
>> >
>> > https://www.apache.org/dist/subversion/subversion-1.10.7.tar.bz2.asc
>> > https://www.apache.org/dist/subversion/subversion-1.10.7.tar.gz.asc
>> > https://www.apache.org/dist/subversion/subversion-1.10.7.zip.asc
>> >
>> > For this release, the following people have provided PGP signatures:
>> >
>> >Stefan Sperling [2048R/4F7DBAA99A59B973] with fingerprint:
>> > 8BC4 DAE0 C5A4 D65F 4044  0107 4F7D BAA9 9A59 B973
>> >Branko Čibej [4096R/1BCA6586A347943F] with fingerprint:
>> > BA3C 15B1 337C F0FB 222B  D41A 1BCA 6586 A347 943F
>> >Johan Corveleyn [4096R/B59CE6D6010C8AAD] wit

Re: svn commit: r1854072 - in /subversion/trunk/subversion: libsvn_subr/io.c tests/libsvn_subr/io-test.c

2019-02-22 Thread Erik Huelsmann
>
> By the way, I'm not sure why we carry around the "defined(__OS2__)"
> check in io.c. As far as I'm aware, no-one has ever actually tested
> Subversion on OS/2 ... these checks are probably just lifted out of APR,
> but don't do anything useful.
>

Maybe not tested, but there are supposedly OS/2 binaries floating around:
https://os2ports.smedley.id.au/index.php?page=subversion

Lacking an OS/2 installation, I have no idea if they actually work...


>
> -- Brane
>
>

Regards,


-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.


Re: Migrating Subversion issues to ...

2015-09-28 Thread Erik Huelsmann
> > Hi Mark,
> >
> > I'm going to start the migration process tomorrow morning. Could you
> > please lock the tigris.org project? I think it will be OK if our issue
> > tracker is read-only for a day or so.
> >
> >
> Issues are finally migrated to ASF JIRA:
> https://issues.apache.org/jira/browse/SVN
>
>
Great! Thanks so much!

Regards,



-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.


Re: AW: Convenient array & hash iterators & accessors

2015-03-13 Thread Erik Huelsmann
> > Heh :-) I meant the branch-specific code -- not *all* of the client and
> > library! I have no idea what that means, because I didn't study the code
> > closely (yet). I'll need some directions on where to look for the
> > branch-specific code so I can try to figure out where to hook Lua in.
>
> Oh, so you want to try it?


Well, my idea would be that if we're able to address the code
additions/changes in that branch with an integration design, then it
fulfills the requirement you were talking about at the beginning of this
thread. If it doesn't work, then it might not be the solution we're looking
for.


> OK, the new code is pretty well
> segregated from the existing code. Almost all of the relevant code is in
> these new files:
>
> subversion/svnmover/svnmover.c
> subversion/libsvn_delta/{element,branch,editor3e,compat3e}.c
>


Thanks! I'll have a look.

-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.


Re: AW: Convenient array & hash iterators & accessors

2015-03-13 Thread Erik Huelsmann
> > Am I right that if we were to run this experiment with the
> > move-tracking-2 branch code, that the entire client and library would
> > be subject to conversion to the higher level language?
>
> No! That would be literally years of rewriting and debugging and
> re-testing, not to mention interesting interfacing with the rest of the
> (pool-bound) code.
>

Heh :-) I meant the branch-specific code -- not *all* of the client and
library! I have no idea what that means, because I didn't study the code
closely (yet). I'll need some directions on where to look for the
branch-specific code so I can try to figure out where to hook Lua in.

-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.


Re: AW: Convenient array & hash iterators & accessors

2015-03-13 Thread Erik Huelsmann
> > @Julian, do you have a specific area of our code that would most benefit
> > from "moving 'up' from C"? Preferably some part of code that's currently
> > very much in flux?
>
> 'svnmover' on the 'move-tracking-2' branch. It includes both 'client'
> and 'library' code, and I'm moving code freely between the two as I
> figure out what is the best layering. So it's important that a
> language would be good in both roles.
>

Well, Lua supports calling both ways. A call isn't a straight C call,
though (in Lua, it's a straight Lua function invocation), but a call that
follows a certain calling protocol. Going from Lua to pure C, or from pure
C to Lua, requires a bit of glue code, much like sqlite does for its
parameter bindings.
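The sqlite comparison can be made concrete. Below is a minimal Python sketch
(using Python's stdlib sqlite3 module, chosen only for brevity) of what "glue
code for parameter bindings" means in practice: the host never calls into the
other side directly, but marshals values through a binding protocol, much as
the Lua C API marshals arguments through its stack before lua_call().

```python
import sqlite3

# The host language never calls "into SQL" as a plain function call; values
# cross the library boundary through a binding protocol, analogous to
# pushing arguments onto the Lua stack before lua_call().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revs (num INTEGER, author TEXT)")

# The '?' placeholders are the glue: the library marshals each host value
# into SQLite's internal representation rather than splicing text.
conn.executemany("INSERT INTO revs VALUES (?, ?)",
                 [(1, "erik"), (2, "stefan")])

authors = [a for (a,) in conn.execute("SELECT author FROM revs ORDER BY num")]
print(authors)  # ['erik', 'stefan']
```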

Am I right that if we were to run this experiment with the move-tracking-2
branch code, that the entire client and library would be subject to
conversion to the higher level language?

-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.


Re: AW: Convenient array & hash iterators & accessors

2015-03-13 Thread Erik Huelsmann
> Hence my suggestion for Lua, which doesn't have a GIL, as far as I can
>> find. Nor does it need manual reference-keeping like is needed with Python
>> or Perl. With Lua you can have as many evaluation environments as you want.
>> They can be instantiated when crossing a certain API boundary, to be used
>> by the library internals.
>>
>
> I don't have direct experience with Lua, but have read/observed it for
> many years. This is something that I could get behind as an embedded
> *experimental* solution (to move "up" from lower-level C code), based on
> what I've read.
>

That would be the first step for any implementation, I take it -- we'd want
to evaluate the benefits to be had. If we agree to start experimenting with
Lua, the next step would be to create a high-level design. Something I
might be able to spend time on.

@Julian, do you have a specific area of our code that would most benefit
from "moving 'up' from C"? Preferably some part of code that's currently
very much in flux?

-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.


Re: AW: Convenient array & hash iterators & accessors

2015-03-13 Thread Erik Huelsmann
> > These days, I suppose we'd be looking at something like Go, which can
> > be linked with C/C++ and also natively export functions that can be
> > called from C/C++.
>
> As far as I can see, Go always comes with Garbage Collection instead of a
> deterministic memory management.
>
> Also, as far as I can see, Go does not go as far as Rust in what the
> compiler can check at compile time.
>

On the other hand, I see on rust-lang that the current state of Rust is
1.0.0-alpha2, whereas Lua has 22 years of experience and development.

-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.


Re: AW: Convenient array & hash iterators & accessors

2015-03-13 Thread Erik Huelsmann
> In the past I'd thought about embedding Python into our sources, but
> Python still (after 20 years ...) depends on a global interpreter lock
> which pretty much kills any chance of lockless thread-safe code.
>

Hence my suggestion for Lua, which doesn't have a GIL, as far as I can
find. Nor does it need manual reference-keeping like is needed with Python
or Perl. With Lua you can have as many evaluation environments as you want.
They can be instantiated when crossing a certain API boundary, to be used
by the library internals.

These days, I suppose we'd be looking at something like Go, which can be
> linked with C/C++ and also natively export functions that can be called
> from C/C++.
>

Do you mean that the code one writes is exported as a C function? Or that
there's a C interface? (The latter isn't better than e.g. Lua, Python and
Perl, so I assume you mean the former.)

Would it be an idea, if we really want this, to come up with a list of
requirements and nice-to-haves against which each of the languages brought
up can be measured? If we don't do that, we'll probably go on another 10
years with C only (and another 10... and another 10...)


-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.


Re: Convenient array & hash iterators & accessors

2015-03-06 Thread Erik Huelsmann
> > It would make sense to design type-safe, light-weight container and
> > iterator template wrappers around the APR structures if we decided to
> > write code in C++. Since we're not, "explicit is better than
> > implicit".
>
> I understand the point. I note that "explicit" is not a binary quality:
> there are degrees of it.
>
> I suppose I want to be writing in a higher level language. Maybe I should
> just go ahead and really do so.


Exactly. There's been talk about doing so for much too long without action
(other than attempts, including my own, to find a way to "upgrade" C to
something less verbose and more expressive).

I've long been thinking that there are specific areas which are
more-or-less stand-alone and might be a good place to start this strategy.
One place that might qualify is the piece of code that deduces the
eligible revisions in merge tracking. That's the code I'm thinking you're
now working in?

What kind of language were you thinking about? One of the languages that
came to mind is 'lua', which seems to have a pretty strong focus on being
integrated with C code. For Lua there are also tools to embed the
compiled bytecode in a C library, so the entire higher-level language can
be fully encapsulated inside our libraries.



-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.


Re: Configuring Subversion with Berkeley DB Error: configure: error: Berkeley DB 4.0.14 or 5.x wasn't found

2015-02-12 Thread Erik Huelsmann
Hi kay,

On Thu, Feb 12, 2015 at 5:48 PM, kay  wrote:

> Just to clarify: the "support security" was a typo. I meant they thought BDB
> would have better features for user authentication, privacy, permission and
> security issues.


Well, then I think your customer, or you, didn't get Branko's point:
"There's *nothing* BDB does that FSFS can't do (better)." There really is
nothing that's better supported with BDB. That includes authentication,
privacy, permissions and security.


> I brought up the issue of deprecation and lack of future
> support for BDB.
>



-- 
Bye,

Erik.

http://efficito.com -- Hosted accounting and ERP.
Robust and Flexible. No vendor lock-in.


Re: [svnbench] Revision: 1507876 compiled Jul 29 2013, 00:21:55 on x86_64-unknown-linux-gnu

2013-07-30 Thread Erik Huelsmann
Hi Neels,

Would it be an idea to switch the baseline of the tests to 1.8.1? I
regularly look at them, but got confused by the reported performance gain.

Just to let you know :-)

Erik.

sent from my phone
On Jul 29, 2013 2:38 AM,  wrote:

> 1.7.0@1181106 vs. trunk@1507860
> Started at Mon Jul 29 00:26:13 UTC 2013
>
> *DISCLAIMER* - This tests only file://-URL access on a GNU/Linux VM.
> This is intended to measure changes in performance of the local working
> copy layer, *only*. These results are *not* generally true for everyone.
>
> Charts of this data are available at http://svn-qavm.apache.org/charts/
>
> Averaged-total results across all runs:
> ---
>
> Compare trunk@1507860 to 1.7.0
>        N       avg       operation
>     51/9      0.54|-34.946   TOTAL RUN
>   3K/530      1.23| +0.005   add
>   102/18      0.76| -0.205   checkout
>   408/72      0.63| -0.741   commit
>     51/9      0.86| -0.003   copy
>     51/9      0.76| -0.070   delete
>   255/45      0.12| -3.828   info
>   102/18      0.52| -1.016   merge
>   2K/516      0.84| -0.002   mkdir
>   136/21      0.92| -0.001   propdel
>   38K/6K      0.73| -0.003   proplist
>   38K/6K      0.75| -0.003   propset
>   3K/591      0.77| -0.003   ps
>   102/18      1.92| +0.009   resolve
>   102/18      0.81| -0.038   resolved
>  714/126      0.71| -0.052   status
>     51/9      0.70| -0.326   switch
>  714/126      0.77| -0.157   update
> (legend: "1.23|+0.45" means: slower by factor 1.23 and by 0.45 seconds;
>  factor < 1 and seconds < 0 means 'trunk@1507860' is faster.
>  "2/3" means: '1.7.0' has 2 timings on record, the other has 3.)
>
>
> Above totals split into separate x runs:
> 
>
> Compare trunk@1507860,5x5 to 1.7.0,5x5
>        N       avg       operation
>     17/3      0.54|-95.838   TOTAL RUN
>   2K/456      1.25| +0.005   add
>     34/6      0.78| -0.499   checkout
>   136/24      0.64| -1.900   commit
>     17/3      0.80| -0.004   copy
>     17/3      0.78| -0.162   delete
>    85/15      0.11|-11.319   info
>     34/6      0.54| -2.567   merge
>   2K/470      0.83| -0.002   mkdir
>   136/20      0.91| -0.001   propdel
>   35K/6K      0.74| -0.002   proplist
>   36K/6K      0.76| -0.003   propset
>   2K/552      0.77| -0.002   ps
>     34/6      3.77| +0.024   resolve
>     34/6      0.80| -0.102   resolved
>   238/42      0.72| -0.125   status
>     17/3      0.73| -0.755   switch
>   238/42      0.82| -0.300   update
> (legend: "1.23|+0.45" means: slower by factor 1.23 and by 0.45 seconds;
>  factor < 1 and seconds < 0 means 'trunk@1507860,5x5' is faster.
>  "2/3" means: '1.7.0,5x5' has 2 timings on record, the other has 3.)
>
> Compare trunk@1507860,100x1 to 1.7.0,100x1
>        N       avg       operation
>     17/3      0.55| -7.301   TOTAL RUN
>   476/71      0.98| -0.000   add
>     34/6      0.57| -0.083   checkout
>   136/24      0.50| -0.254   commit
>     17/3      0.90| -0.002   copy
>     17/3      0.67| -0.039   delete
>    85/15      0.47| -0.144   info
>     34/6      0.35| -0.378   merge
>   238/46      0.89| -0.002   mkdir
>   1K/337      0.62| -0.004   proplist
>   1K/273      0.65| -0.005   propset
>   119/33      0.66| -0.005   ps
>     34/6      1.32| +0.003   resolve
>     34/6      0.91| -0.006   resolved
>   238/42      0.68| -0.024   status
>     17/3      0.49| -0.185   switch
>   238/42      0.50| -0.151   update
> (legend: "1.23|+0.45" means: slower by factor 1.23 and by 0.45 seconds;
>  factor < 1 and seconds < 0 means 'trunk@1507860,100x1' is faster.
>  "2/3" means: '1.7.0,100x1' has 2 timings on record, the other has 3.)
>
> Compare trunk@1507860,1x100 to 1.7.0,1x100
>        N       avg       operation
>     17/3      0.61| -1.698   TOTAL RUN
>     17/3      1.80| +0.042   add
>     34/6      0.63| -0.033   checkout
>   136/24      0.56| -0.070   commit
>     17/3      0.89| -0.002   copy
>     17/3      0.66| -0.010   delete
>    85/15      0.71| -0.020   info
>     34/6      0.41| -0.102   merge
>  629/111      0.59| -0.004   proplist
>  714/126      0.61| -0.005   propset
>     34/6      0.63| -0.004   ps
>     34/6      0.93| -0.001   resolve
>     34/6      0.67| -0.007   resolved
>   238/42      0.62| -0.008   status
>     17/3      0.56| -0.037   switch
>   238/42      0.65| -0.019   update
> (legend: "1.23|+0.45" means: slower by factor 1.23 and by 0.45 seconds;
>  factor < 1 and seconds < 0 means 'trunk@1507860,1x100' is faster.
>  "2/3" means: '1.7.0,1x100' has 2 timings on record, the other has 3.)
>
>
>
> More detail:
> 
>
> Timings for 1.7.0,5x5
>    N     min     max     avg  operation  (unit is seconds)
>   17   192.14  255.84  207.06  TOTAL RUN
>   2K     0.01    2.21    0.02  add
>   34     0.02    5.15    2.29  checkout
>  136     1.07   17.29    5.34  commit
>   17     0.01    0.13    0.02  copy
>   17     0.61    0.96    0.73  delete
>   85     6.32   31.69   12.70  info
>   34     5.31    8.49    5.60  merge
>   2K     0.01    0.04
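For readers puzzling over the "factor|delta" notation in the report above,
here is how I read the legend (an interpretation of the legend lines
themselves, not taken from the svnbench source): factor = trunk average /
baseline average, and delta = trunk average minus baseline average, so
factor < 1 and delta < 0 both mean trunk is faster.

```python
def compare(baseline_avg, trunk_avg):
    # Assumed semantics of the svnbench legend (my reading, not taken from
    # the benchmark source): factor = trunk / baseline and
    # delta = trunk - baseline, so factor < 1 and delta < 0 mean trunk
    # is faster than the baseline.
    factor = trunk_avg / baseline_avg
    delta = trunk_avg - baseline_avg
    return "%.2f|%+.3f" % (factor, delta)

print(compare(1.0, 0.5))   # 0.50|-0.500  -> trunk twice as fast
print(compare(2.0, 2.9))   # 1.45|+0.900  -> trunk slower
```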

Re: svnadmin upgrade output message i18n issue

2013-05-23 Thread Erik Huelsmann
One application can have multiple active code page settings on Windows. Of
course, if your example were the only option, we would not be having this
discussion.

Bye,

Erik.

sent from my phone
On May 23, 2013 6:44 PM, "Dongsheng Song"  wrote:

> On Thu, May 23, 2013 at 11:38 PM, Erik Huelsmann  wrote:
> > That was not my point nor the point we discussed back then. As long as
> > gettext tries to convert its translations to *any* encoding, it's flawed
> by
> > design, because some systems have multiple active output encodings (e.g.
> > Windows).
> >
>
> This does not matter. If I open 2 console windows, one CP437 and the
> other CP936, then svn in the CP437 window generates English (ASCII)
> output, and the CP936 window generates Chinese (GBK/GB18030) output.
>


Re: svnadmin upgrade output message i18n issue

2013-05-23 Thread Erik Huelsmann
Found at least one of the related discussions:

http://svn.haxx.se/dev/archive-2004-05/0078.shtml

bye,

Erik.
On May 23, 2013 5:38 PM, "Erik Huelsmann"  wrote:

>
> > >
> > >> I think the best solution is: DO NOT convert the GETTEXT(3) returned
> > >> messages; write them ***AS IS***, since GETTEXT(3) already does the
> > >> correct conversion for us.
> > >
> > > Well, even though gettext may want us to believe otherwise, this doesn't
> > > work for cross platform applications: e.g. in windows the locale for output
> > > on the console may be different from the locale for other uses. Back when we
> > > went with gettext (2004?), we've hashed this through pretty thoroughly. I
> > > hope that discussion is still available in the archives.
> > >
> >
> > As I said in the first email of this thread, gettext 0.18.2 and 0.14.1
> > give me different behavior; it seems that gettext 0.14.1 does not do
> > the correct thing. But do we still need to support this OLD and BUGGY
> > version?
>
> That was not my point nor the point we discussed back then. As long as
> gettext tries to convert its translations to *any* encoding, it's flawed by
> design, because some systems have multiple active output encodings (e.g.
> Windows).
>
> Unless this design has changed between 0.14 and 0.18, gettext() is still
> as broken as it was. Translating or not translating doesn't matter: it'll
> just be broken on other systems. Too bad the rest of it is actually pretty
> good.
>
> Bye,
>
> Erik.
>


Re: svnadmin upgrade output message i18n issue

2013-05-23 Thread Erik Huelsmann
> >
> >> I think the best solution is: DO NOT convert the GETTEXT(3) returned
> >> messages; write them ***AS IS***, since GETTEXT(3) already does the
> >> correct conversion for us.
> >
> > Well, even though gettext may want us to believe otherwise, this doesn't
> > work for cross platform applications: e.g. in windows the locale for output
> > on the console may be different from the locale for other uses. Back when we
> > went with gettext (2004?), we've hashed this through pretty thoroughly. I
> > hope that discussion is still available in the archives.
> >
>
> As I said in the first email of this thread, gettext 0.18.2 and 0.14.1
> give me different behavior; it seems that gettext 0.14.1 does not do
> the correct thing. But do we still need to support this OLD and BUGGY
> version?

That was not my point nor the point we discussed back then. As long as
gettext tries to convert its translations to *any* encoding, it's flawed by
design, because some systems have multiple active output encodings (e.g.
Windows).

Unless this design has changed between 0.14 and 0.18, gettext() is still as
broken as it was. Translating or not translating doesn't matter: it'll just
be broken on other systems. Too bad the rest of it is actually pretty good.
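The "multiple active output encodings" problem is easy to demonstrate. Below
is a minimal Python sketch (Python only for illustration; Subversion's actual
code paths are C, and the CP437/CP936 pair is the example from this thread).
Whichever single codeset gettext converts its translations to, one of the two
consoles cannot render the result.

```python
# A translated message as gettext hands it over when
# bind_textdomain_codeset(PACKAGE_NAME, "UTF-8") is in effect.  The CJK
# characters stand in for a real translation; they are illustrative only.
msg = "revision \u5b8c\u4e86"

def renders_in(codepage):
    # Can this message be converted to the given console code page?
    try:
        msg.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

# Two consoles on one Windows box (the example from this thread):
print(renders_in("cp936"))  # True:  GBK can represent the CJK characters
print(renders_in("cp437"))  # False: CP437 cannot -- so no single "current
                            # locale codeset" conversion serves both windows
```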

Bye,

Erik.


Re: svnadmin upgrade output message i18n issue

2013-05-23 Thread Erik Huelsmann
sent from my phone
On May 23, 2013 4:43 PM, "Dongsheng Song"  wrote:
>
> On Thu, May 23, 2013 at 10:06 PM, Philip Martin
>  wrote:
> > Dongsheng Song  writes:
> >
> >> On Thu, May 23, 2013 at 9:28 PM, Philip Martin
> >>  wrote:
> >>> Dongsheng Song  writes:
> >>>
>  On Thu, May 23, 2013 at 9:11 PM, Philip Martin
>   wrote:
> > Philip Martin  writes:
> >
> >> So it appears the UTF8 to native conversion is missing from
> >> repos_notify_handler.  I think repos_notify_handler should be using
> >> svn_stream_printf_from_utf8 rather than svn_stream_printf.
> >
> > I've fixed trunk to use svn_cmdline_cstring_from_utf8 and proposed it
> > for 1.8.
> >
> 
>  As GETTEXT(3) man pages said, If and only if
>  defined(HAVE_BIND_TEXTDOMAIN_CODESET),
>  your commit is OK.
> 
>  So you should check HAVE_BIND_TEXTDOMAIN_CODESET when you use
>  svn_cmdline_cstring_from_utf8.
> >>>
> >>> Are you saying there is a problem with my change?  If there is a problem,
> >>> doesn't it already apply to all other uses of
> >>> svn_cmdline_cstring_from_utf8?
> >>>
> >>
> >> I think so. In the subversion/libsvn_subr/nls.c file:
> >>
> >> #ifdef HAVE_BIND_TEXTDOMAIN_CODESET
> >>   bind_textdomain_codeset(PACKAGE_NAME, "UTF-8");
> >> #endif /* HAVE_BIND_TEXTDOMAIN_CODESET */
> >>
> >> bind_textdomain_codeset is only called when HAVE_BIND_TEXTDOMAIN_CODESET
> >> is defined. In that case, you can assume the GETTEXT(3) returned string is
> >> UTF-8 encoded.
> >
> > I still don't understand if you are claiming my change has a problem or
> > if there is a problem in all uses of svn_cmdline_cstring_from_utf8.
> >
> > I recall a related thread from last year:
> >
> > http://svn.haxx.se/dev/archive-2012-08/index.shtml#34
> > http://mail-archives.apache.org/mod_mbox/subversion-dev/201208.mbox/%3Cop.wilcelggnngjn5@tortoise%3E
> >
> > I think we assume that the translations are UTF-8.
> >
> > Is there some code change you think we should make?
> >
>
> Even if ALL the translations are UTF-8, GETTEXT(3) still returns the
> string encoded in the ***current locale's codeset***.
>
> Here is a snippet from the GETTEXT(3) man pages:
>
> In both cases, the functions also use the LC_CTYPE locale facet  in
> order  to  convert  the translated message from the translator's
> codeset to the ***current locale's codeset***, unless overridden by a
> prior call to the bind_textdomain_codeset function.
>
> So svn_cmdline_printf SHOULD NOT assume the input string is UTF-8
> encoded; it is encoded in the ***current locale's codeset***.

But we call the codeset function to make sure we do not generate output in
the current locale encoding.

> I think the best solution is: DO NOT convert the GETTEXT(3) returned
> messages; write them ***AS IS***, since GETTEXT(3) already does the
> correct conversion for us.

Well, even though gettext may want us to believe otherwise, this doesn't
work for cross platform applications: e.g. in windows the locale for output
on the console may be different from the locale for other uses. Back when
we went with gettext (2004?), we've hashed this through pretty thoroughly.
I hope that discussion is still available in the archives.

Bye,

Erik.


Re: Compressed Pristines (Design Doc)

2012-03-22 Thread Erik Huelsmann
Hi Ash,

Thanks for picking up the initiative to implement this feature.

On Thu, Mar 22, 2012 at 7:01 PM, Ivan Zhakov  wrote:

> On Thu, Mar 22, 2012 at 18:30, Daniel Shahaf  wrote:
> > OK, I've had a cruise through now.
> >
> > First of all I have to say it's an order of magnitude larger than what
> > I'd imagined it would be.  That makes the "move it elsewhere" idea I'd
> > had less practical than I'd predicted.  I'm also not intending to take
> > you up on your offer to proxy me to the doc, though thanks for making it.
> >
> > Design-wise I'm a bit surprised that the choice ended up being rolling
> > a custom file format.
> >
> > Thanks for your work.
> >
> +1. I believe we should implement compressed pristines in a simple way:
> just compress the pristine files themselves, without inventing some new
> format.


Like the others, I'm surprised we seem to be going with a custom file format.
You claim source files are generally small in size, and hence that only small
benefits can be had from compressing them, if at all, because they would
already be of sub-block size.
To substantiate that claim, I took the pristines directory from my
Subversion working copy and did some experimenting. See the results below:

 $ ls -ls uncompressed-pristines/*/*.svn-base | awk '{ tot += $1; } END {
print "total size: " tot; }'
total size: 188724

 $ cp -Rp uncompressed-pristines/ compressed-pristines
 $ gzip compressed-pristines/*/*.svn-base
 $ ls -ls compressed-pristines/*/*.svn-base.gz | awk '{ tot += $1; } END {
print "total size: " tot; }'
total size: 52320

 $ cat compressed-pristines/*/*.svn-base.gz > combined-compressed-file
 $ ls -ls combined-compressed-file
41812 


So, if I look at the Subversion pristines in my working copy, the reduction
in allocated blocks goes from 100% to 27%. To be honest, I doubt the
complexity we'll be importing just to reduce the allocated number of blocks
from 27% to 22% is really worth it: the savings are already tremendous.
Won't the creation of a custom storage format just serve to destabilize our
working copy?
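The sub-block-size question is easy to probe with synthetic data. Below is a
rough Python sketch of the same measurement, under the assumptions noted in
the comments (a 4 KiB block size and generated files standing in for real
pristines); the specific numbers will differ from the working-copy figures
above, but the shape of the result is the same.

```python
import gzip
import math

BLOCK = 4096  # assume a 4 KiB filesystem block, as on typical ext4 setups

# Synthetic stand-ins for pristine files (varied text, a few KiB each);
# these are NOT the real working-copy files from the measurement above.
files = []
for i in range(100):
    body = "".join("line %d of file %d\n" % (j, i) for j in range(300))
    files.append(body.encode())

def blocks(nbytes):
    # Number of filesystem blocks a file of this size occupies.
    return max(1, math.ceil(nbytes / BLOCK))

compressed = [gzip.compress(f) for f in files]
raw_blocks = sum(blocks(len(f)) for f in files)
per_file_blocks = sum(blocks(len(c)) for c in compressed)
packed_blocks = blocks(sum(len(c) for c in compressed))  # one combined file

# Per-file gzip captures the bulk of the saving; packing the compressed
# files into one container only recovers the per-file block rounding.
print(raw_blocks, per_file_blocks, packed_blocks)
```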


Do you have data which triggered you to design this custom format?


Bye,


Erik.


Fwd: Compressed Pristines

2012-03-12 Thread Erik Huelsmann
Forwarding my response back to the list...

-- Forwarded message --
From: Erik Huelsmann 
Date: Mon, Mar 12, 2012 at 4:14 PM
Subject: Re: Compressed Pristines
To: Johan Corveleyn 


Hi Johan,

> Has nothing to do with the property. The pristine matches the repository,
> > byte for byte. The file installed in the working copy is affected by the
> > property; not the pristine.
>
> Yes, the pristine matches the repository. But what I mean is:
>
> (on Windows):
> $ create file-with-crlf.txt
> $ svn add file-with-crlf.txt
> $ svn ps svn:eol-style native file-with-crlf.txt
> $ svn commit -mm file-with-crlf.txt
>
> -> pristine file is LF-terminated (as is the file in the repos, as you
> point out).
>

This is correct: line endings get normalized to LF when svn:eol-style
'native' is applied.


> $ create file-with-crlf.txt
> $ svn add file-with-crlf.txt
> $ svn commit -mm file-with-crlf.txt
>
> -> pristine file CRLF-terminated.
>

This is correct: the file doesn't have any transformation applied: we
preserve the input file.


> $ create file-with-crlf.txt
> $ svn add file-with-crlf.txt
> $ svn ps svn:eol-style CRLF file-with-crlf.txt
> $ svn commit -mm file-with-crlf.txt
>
> -> pristine file CRLF-terminated.
>

This is correct: it's the normal form for files with CRLF applied (before
you ask: files with CR line ending normalization get transformed to CR
only).


> $ create file-with-crlf.txt
> $ svn add file-with-crlf.txt
> $ svn ps svn:eol-style LF file-with-crlf.txt
> $ svn commit -mm file-with-crlf.txt
>
> -> pristine file is LF-terminated (as is the working-copy file).


Exactly. So what you found is that for any eol style other than native, we
use exactly that style. For native, we use LF.


HTH,

Erik.


Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Erik Huelsmann
On Thu, Feb 2, 2012 at 10:59 PM, Hiroaki Nakamura  wrote:
> 2012/2/3 Peter Samuelson :
>>
>>> On 02.02.2012 20:22, Peter Samuelson wrote:
>>> > By proposing a client-only solution, I hope to avoid _all_ those
>>> > questions.
>>
>> [Branko Cibej]
>>> Can't see how that works, unless you either make the client-side
>>> solution optional, create a mapping table, or make name lookup on the
>>> server agnostic to character representation.
>>
>> Yes, I did propose a mapping table in wc.db.
>>
>> Old clients on OS X would continue to be confused; the solution is to
>> upgrade.
>
> Until upgrading all clients, there are possibilities that NFD filenames
> are checked in to repositories. So I proposed servers change filenames
> to NFC before checking in to repositories.

How about checking for the existence of the path to be added using NFC
encoding? If no match is found when both the repository paths and the new
path(s) are converted to NFC, go ahead and add it using the encoding that
was handed to you off the network.
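Such an NFC-aware existence check could be sketched like this (Python's unicodedata stands in for whatever recoding routine the server would use; all names are illustrative):

```python
import unicodedata

def nfc(path):
    return unicodedata.normalize("NFC", path)

def add_path(repo_paths, new_path):
    """Reject NEW_PATH if an existing path matches after NFC
    normalization; otherwise add it in exactly the encoding that
    came off the network."""
    if nfc(new_path) in {nfc(p) for p in repo_paths}:
        raise ValueError("path exists under a different normalization")
    repo_paths.append(new_path)
```

The stored paths keep their original (possibly NFD) bytes; only the collision check recodes.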

Bye,


Erik


Re: Apache, Subversion hooks, and locales

2012-01-31 Thread Erik Huelsmann
> Given that httpd is avoiding setlocale() we're pretty much left without
> locale support in mod_dav_svn.

Beware that you don't depend on setlocale() not having been called
though: at least one of the popular mod_ modules
*does* use setlocale(). (I think it was php5.)

Other than that: completely agreed.


Bye,

Erik.


Re: Does fsfs revprop packing no longer allow usage of traditional backup software?

2011-06-30 Thread Erik Huelsmann
Hi Hyrum,

On Thu, Jun 30, 2011 at 11:33 PM, Hyrum K Wright wrote:

> On Thu, Jun 30, 2011 at 3:27 PM, Peter Samuelson  wrote:
> >
> > [Ivan Zhakov]
> >> It should be easy to implement editing revprops without using SQLite:
> >> in case someone modify revprop non-packed revprop file is created, in
> >> read operation non-packed revprop file should be considered as more
> >> up-to-date. In next svnadmin pack operation these non-packed files
> >> should be merged back to packed one.
> >
> > +1.  This would basically mean there's only _one_ code path for writing
> > revprops, yes?  'svnadmin pack' gets a little more complex, but the
> > rest of libsvn_fs_fs gets simpler.
> >
> > Anyone have time to actually do this?  Converting the packed format
> > from sqlite to the same format used for packed revs would be a bonus.
>
> I like this idea, but it would seem to introduce an additional stat()
> call* for every attempt to fetch a revprop, because you'd first have
> to check the "old" location, and then the packed one.  As far as I can
> see, you'd have to do this in every case; in other words, there isn't
> a single-stat() short cut for the common case of non-edited revprops.
>
> -Hyrum
>
> * - I don't know why we seem to have this obsession with stat() calls
> around here, but it appears to have rubbed off on me.
>

Well, we've been able to increase working copy performance throughout the
lifetime of libsvn_wc-1 by working out ways to reduce the number of
apr_stat() calls. I'm not aware of a huge reason to do that on the
server side, though.
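The loose-file-overrides-pack read path under discussion could be sketched like this (hypothetical layout and helper names, not fsfs's actual format):

```python
import os

def read_revprops(repo_dir, rev):
    """A loose 'revprops/<rev>' file, written on edit, overrides the
    packed data; the os.path.exists() below is the extra per-lookup
    stat() Hyrum points out."""
    loose = os.path.join(repo_dir, "revprops", str(rev))
    if os.path.exists(loose):  # the extra stat() call
        with open(loose) as f:
            return f.read()
    # Fall back to the pack; a line-per-rev text file stands in for a
    # real packed format with an offset index.
    with open(os.path.join(repo_dir, "revprops.pack")) as f:
        for line in f:
            r, _, data = line.rstrip("\n").partition(":")
            if int(r) == rev:
                return data
    raise KeyError(rev)
```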

Bye,


Erik.


Re: svn bisect

2011-06-21 Thread Erik Huelsmann
Hi Arwin,

On Tue, Jun 21, 2011 at 9:45 AM, Arwin Arni  wrote:

> 3. Will this feature be considered at all (if it is any good) or am I simply
> doing something to exercise my brain cells?

Actually, I think it'd be a good idea to have a standardized command
so that all clients work alike.

What I think this command may also need is a list of revs to exclude
from bisection, for example because they're known to fail to compile.
This has been a wish of mine. The scripts currently available either
don't offer such a feature or don't use a ubiquitously understood value.


So, I'm all in favor of adding bisect one way or another.
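A skip list would fit into the usual bisection loop roughly like this (a sketch of the idea, not a proposed interface):

```python
def bisect(good, bad, is_broken, skip=()):
    """Return the first broken revision in (good, bad], stepping over
    revisions known to be untestable (SKIP, e.g. they fail to compile)
    by probing their neighbours instead."""
    skip = set(skip)
    while bad - good > 1:
        mid = (good + bad) // 2
        probe = mid
        # Slide downwards off revisions we were told to exclude.
        while probe in skip and probe > good + 1:
            probe -= 1
        if probe in skip:
            # The whole lower half is unusable; try upwards instead.
            probe = mid + 1
            while probe in skip and probe < bad:
                probe += 1
            if probe == bad:
                break  # cannot narrow the range any further
        if is_broken(probe):
            bad = probe
        else:
            good = probe
    return bad
```

With `skip={5}` and a regression at r7, the loop probes r4 and r6 instead of r5 and still converges on r7.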



Bye,


Erik.


[PATCH] Align mod_dav_svn behaviour with ViewVC: resolve symlinks in SVNParentPath

2011-06-05 Thread Erik Huelsmann
One of my after-hours activities is to help maintain a community
hosting site for Common Lisp development.

During our latest system migration, I noticed that mod_dav_svn acts
weird in view of symlinks:

If you check http://svn.common-lisp.net/, the repository listing page
is empty. However, if you go to http://svn.common-lisp.net/armedbear,
you'll find that a repository is being listed. The repository is a
symlink in the parent path. This works great for ViewVC and also for
hosting the actual repositories, but it doesn't work out for listing
the available repositories.

The patch below fixes that, but I don't know if there are explicit
considerations to have the current behaviour, so I'm holding off my
commit for now.


Bye,


Erik.


[[[
Check node kind of resolved special nodes to be a directory, in order
to include symlinks pointing to directories.

* subversion/mod_dav_svn/repos.c
   (deliver): Extend 'is directory' check for inclusion
   in parent path listing to include symlinks-to-directory.
]]]




Index: subversion/mod_dav_svn/repos.c
===================================================================
--- subversion/mod_dav_svn/repos.c  (revision 1132467)
+++ subversion/mod_dav_svn/repos.c  (working copy)
@@ -3247,7 +3247,23 @@
       apr_hash_this(hi, &key, NULL, &val);
       dirent = val;
 
-      if (dirent->kind != svn_node_dir)
+      if (dirent->kind == svn_node_file && dirent->special)
+        {
+          svn_node_kind_t resolved_kind;
+          const char *name = key;
+
+          serr = svn_io_check_resolved_path(name, &resolved_kind,
+                                            resource->pool);
+          if (serr != NULL)
+            return dav_svn__convert_err(serr,
+                                        HTTP_INTERNAL_SERVER_ERROR,
+                                        "couldn't fetch dirents "
+                                        "of SVNParentPath",
+                                        resource->pool);
+          if (resolved_kind != svn_node_dir)
+            continue;
+        }
+      else if (dirent->kind != svn_node_dir)
         continue;
 
       ent->name = key;


Re: Effect of indices on SQLite (optimizer) performance

2011-02-05 Thread Erik Huelsmann
On Sat, Feb 5, 2011 at 8:25 PM, Mark Phippard  wrote:
> On Sat, Feb 5, 2011 at 1:05 PM, Erik Huelsmann  wrote:
>
>> Scenario (2) takes ~0.27 seconds to evaluate in the unmodified
>> database. Adding an index on (wc_id, local_relpath) makes the
>> execution time drop to ~0.000156 seconds!
>>
>>
>> Seems Philip was right :-) We need to carefully review the indices we
>> have in our database to support good performance.
> I wish this document were fully fleshed out, it seems like it has some
> good info in it:
>
> http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html
>
> Getting indexes in place for the bulk of our reads is essential.  It
> seems like now would be a good time to make that a priority.  Of
> course adding more indexes will further slow down write speed (which
> seems bad already) so maybe the above document will give ideas for
> other optimizations.
>
> Did anyone see the tests I posted on users@ of a checkout with 5000
> files in single folder?  I really thought we would be faster than 1.6
> already but we are actually several factors slower.
>
> My background is all with DB2 on OS/400.  Something I was looking for
> in SQLite docs is whether it uses hints for the number of rows in a
> table.  For example, DB2 optimizes a new table for 10,000 rows with
> increments of 1,000 when you reach the limit.  If you know you are
> inserting 100,000 rows you can get a massive performance improvement
> by telling DB2 to optimize for a larger size.  I was wondering if
> SQLite was doing something like optimizing for 100 rows or something
> small.  I noticed the end of the checkout is really slow which implies
> it does not insert the rows fast.  Maybe this is just an area where we
> need to use transactions better?

Their FAQ (http://www.sqlite.org/faq.html#q19) sure suggests that it's
not wise to do separate inserts: the document says SQLite easily does
50k inserts per sec into a table on moderate hardware, but only
roughly 60 transactions per second...

That would surely point in the direction of using transactions when
we need mass inserts! I'm not sure exactly where in our code these
inserts should be collected though. Maybe one of the WC-NG regulars
has an idea?
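The FAQ's point is easy to demonstrate with SQLite's Python bindings (a sketch; the table is hypothetical and absolute timings vary by hardware):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (wc_id INTEGER, local_relpath TEXT)")

rows = [(1, "dir/file-%d" % i) for i in range(10000)]

# Wrap the whole batch in one transaction: a single commit (one
# fsync-equivalent on disk) instead of one per INSERT, which is where
# the FAQ's ~50k inserts/s vs ~60 transactions/s gap comes from.
with conn:
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", rows)

count = conn.execute("SELECT count(*) FROM nodes").fetchone()[0]
```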


Bye,

Erik.


Re: Effect of indices on SQLite (optimizer) performance

2011-02-05 Thread Erik Huelsmann
Now attached as text files (to be renamed to .py) to prevent the
mailer software from dropping them...

Bye,

Erik.

On Sat, Feb 5, 2011 at 7:05 PM, Erik Huelsmann  wrote:
> Yesterday on IRC, Bert, Philip and I were chatting about our SQLite
> perf issues and how Philip's findings in the past suggested that
> SQLite wasn't using its indices to optimize our queries.
>
> After searching and discussing its documentation, Philip suggested the
> -too obvious- "maybe we have the wrong indices".
>
> So, I went to work with his "fake database generator script" (attached
> as "test.py").
>
>
> The type of query we're seeing problematic performance with looks like
> the one below. The essential part is the WHERE clause.
>
> SELECT * FROM nodes WHERE wc_id = 1 AND (local_relpath = 'foo' OR
> local_relpath like 'foo%');
>
>
> We discussed 3 ways to achieve the effect of this query:
>
>  1. The query itself
>  2. The query stated as a UNION of two queries
>  3. Running the two parts of the UNION manually ourselves.
>
> Ad (1)
> This query doesn't perform as we had hoped to get from using a database.
>
> Ad (2)
> In the past, UNIONs have been explicitly removed because they were
> creating temporary tables (on disk!). However, since then we have
> changed our SQLite setup to create temporary tables in memory, so the
> option should really be re-evaluated.
>
> Ad (3)
> I'd hate to have to use two queries in all places in our source where
> we want to run queries like these. As a result, I think this scenario
> should be avoided if we can.
>
>
> So, I've created 'perf.py' to evaluate each of these scenarios,
> researching the effect on each of them under the influence of adding
> different indices.
>
> This is my finding:
>
> Scenario (1) [an AND combined with a complex OR] doesn't perform well
> under any circumstance.
>
> Scenario (2) performs differently, depending on the available indices.
>
> Scenario (3) performs roughly equal to scenario (2).
>
>
> Scenario (2) takes ~0.27 seconds to evaluate in the unmodified
> database. Adding an index on (wc_id, local_relpath) makes the
> execution time drop to ~0.000156 seconds!
>
>
> Seems Philip was right :-) We need to carefully review the indices we
> have in our database to support good performance.
>
>
> Bye,
>
>
> Erik.
>
#!/usr/bin/python

import os, sqlite3, time

c = sqlite3.connect('wcx.db')
c.execute("""pragma case_sensitive_like=1""")
c.execute("""pragma foreign_keys=on""")
c.execute("""pragma synchronous=off""")
c.execute("""pragma temp_store=memory""")

start = time.clock() # cpu clock as float in secs

#c.execute("""drop index i_wc_id_rp;""")
#c.execute("""create index i_wc_id_rp on nodes (wc_id, local_relpath);""")

# '.indices' is a sqlite3 shell command, not SQL; query sqlite_master instead
print c.execute("select name from sqlite_master where type = 'index'").fetchall()

# strategy 1
c.execute("""select * from nodes where wc_id = 1 AND
   (local_relpath like 'foo/%'
OR local_relpath = 'foo');""");


# strategy 2
#c.execute("""select * from nodes where wc_id = 1 AND local_relpath like 'foo/%'
# union select * from nodes where wc_id = 1 AND local_relpath = 'foo';""")

# strategy 3
#c.execute("""select * from nodes where wc_id = 1 AND local_relpath like 'foo/%';""")
#c.execute("""select * from nodes where wc_id = 1 AND local_relpath = 'foo';""")





end = time.clock()


print "timing: %5f\n" % (end - start)


#!/usr/bin/python

import os, sqlite3

try: os.remove('wcx.db')
except: pass

c = sqlite3.connect('wcx.db')
c.execute("""pragma case_sensitive_like=1""")
c.execute("""pragma foreign_keys=on""")
c.execute("""pragma synchronous=off""")
c.execute("""pragma temp_store=memory""")
c.execute("""create table repository (
   id integer primary key autoincrement,
   root text unique not null,
   uuid text not null)""")
c.execute("""create index i_uuid on repository (uuid)""")
c.execute("""create index i_root on repository (root)""")
c.execute("""create table wcroot (
   id integer primary key autoincrement,
   local_

Effect of indices on SQLite (optimizer) performance

2011-02-05 Thread Erik Huelsmann
Yesterday on IRC, Bert, Philip and I were chatting about our SQLite
perf issues and how Philip's findings in the past suggested that
SQLite wasn't using its indices to optimize our queries.

After searching and discussing its documentation, Philip suggested the
-too obvious- "maybe we have the wrong indices".

So, I went to work with his "fake database generator script" (attached
as "test.py").


The type of query we're seeing problematic performance with looks like
the one below. The essential part is the WHERE clause.

SELECT * FROM nodes WHERE wc_id = 1 AND (local_relpath = 'foo' OR
local_relpath like 'foo%');


We discussed 3 ways to achieve the effect of this query:

 1. The query itself
 2. The query stated as a UNION of two queries
 3. Running the two parts of the UNION manually ourselves.

Ad (1)
This query doesn't perform as we had hoped to get from using a database.

Ad (2)
In the past, UNIONs have been explicitly removed because they were
creating temporary tables (on disk!). However, since then we have
changed our SQLite setup to create temporary tables in memory, so the
option should really be re-evaluated.

Ad (3)
I'd hate to have to use two queries in all places in our source where
we want to run queries like these. As a result, I think this scenario
should be avoided if we can.


So, I've created 'perf.py' to evaluate each of these scenarios,
researching the effect on each of them under the influence of adding
different indices.

This is my finding:

Scenario (1) [an AND combined with a complex OR] doesn't perform well
under any circumstance.

Scenario (2) performs differently, depending on the available indices.

Scenario (3) performs roughly equal to scenario (2).


Scenario (2) takes ~0.27 seconds to evaluate in the unmodified
database. Adding an index on (wc_id, local_relpath) makes the
execution time drop to ~0.000156 seconds!


Seems Philip was right :-) We need to carefully review the indices we
have in our database to support good performance.
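The UNION rewrite plus the (wc_id, local_relpath) index can be tried directly with SQLite (a minimal sketch with a reduced schema; table and index names are illustrative):

```python
import sqlite3

c = sqlite3.connect(":memory:")
c.execute("PRAGMA case_sensitive_like = 1")  # lets LIKE 'foo/%' use the index
c.execute("CREATE TABLE nodes (wc_id INTEGER, local_relpath TEXT)")
c.executemany("INSERT INTO nodes VALUES (1, ?)",
              [("foo",), ("foo/bar",), ("foobar",), ("baz",)])
# The two-column index that made scenario (2) drop to ~0.000156s.
c.execute("CREATE INDEX i_wc_relpath ON nodes (wc_id, local_relpath)")

# Scenario (1): AND with a complex OR; hard for the optimizer.
combined = c.execute(
    "SELECT local_relpath FROM nodes WHERE wc_id = 1 "
    "AND (local_relpath = 'foo' OR local_relpath LIKE 'foo/%') "
    "ORDER BY local_relpath").fetchall()

# Scenario (2): the same result as a UNION of two indexable queries.
union = c.execute(
    "SELECT local_relpath FROM nodes "
    "WHERE wc_id = 1 AND local_relpath = 'foo' "
    "UNION "
    "SELECT local_relpath FROM nodes "
    "WHERE wc_id = 1 AND local_relpath LIKE 'foo/%' "
    "ORDER BY local_relpath").fetchall()
```

Both forms return the same rows; the difference is only in how the planner can use the index.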


Bye,


Erik.


Re: svn commit: r1028092 - /subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c

2010-10-28 Thread Erik Huelsmann
Hi Hyrum,

On Wed, Oct 27, 2010 at 11:34 PM, Hyrum K. Wright
 wrote:
> On Wed, Oct 27, 2010 at 3:40 PM,   wrote:
>> Author: stefan2
>> Date: Wed Oct 27 20:40:53 2010
>> New Revision: 1028092
>>
>> URL: http://svn.apache.org/viewvc?rev=1028092&view=rev
>> Log:
>> Incorporate feedback I got on r985606.
>>
>> * subversion/libsvn_ra_svn/marshal.c
>>  (SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD): introduce symbolic name
>>  for an otherwise arbitrary number
>>  (read_long_string): fix docstring
>>  (read_string): use symbolic name and explain the rationale behind the 
>> special case
>>
>> Modified:
>>    subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c
>>
>> Modified: subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c
>> URL: 
>> http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c?rev=1028092&r1=1028091&r2=1028092&view=diff
>> ==============================================================================
>> --- subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c 
>> (original)
>> +++ subversion/branches/performance/subversion/libsvn_ra_svn/marshal.c Wed 
>> Oct 27 20:40:53 2010
>> @@ -44,6 +44,12 @@
>>
>>  #define svn_iswhitespace(c) ((c) == ' ' || (c) == '\n')
>>
>> +/* If we receive data that *claims* to be followed by a very long string,
>> + * we should not trust that claim right away. But everything up to 1 MB
>> + * should be too small to be instrumental for a DOS attack. */
>> +
>> +#define SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD (0x100000)
>
> I like the name!
>
>> +
>>  /* --- CONNECTION INITIALIZATION --- */
>>
>>  svn_ra_svn_conn_t *svn_ra_svn_create_conn2(apr_socket_t *sock,
>> @@ -555,9 +561,8 @@ svn_error_t *svn_ra_svn_write_tuple(svn_
>>
>>  /* --- READING DATA ITEMS --- */
>>
>> -/* Read LEN bytes from CONN into already-allocated structure ITEM.
>> - * Afterwards, *ITEM is of type 'SVN_RA_SVN_STRING', and its string
>> - * data is allocated in POOL. */
>> +/* Read LEN bytes from CONN into a supposedly empty STRINGBUF.
>> + * POOL will be used for temporary allocations. */
>>  static svn_error_t *
>>  read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool,
>>                  svn_stringbuf_t *stringbuf, apr_uint64_t len)
>> @@ -593,7 +598,14 @@ static svn_error_t *read_string(svn_ra_s
>>                                 svn_ra_svn_item_t *item, apr_uint64_t len)
>>  {
>>   svn_stringbuf_t *stringbuf;
>> -  if (len > 0x100000)
>> +
>> +  /* We should not use large strings in our protocol. However, we may
>> +   * receive a claim that a very long string is going to follow. In that
>> +   * case, we start small and wait for all that data to actually show up.
>> +   * This does not fully prevent DOS attacs but makes them harder (you
>> +   * have to actually send gigabytes of data).
>
> Wow, I hadn't even considered this.  Once we get this on trunk, it
> might make sense to propose a backport, since this has (potential?)
> security implications.


Actually, that was already released as a security vulnerability some
years ago. The comment by Stefan makes it painfully apparent that it
is, but I guess that's a good thing. Notice that he did nothing but
name the constant and add the explanation. See
http://subversion.apache.org/security/CAN-2004-0413-advisory.txt

This is exactly the point I was talking about when I said properties
are length-limited by ra_svn (in relation to the maximum size of
merge-tracking information). The actual code is a little bit
different from what I remembered, because it does seem to grow the
buffer once it gets past the first MiB.
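The grow-as-data-arrives approach can be sketched like this (a hypothetical reader mirroring the idea, not the marshal.c code):

```python
SUSPICIOUSLY_HUGE = 0x100000  # 1 MiB

def read_claimed_string(sock_read, claimed_len):
    """Don't trust a huge length claim up front: allocate at most
    1 MiB at a time and grow only as the promised bytes actually
    arrive, so an attacker must really send the gigabytes the
    header claims instead of triggering one giant allocation."""
    chunks, received = [], 0
    while received < claimed_len:
        chunk = sock_read(min(SUSPICIOUSLY_HUGE, claimed_len - received))
        if not chunk:
            raise EOFError("peer claimed %d bytes but sent %d"
                           % (claimed_len, received))
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)
```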


Regards,

Erik.


Re: svn commit: r1026128 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

2010-10-22 Thread Erik Huelsmann
>> -  if (! replaced && status == svn_wc__db_status_added
>> +  if (reverted
>> +      && ! replaced
>> +      && status == svn_wc__db_status_added
>>        && db_kind == svn_wc__db_kind_dir)
>>      {
>> -      /* Non-replacements have their admin area deleted. wc-1.0 */
>> +      /* Non-replaced directories have their admin area deleted. wc-
>> 1.0 */
>>        SVN_ERR(svn_wc__adm_destroy(db, local_abspath,
>>                                    cancel_func, cancel_baton, pool));
>>      }
>>
>
> I don't think we need this block with single-db. There is no administrative 
> area to remove.

This call also destroys the adm-access which may be cached in the
db-handle. Removing that call makes one of our 'we should work with
our old entries code' tests fail. Maybe the comment should state
something to that effect?


Bye,

Erik.


Re: svn commit: r1026105 - /subversion/trunk/subversion/libsvn_wc/merge.c

2010-10-21 Thread Erik Huelsmann
Hi Stefan,

I see you're not on irc, so you may have missed it: This commit, or
the next, turned the buildslaves red.


Bye,

Erik.

On Thu, Oct 21, 2010 at 9:07 PM,   wrote:
> Author: stsp
> Date: Thu Oct 21 19:07:54 2010
> New Revision: 1026105
>
> URL: http://svn.apache.org/viewvc?rev=1026105&view=rev
> Log:
> * subversion/libsvn_wc/merge.c
>  (merge_text_file): Don't leak temporary file RESULT_TARGET.
>   E.g. when a text conflict happened during an update, and the user
>   chose 'theirs-full', a file containing the diff3 merge result with
>   conflict markers was left over in .svn/tmp/ directory.
>
> Found by: someone on the #svn IRC channel, some time ago
>
> Modified:
>    subversion/trunk/subversion/libsvn_wc/merge.c
>
> Modified: subversion/trunk/subversion/libsvn_wc/merge.c
> URL: 
> http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_wc/merge.c?rev=1026105&r1=1026104&r2=1026105&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_wc/merge.c (original)
> +++ subversion/trunk/subversion/libsvn_wc/merge.c Thu Oct 21 19:07:54 2010
> @@ -1039,7 +1039,10 @@ merge_text_file(svn_skel_t **work_items,
>         }
>
>       if (*merge_outcome == svn_wc_merge_merged)
> -        return SVN_NO_ERROR;
> +        {
> +          SVN_ERR(svn_io_remove_file2(result_target, TRUE, scratch_pool));
> +          return SVN_NO_ERROR;
> +        }
>     }
>   else if (contains_conflicts && dry_run)
>       *merge_outcome = svn_wc_merge_conflict;
> @@ -1078,6 +1081,8 @@ merge_text_file(svn_skel_t **work_items,
>                                             result_pool, scratch_pool));
>       *work_items = svn_wc__wq_merge(*work_items, work_item, result_pool);
>     }
> +  else
> +    SVN_ERR(svn_io_remove_file2(result_target, TRUE, scratch_pool));
>
>   return SVN_NO_ERROR;
>  }
>
>
>


Determining the 'revert' output we want

2010-10-21 Thread Erik Huelsmann
Last week, I greatly simplified our 'revert' code. However, in the
process, I changed the notifications from 'revert' too:

The old code would send notifications for all modified nodes
(including tree modifications), with a single exception: it would send
a notification only for the root in case of
non-replaced-added/copied/moved nodes.

The change that I made is to extend 'notify-on-root-only' to added
(non-copied/moved) nodes which are also replacements.


However, talking to Bert, he said he'd rather get more notifications
than fewer and decide himself whether he wants to show them in his
GUI. This made me think we may want to distinguish two or three
notification types for a path:

 * Content/props-restored paths
 * Removed-from-version-control paths (invoked for non-replaced
added/copied/moved paths)
 * Restored paths (invoked for deleted/replaced paths)

Optionally, it would be possible to use different notifications for
paths which are not op_roots, but which *are* part of a tree
modification - let's call those 'derived' paths.


Of course, we'd then need to decide which notifications our client would show.


Thoughts? Comments?


Bye,


Erik.


Re: svn commit: r1022931 - in /subversion/trunk/subversion/libsvn_wc: status.c wc-queries.sql wc_db.c wc_db.h

2010-10-15 Thread Erik Huelsmann
>> - cold cache: 1.7 is almost 50% faster than 1.6
>> 1.7: 22s
>> 1.6: 42s
>>
>> - hot cache: 1.7 is just about on par with 1.6 (only 20% slower)
>> 1.7: 0.86s
>> 1.6: 0.72s
>>
>
> What do you guys mean by "cold cache" and "hot cache"? If they mean what I 
> think they mean, wouldn't "hot cache" be faster that "cold cache" ?

I think they are what you think. 22 seconds is slower than < 1s, isn't it?

Bye,

Erik.


revert behaviour in the light of layered working copy changes

2010-10-12 Thread Erik Huelsmann
As Julian pointed out, I'm working on making 'revert' work with our
NODES table in the layered design situation. As part of that work, I
was studying the current behaviour of revert: supposedly, that's what
the behaviour of the new revert should look like in simple cases.

However, one of the things I found is that revert leaves unversioned
artifacts behind. While I'm aware that in some situations this is part
of the policy (don't delete uncommitted changes), in case of revert,
it's rather unpractical for a number of reasons:

1. The artifacts left behind can cause botched merges later on - even
with our current client
2. The artifacts can lead to obstructions in the new working copy
model when we're going with the model of "incremental reverts" that
Julian proposed

Even if we want to prevent the deletion of uncommitted changes -which
I'm going to challenge next- I think we leave behind way too many
artifacts: all files and directories which were part of a copy or move
tree-restructuring operation are left behind on revert. Now, the
problem here is that the files are even left behind if they were
unmodified - and hence reproducible - by which reasoning no
destruction of local modifications could have happened in the first
place.


This is why I'm now proposing that we stop leaving behind the
-unchanged- files which are part of a copy or move operation.


One could argue that the same reasoning could be applied to added
trees. However, in that case, you might also apply the reasoning that
the subtree should stay behind unversioned: it's afterall only the
'add' operation which we're reverting and deleting the added subtree
might actually destroy users' efforts.

The tricky bit to the reasoning in the paragraph above is that we
don't check if files have been fully changed (effectively replaced) or
not, meaning that simply reverting a versioned file could in effect
have the same consequences as deleting an added file.


With respect to "keeping around unversioned reverted-adds", I'm not
sure what to propose. What do others think? I'm inclined to argue
along the lines of "they're all delete operations", however, given our
current behaviour, I also see why users wouldn't expect this
behaviour.


Comments?


Bye,


Erik.


Re: Format 20 upgrade to NODES

2010-10-06 Thread Erik Huelsmann
On Wed, Oct 6, 2010 at 1:12 PM, Julian Foad  wrote:
> On Wed, 2010-10-06 at 09:32 +0100, Philip Martin wrote:
>> I'd like to enable NODES as a replacement for BASE_NODE and
>> WORKING_NODE.  This would involve bumping the format number, and old
>> working copies would get automatically upgraded.
>
> +1 from me, ASAP.
>
> We're still working on the op_depth support and it's more complex than I
> originally thought.  It looks like doing this transition in two separate
> format bumps will be more expedient.
>
> Please give me 24h to change the order of NODES columns first - see
> separate email.

+1 from me too.

Bye,

Erik.


Re: svn commit: r999837 - /subversion/trunk/subversion/libsvn_wc/wc-queries.sql

2010-09-23 Thread Erik Huelsmann
On Wed, Sep 22, 2010 at 11:25 PM, Greg Stein  wrote:

> On Wed, Sep 22, 2010 at 05:39,   wrote:
> >...
> > +++ subversion/trunk/subversion/libsvn_wc/wc-queries.sql Wed Sep 22
> 09:39:45 2010
> > @@ -215,7 +215,7 @@ update nodes set properties = ?3
> >  where wc_id = ?1 and local_relpath = ?2
> >   and op_depth in
> >(select op_depth from nodes
> > -where wc_id = ?1 and local_relpath = ?2
> > +where wc_id = ?1 and local_relpath = ?2 and op_depth > 0
> > order by op_depth desc
> > limit 1);
>
> Wouldn't it be better to do:
>
> where wc_id = ?1 and local_relpath = ?2
>  and op_depth = (select max(op_depth) from nodes
>where wc_id=?1 and local_relpath=?2 and op_depth > 0);
>
> It seems that eliminating the "order by" and "limit", in favor of
> max() will tell sqlite what we're really searching for: the maximal
> value.
>
>
I wrote those queries like that because Bert said it would introduce an
aggregation function - at the time he said it, that sounded like it was
something negative.


> Also note that the above query uses "op_depth in (...)"
>
> yet:
>
> >
> > @@ -312,7 +312,7 @@ WHERE wc_id = ?1 AND local_relpath = ?2;
> >  update nodes set translated_size = ?3, last_mod_time = ?4
> >  where wc_id = ?1 and local_relpath = ?2
> >   and op_depth = (select op_depth from nodes
> > -  where wc_id = ?1 and local_relpath = ?2
> > +  where wc_id = ?1 and local_relpath = ?2 and op_depth >
> 0
> >   order by op_depth desc
> >   limit 1);
>
> This one does not. The rest of the statements you converted all use
> the "in" variant.
>

The "in" variant is probably better, because - especially with the
op_depth > 0 restriction - the result set can probably be empty.


Bye,

Erik.


Re: UTF-8 NFC/NFD paths issue

2010-09-20 Thread Erik Huelsmann
Sorry to have left the discussion running so long without contributing to it
myself. The reason I started about changing the repository / fs is because
it is where we store the dataset that we'll need to support forever: working
copies get destroyed and checked out over and over every hour, every day.
Repositories get created once and only accumulate data.


> > That doesn't solve the historical revisions containing "bad" paths. My
> > understanding of the problem was that we'd go into the past and
> > rewrite the paths into a single, canonical form.
> >
>
> Agreed: an out-of-band solution fixes thing historically too.
>

As pointed out on IRC, I think it's important to stop adding semantically
the same paths to a repository. From the perspective of efficiency, it might
be handy to have a normalized version stored somewhere for all paths living
in the repository, but to prevent addition of differently encoded paths,
such a thing isn't really required: the correct encoding can be calculated
when the check happens.


> Having backend enforce NFC can wait for 2.0 I suppose :)
>

True, but the value of that might be limited: if we required all
communications to be NFC encoded, we need to take additional measures - as
pointed out by Branko - to make things work on MacOS X: currently, we have
MacOS X shops happily working with non-ascii characters in the paths, all
NFD encoded. That would change.

By the way, Julian Foad, Philip Martin, Bert Huijben and I talked through a
possible solution to fix the client-side issue which becomes an option once
we switch to wc-ng. The full impact of that change needs to be determined
though and probably does not fit in the 1.7 timeline. If it seems it does,
we'll bring it up.



To recap, the change I'm proposing is that we check pathnames with NFC/D
aware comparison routines upon add_file() / add_directory() inside
libsvn_repos or libsvn_fs_* - of which I suspect it's easier to handle
inside the latter. In my proposal, we don't specify a "repository normal"
encoding. If performance degrades too much, we can enhance the filesystem
with a normalized version which doesn't need to be recoded in order to do
the comparison with the incoming path.

Other than that, I don't think there's anything *required* to make us
Unicode-aware on the server. It's also the change I'm proposing cmpilato to
implement in libsvn_fs_base as a proof of concept.
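A check-time comparison without a stored normal form could look roughly like this (a Python sketch of the proposed add_file() check, not libsvn_fs code; all names are illustrative):

```python
import unicodedata

def paths_equal(a, b):
    # Recode both sides at comparison time; no "repository normal
    # form" needs to be stored anywhere.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def add_file(dirents, name):
    """Refuse names that collide with an existing entry under
    normalization, but store NAME in the encoding the client sent,
    so existing NFD-based (OS X) working copies keep working."""
    for existing in dirents:
        if paths_equal(existing, name):
            raise FileExistsError(
                "'%s' already exists as '%s'" % (name, existing))
    dirents.append(name)
```

The linear scan is where the performance concern comes in; caching a normalized form per path would remove the repeated recoding.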


This proposal says nothing about the client side. The client side can be
fixed independently from the server side, given that we can't switch to
normalized paths in the protocol until 2.0: whatever paths a server sends,
the client will need to use those to communicate back to the server.


Bye,

Erik.


Re: svn commit: r997905 - /subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c

2010-09-16 Thread Erik Huelsmann
Hi Greg,

On Thu, Sep 16, 2010 at 10:47 PM, Erik Huelsmann  wrote:

>
>
> On Thu, Sep 16, 2010 at 10:40 PM, Philip Martin <
> philip.mar...@wandisco.com> wrote:
>
>> Erik Huelsmann  writes:
>>
>> > We're now back to a single failure. It's in the relocation-verification
>> code
>> > in db-test.c (line 1505). With the half-hour I've spent so far, I wasn't
>> > able to locate it, but I have to move to other business now. Hopefully
>> > you'll be able to find it.
>>
>> It's the difference between the old STMT_UPDATE_BASE_RECURSIVE_REPO
>> and the new STMT_RECURSIVE_UPDATE_NODE_REPO. The first updates
>> non-null repo_ids while the second updates repo_ids that match the old
>> repo_id.  This makes a difference when a node has a non-null repo_id
>> that doesn't match the the old repo_id.
>>
>> I'm not sure whether the pre-relocate db is valid, and if it is I'm
>> not sure which of the relocate algorithms is correct.
>>
>
> The latter query (the one which verifies the repo_id) is the one I wrote. I
> did so intentionally: from the description of the copyfrom_* fields in the
> WORKING_NODE table, I couldn't but conclude they may be referring to a
> different repository. Since the new query is updating both BASE and WORKING,
> I thought verification of the old repo_id to be required. Additionally, what
> happens if -for whatever reason- 1 working copy contains references to
> multiple repositories? The former query will rewrite everything to be part
> of the same repository. Hence, I think the former query is flawed.
>
> I hope the original author (Greg?) has something to say about it.
>

Just checked who originally wrote the
STMT_UPDATE_RECURSIVE_BASE/WORKING_REPO query for use with relocate. Turns
out to be you indeed.

The fact that the old copyfrom_* fields have a repo_id column indicates to
me that you're provisioning the option to store the fact that a file has
been copied off a different repository. The fact that you store (in BASE) a
repo_id for every base node (instead of one per wc) probably means that
you're provisioning to have multiple repository sources for a single wc.

However, the UPDATE_RECURSIVE_BASE_REPO query doesn't take any of that into
account and simply rewrites all repo_ids to be the new repo to relocate to.
That doesn't seem correct though: if other nodes had different repository
sources, those should probably be excluded from relocation, no?

What's your view on this?
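For concreteness, the behavioral difference between the two statements can be sketched against a toy table. The table and column names below are illustrative only, not the actual wc-queries.sql definitions; SQLite via Python's sqlite3 stands in for the real wc_db code:

```python
import sqlite3

def relocate(rows, sql, params):
    """Apply a relocate-style UPDATE to a fresh in-memory table and
    return the resulting repos_id per path."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (local_relpath TEXT, repos_id INTEGER)")
    db.executemany("INSERT INTO nodes VALUES (?, ?)", rows)
    db.execute(sql, params)
    return dict(db.execute("SELECT local_relpath, repos_id FROM nodes"))

# One working copy; node "X" references a *different* repository (id 2).
rows = [("A", 1), ("A/B", 1), ("X", 2)]

# Old-style statement: rewrite every non-null repos_id.
unconditional = relocate(
    rows, "UPDATE nodes SET repos_id = ? WHERE repos_id IS NOT NULL", (3,))

# New-style statement: rewrite only rows matching the old repos_id.
matched = relocate(
    rows, "UPDATE nodes SET repos_id = ? WHERE repos_id = ?", (3, 1))

print(unconditional["X"], matched["X"])  # 3 2
```

The unconditional form drags "X" along to the new repository; the matched form leaves it pointing at repository 2, which is the behavior Erik argues for above.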

Bye,


Erik.


Re: svn commit: r997905 - /subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c

2010-09-16 Thread Erik Huelsmann
On Thu, Sep 16, 2010 at 10:40 PM, Philip Martin
wrote:

> Erik Huelsmann  writes:
>
> > We're now back to a single failure. It's in the relocation-verification
> code
> > in db-test.c (line 1505). With the half-hour I've spent so far, I wasn't
> > able to locate it, but I have to move to other business now. Hopefully
> > you'll be able to find it.
>
> It's the difference between the old STMT_UPDATE_BASE_RECURSIVE_REPO
> and the new STMT_RECURSIVE_UPDATE_NODE_REPO. The first updates
> non-null repo_ids while the second updates repo_ids that match the old
> repo_id.  This makes a difference when a node has a non-null repo_id
> that doesn't match the old repo_id.
>
> I'm not sure whether the pre-relocate db is valid, and if it is I'm
> not sure which of the relocate algorithms is correct.
>

The latter query (the one which verifies the repo_id) is the one I wrote. I
did so intentionally: from the description of the copyfrom_* fields in the
WORKING_NODE table, I couldn't but conclude they may be referring to a
different repository. Since the new query is updating both BASE and WORKING,
I thought verification of the old repo_id to be required. Additionally, what
happens if -for whatever reason- 1 working copy contains references to
multiple repositories? The former query will rewrite everything to be part
of the same repository. Hence, I think the former query is flawed.

I hope the original author (Greg?) has something to say about it.

Bye,

Erik.


Re: svn commit: r997905 - /subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c

2010-09-16 Thread Erik Huelsmann
Hi Philip,

On Thu, Sep 16, 2010 at 10:07 PM,  wrote:

> Author: ehu
> Date: Thu Sep 16 20:07:27 2010
> New Revision: 997905
>
> URL: http://svn.apache.org/viewvc?rev=997905&view=rev
> Log:
> Fix one of two remaining SVN_WC__NODES failures (manifesting itself twice).
>
>  * subversion/tests/libsvn_wc/entries-compat.c
>   (TESTING_DATA): Add NODES (working_node) data.
>
>
We're now back to a single failure. It's in the relocation-verification code
in db-test.c (line 1505). With the half-hour I've spent so far, I wasn't
able to locate it, but I have to move to other business now. Hopefully
you'll be able to find it.

Bye,

Erik.


UTF-8 NFC/NFD paths issue

2010-09-15 Thread Erik Huelsmann
Yesterday, I was talking to CMike about our long-standing issue with UTF-8
strings designating a certain path not necessarily being equal to other
strings designating the same path. The issue has to do with NFC (composed)
and NFD (decomposed) representation of Unicode characters. CMike nicely
called the issue the "Erik Huelsmann issue" yesterday :-)

The issue consists of two parts:
 1. The repository, which should determine that paths being added by a commit
are unique, regardless of their encoding (NFC/NFD)
 2. The client, which should detect that the pathnames coming in from the
filesystem may differ in encoding from what's in the working copy
administrative files [this is mainly an issue on the Mac:
http://subversion.tigris.org/issues/show_bug.cgi?id=2464]

Mike, what I have been trying to find in our filesystem
implementation is where an editor drive adding a path [add_directory() or
add_file()] checks whether the file already exists. The check at that point
should be encoding independent, for example by making all paths NFC (or NFD)
before comparison. You could use utf8proc (
http://www.flexiguided.de/publications.utf8proc.en.html) to do the
normalization - it's very light-weight in contrast to ICU, which provides the
same functionality but has a much broader scope.

The problem I was telling you about is that I was looking in libsvn_fs_base
to find where the existence check is performed, but I couldn't find it.

Basically what I was trying to do is: do what we do now (i.e. fail if the path
exists and succeed if it doesn't), with the only difference that the paths
used for comparison are guaranteed to have the same normalization - meaning
they are the same byte sequence when they're equal Unicode.
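The NFC/NFD mismatch is easy to reproduce. A minimal sketch, using Python's unicodedata as a stand-in for utf8proc (the filename is invented):

```python
import unicodedata

# "é" as one precomposed code point (NFC) vs. "e" + combining acute (NFD).
nfc = "caf\u00e9.txt"
nfd = "cafe\u0301.txt"

# Byte-wise the two paths differ, so a naive existence check would
# treat them as two distinct files...
assert nfc != nfd
assert nfc.encode("utf-8") != nfd.encode("utf-8")

# ...but after normalizing both sides to a single form they compare equal.
assert unicodedata.normalize("NFC", nfd) == nfc
assert unicodedata.normalize("NFD", nfc) == nfd
print("normalized comparison: equal")
```

This is exactly the check proposed above: normalize both paths to one form (NFC or NFD) before the uniqueness comparison, so equal Unicode means equal bytes.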


Bye,


Erik.


Fwd: svn commit: r996661 - /subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c

2010-09-13 Thread Erik Huelsmann
Julian,

This commit should remove the test failures you were experiencing on trunk
with SVN_WC__NODES. At least that should give you confidence that if you see
failures, you probably introduced them with local changes :-)


Bye,


Erik.

-- Forwarded message --
From: 
Date: Mon, Sep 13, 2010 at 9:41 PM
Subject: svn commit: r996661 -
/subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c
To: comm...@subversion.apache.org


Author: ehu
Date: Mon Sep 13 19:41:00 2010
New Revision: 996661

URL: http://svn.apache.org/viewvc?rev=996661&view=rev
Log:
 * subversion/tests/libsvn_wc/entries-compat.c (TESTING_DATA): Add NODES
data.

Modified:
   subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c

Modified: subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c
URL:
http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c?rev=996661&r1=996660&r2=996661&view=diff
==
--- subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c (original)
+++ subversion/trunk/subversion/tests/libsvn_wc/entries-compat.c Mon Sep 13
19:41:00 2010
@@ -90,6 +90,7 @@ static const char * const TESTING_DATA =
   /* ### The file_externals column in BASE_NODE is temporary, and will be
  ### removed.  However, to keep the tests passing, we need to add it
  ### to the following insert statements.  *Be sure to remove it*. */
+#ifndef SVN_WC__NODES_ONLY
   "insert into base_node values ("
   "  1, '', 1, '', null, 'normal', 'dir', "
   "  1, null, null, "
@@ -187,6 +188,92 @@ static const char * const TESTING_DATA =
   "  1, " TIME_1s ", '" AUTHOR_1 "', null, null, null, '()', null, null, "
   "  null); "
   " "
+#endif
+#ifdef SVN_WC__NODES


NODE_DATA / NODES status

2010-09-10 Thread Erik Huelsmann
Today, I finished replacing all NODE_DATA queries (UPDATE/DELETE/INSERT) in
wc_db.c by queries which operate on NODES. From here on, I'll start to write
code to query BASE_NODE+NODES and WORKING_NODE+NODES, verifying that both
tables return the same results.

There are, however, a few queries in 'entries.c' which operate directly on
BASE_/WORKING_NODE. These queries will need to be migrated. However, in our
old entries, we don't have the concept of op_depths and op roots. That makes
it a bit hard to migrate the entries file to the exact semantics of the
NODES table. When we fix the WORKING_NODE concept to have an op_depth == 1
during migration, however, conversion of the queries in that file isn't much
of a problem.

Does anybody expect serious issues from all working nodes having the same
op_depth?


The alternative would be to set the op_depth of each working node to the
path component count of its local_relpath (making each node a stand-alone
change).


Now that I write the above, I think it's sanest to make each working
node its own oproot. That would be roughly as simple to code as the
"everything is 1" assumption.
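The "each working node its own oproot" option amounts to deriving op_depth from the local_relpath alone. A rough sketch (the helper name is made up, and this ignores whatever the real migration code does):

```python
def op_depth(local_relpath):
    """Depth of a node relative to the wc root: 0 for the root itself,
    otherwise the number of path components in its relpath."""
    if local_relpath == "":
        return 0
    return local_relpath.count("/") + 1

# Under this scheme every migrated working node becomes its own op root:
assert op_depth("") == 0          # wc root (BASE level)
assert op_depth("A") == 1
assert op_depth("A/B") == 2
assert op_depth("A/B/file") == 3
```

The alternative ("everything is 1") would simply return 1 for every working node, collapsing all changes into a single layer.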


Better ideas? Comments?


Bye,


Erik.


Migrating from NODE_DATA/BASE_NODE/WORKING_NODE to NODES

2010-09-06 Thread Erik Huelsmann
In r992993 the NODES table design was added. The SVN_WC__NODES conditional
was created to enable known-working code for this schema. The NODES
conditional will be used to flag sections which need to be further looked
into for modification, just as with SINGLE_DB and NODE_DATA.

It would be my idea to work toward a situation where - under SVN_WC__NODES -
everything is written to two tables, verifying their equality when reading
the data back from them. Then when we're able to run in that mode, we can
switch to the NODES table from the current model of three tables.

I'm going to tear down SVN_WC__NODE_DATA in the process. There are no
guarantees the code will remain in working state with that conditional.


Bye,

Erik.


Re: svn commit: r992886 - in /subversion/trunk/subversion/libsvn_wc: wc-queries.sql wc_db.c

2010-09-06 Thread Erik Huelsmann
On Mon, Sep 6, 2010 at 11:35 AM, Bert Huijben  wrote:

>
> > +  SVN_ERR(svn_sqlite__get_statement(&stmt, pdh->wcroot->sdb,
> > +STMT_INSERT_NODE_DATA));
> > +
> > +  SVN_ERR(svn_sqlite__bindf(stmt, "isi", pdh->wcroot->wc_id, base,
> > +(apr_int64_t) 0 /* BASE */
> > +));
> > +  SVN_ERR(svn_sqlite__bind_text(stmt, 4, ""));
> > +  SVN_ERR(svn_sqlite__bind_token(stmt, 5, presence_map,
> > + svn_wc__db_status_normal));
> > +  SVN_ERR(svn_sqlite__bind_token(stmt, 6, kind_map,
> > + svn_wc__db_kind_subdir));
>
> Why don't you use _bindf("isistt", ...) here?
> That would include all the other fields. (Other option: separate binds of
> all values)
>
>
Right. We have many situations where we could/should bind all values through
..._bindf(). This was merely duplicating what was exactly above it. I prefer
the ..._bindf() version myself, but didn't want to rewrite existing code to
use it. Based on your feedback, I think I just might do that anyway, when in
the same function.


Bye,

Erik.


Re: [PROPOSAL] WC-NG: merge NODE_DATA, WORKING_NODE and BASE_NODE into a single table (NODES)

2010-09-06 Thread Erik Huelsmann
Given all the responses in the thread, I'd say we're moving to the single
table for BASE and WORKING node recording. There was a flurry of activity
from me yesterday and this morning regarding NODE_DATA: that was just me
flushing my queue of patches.

The work isn't completely irrelevant, as it identifies the spots where the
NODES table will be introduced, just like NODE_DATA had to.

Today, I'll draw up the NODES table and move the queries which had
already been modified for NODE_DATA over to the NODES design. I hope to
get a long way today already. If it's not done today, I expect to be able
to finish it this week.

Anyone wanting to join in: let's chat on IRC.


Bye,


Erik.

On Thu, Sep 2, 2010 at 11:34 PM, Erik Huelsmann  wrote:

>
>
> As described by Julian earlier this month, Julian, Philip and I observed
> that the BASE_NODE, WORKING_NODE and NODE_DATA tables have many fields in
> common. Notably, by introducing the NODE_DATA table, most fields from
> BASE_NODE and WORKING_NODE already moved to a common table.
>
> The remaining fields (after switching to NODE_DATA *and* SINGLE-DB) on the
> side of WORKING_NODE are the 2 cache fields 'translated_size' and
> 'last_mod_time'. Apart from those two, there are the indexing fields wc_id,
> local_relpath and parent_relpath.
>
> In the end we're storing *lots* of bytes (wc_id, local_relpath and
> parent_relpath) to store 2 64-bit values.
>
> On the side of BASE_NODE, we end up storing dav_cache, repos_id, repos_path
> and revision. The NODE_DATA table already has the fields original_repos_id,
> original_repos_path and original_revision. When op_depth == 0, these are
> guaranteed to be empty (null), since they are for working nodes with
> copy/move source information. Renaming the three fields in NODE_DATA to
> repos_id, repos_path and revision, generalizing their use to include
> op_depth == 0 [of course nicely documented in the table docs], BASE_NODE
> would be reduced to a store of the dav_cache, translated_size and
> last_mod_time fields.
>
> By subsuming translated_size and last_mod_time into NODE_DATA, neither
> WORKING_NODE nor BASE_NODE will need to store these values anymore. This
> eliminates the entire reason of existence of WORKING_NODE. BASE_NODE then
> only stores dav_cache. Here too, it's probably more efficient (in size) to
> store dav_cache in NODE_DATA to prevent repeated storage of wc_id,
> local_relpath and parent_relpath in BASE_NODE.
>
> In addition to the eliminated storage overhead, we'd be making things a
> little less complex for ourselves: UPDATE, INSERT and DELETE queries would
> be operating only on a single table, removing the need to split updates
> across multiple statements.
>
>
> This week, I was discussing this change with Greg on IRC. We both have the
> feeling this should work out well. The proposal here is to switch
> (WORKING_NODE, NODE_DATA, BASE_NODE) into a single table -->  NODES.
>
>
> Comments? Fears? Enhancements?
>
>
> Bye,
>
>
> Erik.
>


[PROPOSAL] WC-NG: merge NODE_DATA, WORKING_NODE and BASE_NODE into a single table (NODES)

2010-09-02 Thread Erik Huelsmann
As described by Julian earlier this month, Julian, Philip and I observed
that the BASE_NODE, WORKING_NODE and NODE_DATA tables have many fields in
common. Notably, by introducing the NODE_DATA table, most fields from
BASE_NODE and WORKING_NODE already moved to a common table.

The remaining fields (after switching to NODE_DATA *and* SINGLE-DB) on the
side of WORKING_NODE are the 2 cache fields 'translated_size' and
'last_mod_time'. Apart from those two, there are the indexing fields wc_id,
local_relpath and parent_relpath.

In the end we're storing *lots* of bytes (wc_id, local_relpath and
parent_relpath) to store 2 64-bit values.

On the side of BASE_NODE, we end up storing dav_cache, repos_id, repos_path
and revision. The NODE_DATA table already has the fields original_repos_id,
original_repos_path and original_revision. When op_depth == 0, these are
guaranteed to be empty (null), since they are for working nodes with
copy/move source information. Renaming the three fields in NODE_DATA to
repos_id, repos_path and revision, generalizing their use to include
op_depth == 0 [of course nicely documented in the table docs], BASE_NODE
would be reduced to a store of the dav_cache, translated_size and
last_mod_time fields.

By subsuming translated_size and last_mod_time into NODE_DATA, neither
WORKING_NODE nor BASE_NODE will need to store these values anymore. This
eliminates the entire reason of existence of WORKING_NODE. BASE_NODE then
only stores dav_cache. Here too, it's probably more efficient (in size) to
store dav_cache in NODE_DATA to prevent repeated storage of wc_id,
local_relpath and parent_relpath in BASE_NODE.

In addition to the eliminated storage overhead, we'd be making things a
little less complex for ourselves: UPDATE, INSERT and DELETE queries would
be operating only on a single table, removing the need to split updates
across multiple statements.
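A sketch of what single-table reads could look like once everything lives in NODES (schema trimmed down, presence values invented for illustration): the visible node for a path is simply the row with the highest op_depth.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE nodes (
                local_relpath TEXT, op_depth INTEGER, presence TEXT,
                PRIMARY KEY (local_relpath, op_depth))""")
# BASE row (op_depth 0) shadowed by a copy installed at op_depth 1.
db.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
               [("A/file", 0, "normal"),
                ("A/file", 1, "copied-here")])

# One query against one table yields the currently visible state;
# a path with no working layers just returns its BASE row.
row = db.execute("""SELECT presence FROM nodes
                    WHERE local_relpath = ?
                    ORDER BY op_depth DESC LIMIT 1""",
                 ("A/file",)).fetchone()
print(row[0])  # copied-here
```

Updates and deletes get the same benefit: a single statement with an op_depth predicate replaces the current split across BASE_NODE and WORKING_NODE.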


This week, I was discussing this change with Greg on IRC. We both have the
feeling this should work out well. The proposal here is to switch
(WORKING_NODE, NODE_DATA, BASE_NODE) into a single table -->  NODES.


Comments? Fears? Enhancements?


Bye,


Erik.


Re: svn commit: r986332 - in /subversion/trunk/subversion/libsvn_wc: wc-queries.sql wc_db.c

2010-08-17 Thread Erik Huelsmann
>>
>> Modified:
>>     subversion/trunk/subversion/libsvn_wc/wc-queries.sql
>>     subversion/trunk/subversion/libsvn_wc/wc_db.c
>
> Your log message doesn't describe any changes in wc_db.c

Prop-edited now. Thanks.

Bye,

Erik.


NODE_DATA (2nd iteration)

2010-07-12 Thread Erik Huelsmann
After lots of discussion regarding the way NODE_DATA/4th tree should
be working, I'm now ready to post a summary of the progress. In my
last e-mail (http://svn.haxx.se/dev/archive-2010-07/0262.shtml) I
stated why we need this; this post is about the conclusion of what
needs to happen. Also included are the first steps there.


With the advent of NODE_DATA, we distinguish node values specifically
related to BASE nodes, those specifically related to "current" WORKING
nodes and those which are to be maintained for multiple levels of
WORKING nodes (not only the "current" view) (the latter category is
most often also shared with BASE).

The respective tables will hold the columns shown below.


-
TABLE WORKING_NODE (
  wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
  local_relpath  TEXT NOT NULL,
  parent_relpath  TEXT,
  moved_here  INTEGER,
  moved_to  TEXT,
  original_repos_id  INTEGER REFERENCES REPOSITORY (id),
  original_repos_path  TEXT,
  original_revnum  INTEGER,
  translated_size  INTEGER,
  last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
  keep_local  INTEGER,

  PRIMARY KEY (wc_id, local_relpath)
  );

CREATE INDEX I_WORKING_PARENT ON WORKING_NODE (wc_id, parent_relpath);


The moved_* and original_* columns are typical examples of "WORKING
fields only maintained for the visible WORKING nodes": the original_*
and moved_* fields are inherited from the operation root by all
children that are part of the operation. The operation root will be the
visible change on its own level, meaning it'll have rows both in the
WORKING_NODE and NODE_DATA tables. The fact that these columns live in
WORKING_NODE rather than NODE_DATA means that tree changes are not
preserved across overlapping changes. This is fully compatible with what
we do today: changes to higher levels destroy changes to lower levels.

The translated_size and last_mod_time columns exist in WORKING_NODE
and BASE_NODE; they explicitly don't exist in NODE_DATA. The fact that
they exist in BASE_NODE is a bit of a hack: it's to prevent creation
of WORKING_NODE data for every file which has keyword expansion or eol
translation properties set. These columns serve only to optimize
working copy scanning for changes, and as such relate only to the
visible WORKING_NODEs.


 TABLE BASE_NODE (
  wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
  local_relpath  TEXT NOT NULL,
  repos_id  INTEGER REFERENCES REPOSITORY (id),
  repos_relpath  TEXT,
  parent_relpath  TEXT,
  translated_size  INTEGER,
  last_mod_time  INTEGER,  /* an APR date/time (usec since 1970) */
  dav_cache  BLOB,
  incomplete_children  INTEGER,
  file_external  TEXT,

  PRIMARY KEY (wc_id, local_relpath)
  );


TABLE NODE_DATA (
  wc_id  INTEGER NOT NULL REFERENCES WCROOT (id),
  local_relpath  TEXT NOT NULL,
  op_depth  INTEGER NOT NULL,
  presence  TEXT NOT NULL,
  kind  TEXT NOT NULL,
  checksum  TEXT,
  changed_rev  INTEGER,
  changed_date  INTEGER,  /* an APR date/time (usec since 1970) */
  changed_author  TEXT,
  depth  TEXT,
  symlink_target  TEXT,
  properties  BLOB,

  PRIMARY KEY (wc_id, local_relpath, op_depth)
  );

CREATE INDEX I_NODE_WC_RELPATH ON NODE_DATA (wc_id, local_relpath);


Which leaves the NODE_DATA structure above. The op_depth column
contains the depth of the node - relative to the wc root - on which
the operation was run which caused the creation of the given NODE_DATA
node.  In the final scheme (based on single-db), the value will be 0
for base and a positive integer for WORKING related data.

In order to be able to implement NODE_DATA even without having a fully
functional SINGLE_DB yet, a transitional node numbering scheme needs
to be devised. The following numbers will apply: BASE == 0,
WORKING-this-dir == 1, WORKING-any-immediate-child == 2.


Other transitioning related remarks:

 * Conditionally-protected experimental sections, just like with SINGLE_DB
 * Initial implementation will simply replace the current
functionality of the 2 tables, from there we can work our way through
whatever needs doing.
 * Am I forgetting any others?

Bye,

Erik.


Re: NODE_DATA (aka fourth tree)

2010-07-12 Thread Erik Huelsmann
>>  * moved_here
>>  * moved_to

On IRC, we were discussing the fact that these columns are in the
databases, but nobody seems to be planning to implement them for 1.7.
Is that your perception too? If so, we could remove them with the
upcoming schema-change required for NODE_DATA.

Bye,

Erik.


Re: NODE_DATA (aka fourth tree)

2010-07-12 Thread Erik Huelsmann
On Sun, Jul 11, 2010 at 1:04 AM, Greg Stein  wrote:
> On Sat, Jul 10, 2010 at 17:55, Erik Huelsmann  wrote:
>>...
>> Columns to be placed in NODE_DATA:
>>
>>  * wc_id
>>  * local_relpath
>>  * oproot_distance
>>  * presence
>>  * kind
>>  * revnum
>
> revnum is a BASE concept, so it does not belong here. WORKING nodes do
> not have a revision until they are committed. If the node is copied
> from the repository, then the *source* of that copy needs a revision
> and path, but that is conceptually different from "revnum" (which
> identifies the rev of the node itself).
>
>>  * checksum
>>  * translated_size
>>  * last_mod_time

Thinking about it a bit more, I think translated_size and
last_mod_time are a bit odd to have in NODE_DATA - although they are
part of both BASE_NODE and WORKING_NODE: they really do apply only to
BASE and the *current* working node: they are part of the optimization
to determine if a file has changed. Presumably, when a different layer
of WORKING becomes visible, we'll be recalculating both fields.


If that's the case, shouldn't we just hold onto them in their respective tables?


>>  * changed_rev
>>  * changed_date
>>  * changed_author
>>  * depth
>>  * properties
>>  * dav_cache
>
> dav_cache is also a BASE concept, and remains in BASE_NODE.

Agreed.

>>  * symlink_target
>>  * file_external
>
> I'm not sure that file_external belongs here. We certainly don't have
> it in WORKING_NODE.

I've been asking around on IRC to understand why that would apply to
file_external but not to symlink_target. The difference isn't clear to
me yet. Do you have anything which might help me?

>> This means, these columns stay in WORKING_NODE (next to its key, ofcourse):
>>
>>  * copyfrom_repos_id
>>  * copyfrom_repos_path
>>  * copyfrom_revnum
>>  * moved_here
>>  * moved_to
>>
>> These columns can stay in WORKING_NODE, because all children inherit
>> their values from the oproot. I.e. a subdirectory of a copied
>> directory inherits the copy/move info, unless it's been copied/moved
>> itself, in which case it has its own copy information.
>
> Right.
>
> Also note that we can opportunistically rename the above columns to
> their wc_db API names: original_*. They would be original_repos_id,
> original_repos_relpath, original_revision.

Done. (In my local patch-in-preparation.)


Bye,


Erik.


NODE_DATA (aka fourth tree)

2010-07-10 Thread Erik Huelsmann
As announced by gstein before, we've had some discussion on the
NODE_DATA structure which should allow storing multiple levels of tree
manipulation in our wc-db. This mail aims at describing my progress on
the subject so far. Please review and comment.


Introduction


What's the 4th tree about? The 4th tree is not one tree; rather, it's
the ability to store overlapping tree changes in our WORKING
tree. Take the following tree:

root
 +- A - C - file
 \- B - C - file

Then, imagine replacing A with B. All would be fine with our current
single level WORKING representation. However, if we replace 'file' in
the copied tree, a single level won't do anymore: if you revert the
replacement of file, you want to revert to what was there when the
tree was copied. The other option - which you don't want because it
would result in an inconsistent tree - would be that wc-ng would
revert to what was there even before the copy operation.

To be able to revert the 'file' replacement independently of the 'A'
replacement, you need two levels of WORKING nodes for 'file': one for
the direct replacement and one for the replacement that comes with
replacing 'A'. By the same logic, many levels may be required to
model complicated working copy changes.


What this change is not
--

This change does not alter the current behaviour of libsvn_wc, namely
that modifying already-modified trees is a destructive operation.
The multi-level model exists only to keep track of WORKING tree
changes, not to make changes to the ACTUAL tree visible again after
reverting a replaced subtree.



Proposed change
-

Greg made a proposal on the list some time ago which allows the
required multiplicity of WORKING nodes by creating a new table:
NODE_DATA. The table was proposed to hold a subset of the columns
currently in the BASE_NODE and WORKING_NODE tables.

The rationale about storing the BASE_NODE data in the table too is
that a query for a node which doesn't have a WORKING version will
simply return the BASE version. That way, there's no need to teach the
code about the absence of WORKING. Although the BASE_NODE information
is put in this table, this doesn't mean the BASE_NODE and WORKING_NODE
concepts are being redefined, other than allowing layered WORKING_NODE
(sub)trees.


Columns to be placed in NODE_DATA:

 * wc_id
 * local_relpath
 * oproot_distance
 * presence
 * kind
 * revnum
 * checksum
 * translated_size
 * last_mod_time
 * changed_rev
 * changed_date
 * changed_author
 * depth
 * properties
 * dav_cache
 * symlink_target
 * file_external

This means, these columns stay in WORKING_NODE (next to its key, ofcourse):

 * copyfrom_repos_id
 * copyfrom_repos_path
 * copyfrom_revnum
 * moved_here
 * moved_to

These columns can stay in WORKING_NODE, because all children inherit
their values from the oproot. I.e. a subdirectory of a copied
directory inherits the copy/move info, unless it's been copied/moved
itself, in which case it has its own copy information.


As described before, sorting the nodes relating to a certain path in
ascending order of their oproot distance, you'd always get the
'current' WORKING state applicable to the node, with the distance
between the node and the working copy root identifying the
BASE_NODE data.


Most -if not all- of the changes to the underlying table structure
should stay hidden behind the wc-db API.



Relevance to 1.7
--

Why do we need this change now? Why can't it wait until we finished
1.7, after all, it's just polishing the way we versioned directories
in wc-1, right?

Not exactly. Currently, mixed-revision working copies are modelled
using an oproot for each subtree with its own revision number. That
means that without this change, effectively we can't represent
mixed-revision working copy trees. So, in order to achieve feature
parity with 1.6, we need to realise this change before 1.7.



Well, that's basically it. Comments?


Bye,


Erik.


Re: Antwort: Re: ... Re: dangerous implementation of rep-sharing cache for fsfs

2010-06-30 Thread Erik Huelsmann
On Wed, Jun 30, 2010 at 10:13 PM, Daniel Shahaf  wrote:
> [ trim CC ]
>
> Mark Mielke wrote on Wed, 30 Jun 2010 at 21:37 -:
>> On 06/30/2010 05:57 AM, michael.fe...@evonik.com wrote:
>> > P.S. Thanks for the warning; we are not going to use 1.7.
>
> Did you check what is the probability of dying in a car accident?

Well, I quickly checked their website; they're in the pharma business:
the business of determining the chances of dying from a pill when you
consume it. That definitely explains the paranoia: they're storing lawsuit
evidence in Subversion before it's actually evidence. (Hence the
paranoia about the data staying *exactly* what they put in.)

>> >       At the Moment we are not using 1.6 either,
>> >       because of the SHA-1 rep-share cache.
>
> In 1.6, representation sharing can be DISABLED.


Bye,


Erik.


Re: misaligned blame output if repo has >1m revisions

2010-04-12 Thread Erik Huelsmann
Hi Philip,

On Mon, Apr 12, 2010 at 3:14 PM, Philipp Marek
 wrote:
> Hello Bert!
>
> On Montag, 12. April 2010, Bert Huijben wrote:
>> Well, on Windows consoles are all 80 characters wide. (You can fix this if
>> you are a frequent Command Prompt User, but most applications just assume
>>  80 characters on Windows. And in many cases Windows switches back to 80
>>  characters if it detects direct screen operations)
>>
>> The tab width is not globally configurable on Windows.
> Do you regularly look at the output in a console? Without piping into a file
> and looking at that with an editor?

I'm not Bert, but: Actually, I haven't ever looked at blame output any
other way.


Bye,


Erik.


Re: misaligned blame output if repo has >1m revisions

2010-04-12 Thread Erik Huelsmann
Hi Phil,

On Mon, Apr 12, 2010 at 7:54 AM, Philipp Marek
 wrote:
> Hello Johan,
> hello Stefan,
>
> On Freitag, 9. April 2010, Stefan Sperling wrote:
>> On Fri, Apr 09, 2010 at 10:17:12PM +0200, Johan Corveleyn wrote:
>> > So I guess this is coming up for you guys when s.a.o reaches the 1
>> > million mark :-).
>>
>> Nice buglet. I suppose we could simply add 2 or 3 spaces of indentation
>> until people run into even higher revisions in real life? :)
> what do you mean, "in real life"?
>
>        http://websvn.kde.org/trunk/
>        Directory revision:  1113881
>
> Seems that life is fast enough ;-)

With 3 additional spaces, that would fit, wouldn't it?
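The arithmetic behind the suggestion (the field widths here are hypothetical, not svn's actual blame format string): a six-character revision column overflows at one million revisions, and three extra spaces cover anything up to 999,999,999.

```python
# A fixed-width "%6d" revision column overflows once revisions reach
# seven digits, shifting every subsequent column over by one character.
rev_small, rev_big = 999999, 1113881

assert len("%6d" % rev_small) == 6
assert len("%6d" % rev_big) == 7   # misaligned output

# Widening the field by three ("%9d") keeps alignment to 9-digit revisions.
assert len("%9d" % rev_big) == 9
print("aligned")
```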


Bye,

Erik.


Re: svn client protocol (svn:// uri) specification + any client implementation , if any

2010-03-15 Thread Erik Huelsmann
Dear Karthik,

How about libsvn_ra_svn? (Which is an implementation of libsvn_ra.)

http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra_svn/

libsvn_ra is available here:

http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra/

And the headers with documentation - and for linking - can be found here:

http://svn.apache.org/repos/asf/subversion/trunk/subversion/include/

Regards,


Erik.


On Mon, Mar 15, 2010 at 7:30 AM, Karthik K  wrote:
> On 03/14/2010 03:11 PM, Philip Martin wrote:
>>
>> Karthik K  writes:
>>
>>
>>>
>>>     Wondering if there is a document explaining the svn:// uri
>>> connection protocol ( among other transports/protocols that svn
>>> supports, primarily interested in the read-only checkouts), and
>>> clients that implement the protocol.
>>>
>>
>> How about:
>>
>> http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra_svn/protocol
>>
>>
>
> Thanks Philip for the links to the protocol. Curious if there is any
> Apache-licensed svn client library (Java, say) for the protocol mentioned
> here ?  Thanks.
>