Re: svn commit: r1908812 - /subversion/branches/pristine-checksum-salt/BRANCH-README

2024-01-12 Thread Daniel Shahaf
[Replying to the most recent commit to BRANCH-README:]

kot...@apache.org wrote on Thu, 30 Mar 2023 19:04 +00:00:
> Author: kotkov
> Date: Thu Mar 30 19:04:13 2023
> New Revision: 1908812
>
> URL: http://svn.apache.org/viewvc?rev=1908812&view=rev
> Log:
> On the 'pristine-checksum-salt' branch: Update BRANCH-README.
>
> * BRANCH-README
>   (Dynamically salted SHA-1 checksums): Extend and update this section.
>
> Modified:
> subversion/branches/pristine-checksum-salt/BRANCH-README
>
> Modified: subversion/branches/pristine-checksum-salt/BRANCH-README
> URL: http://svn.apache.org/viewvc/subversion/branches/pristine-checksum-salt/BRANCH-README?rev=1908812&r1=1908811&r2=1908812&view=diff
> ==
> --- subversion/branches/pristine-checksum-salt/BRANCH-README (original)
> +++ subversion/branches/pristine-checksum-salt/BRANCH-README Thu Mar 30 19:04:13 2023
> @@ -13,5 +13,26 @@ as currently implemented, will use the n
>  Dynamically salted SHA-1 checksums
>  --
> 
> -The implementation on the branch uses a dynamically salted SHA-1 checksum kind.
> -The dynamic salt is generated during the creation of a wc.db.
> +The working copy currently relies on an assumption that files with identical
> +checksum values have identical content.  For SHA-1, there are publicly known
> +checksum collisions [https://shattered.io] and the situation may become worse
> +with the feasibility of chosen-prefix attacks [https://sha-mbles.github.io].
> +
> +To solve the potential problems and to improve the current state around checksum
> +collisions, the implementation on the branch starts using a dynamically salted
> +SHA-1 checksum kind.
> +
> +The 32-byte random salt is generated during the creation of a wc.db.  When the
> +file content is checksummed, the checksum value is calculated as if the salt was
> +prepended to the content.  In other words, checksum = SHA1(content) becomes
> +checksum = SHA1(salt + content).
> +
> +With the dynamic salt:
> +
> +- Publicly known SHA-1 collisions no longer result in collisions when checksummed
> +  by the working copy.  This is because the actually checksummed content now
> +  includes the random prefix salt.
> +
> +- Constructing a chosen-prefix SHA-1 collision no longer results in a collision
> +  when checksummed by the working copy.  This is because the constructed collision
> +  cannot account for the random prefix salt, because it's unknown in advance.
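
[A minimal sketch of the scheme the diff describes, checksum = SHA1(salt + content), may help here.  It is written in Python purely for illustration and is not the branch's implementation; the helper names new_salt, plain_sha1 and salted_sha1 are hypothetical, and only the 32-byte salt length and the prepending come from the README text.]

```python
import hashlib
import os

SALT_LEN = 32  # the README text speaks of a 32-byte random salt


def new_salt() -> bytes:
    """Generate a fresh random salt (hypothetical helper, one per wc.db)."""
    return os.urandom(SALT_LEN)


def plain_sha1(content: bytes) -> str:
    """The unsalted checksum: SHA1(content)."""
    return hashlib.sha1(content).hexdigest()


def salted_sha1(salt: bytes, content: bytes) -> str:
    """The salted checksum: SHA1(salt + content), fed in as a stream."""
    h = hashlib.sha1()
    h.update(salt)     # salt first, i.e. prepended
    h.update(content)  # then the file content
    return h.hexdigest()
```

[Two working copies with different salts would compute different checksums for the same content, so a collision prepared against plain SHA-1 or against one salt does not carry over to another.]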

For context, this is similar to 
https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3cadacbb6f-e0cb-4e5b-8603-0eda19f93...@app.fastmail.com%3E
but with the suffixing changed to prefixing.

(Changing suffixing to prefixing makes sense, since SHAttered collisions
have the property that appending an identical suffix to two colliding
files generates two other colliding files.)
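
[The suffix property can be illustrated with a toy Merkle-Damgard hash.  This is a sketch under stated assumptions, not SHA-1 itself: the 16-bit state, the 8-byte block size, and the names compress, toy_hash and find_state_collision are all invented for the illustration.  What it shares with SHA-1 is the iterated structure: once two equal-length messages reach the same internal state, any identical suffix keeps them colliding.]

```python
import hashlib
import itertools

BLOCK = 8        # toy block size in bytes (assumption for the sketch)
STATE_BYTES = 2  # tiny 16-bit state so a collision is found quickly
IV = b"\x00" * STATE_BYTES


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-1 of state || block."""
    return hashlib.sha1(state + block).digest()[:STATE_BYTES]


def toy_hash(msg: bytes) -> bytes:
    """Iterated hash: pad, append the length, fold blocks through compress()."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(msg).to_bytes(BLOCK, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_state_collision() -> tuple[bytes, bytes]:
    """Birthday search for two distinct blocks that reach the same state."""
    seen = {}
    for n in itertools.count():
        block = n.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block


a, b = find_state_collision()
suffix = b"identical suffix"
# Same internal state after the first block, same bytes afterwards:
# the two extended messages collide as well.
print(toy_hash(a) == toy_hash(b), toy_hash(a + suffix) == toy_hash(b + suffix))
```

[Prepending is different: a prefix changes the state into which the colliding blocks are absorbed, so a collision prepared against the standard initial state is generally lost.]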

Cheers,

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-12 Thread Daniel Shahaf
Karl Fogel wrote on Wed, 03 Jan 2024 22:13 +00:00:
> On 01 Apr 2023, Evgeny Kotkov via dev wrote:
> > Daniel Shahaf  writes:
> > 
> > > What's the question or action item to/for me?  Thanks.
> > 
> > I'm afraid I don't fully understand your question.  As you
> > probably remember, the change is blocked by your veto.  To my
> > knowledge, this veto hasn't been revoked as of now, and I simply
> > mentioned that in my email.  It is entirely your decision
> > whether or not to take any action regarding this matter.
> 
> So AIUI, Evgeny is asking you to withdraw your veto, Daniel. Evgeny would
> like to merge this into trunk -- on the grounds, I believe, that it is
> strictly an improvement over what we have now, and it opens the door to
> further future improvements (each of which would go through the usual
> discussion & consensus process, of course).

So, I looked.

This thread comprises 237 posts spanning 30 months (July 2021 through
today).  On 2023-01-20 I cast a veto.  There was some activity
afterwards, but until the parent post of this one, the thread has been
silent for the better part of a year; and now I'm being asked to
withdraw my veto.

Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
I had the context in our heads, and the cache misses took their toll in
tuits and in wallclock time.  Furthermore, I have less spare time for
dev@ discussions than I did when I cast the veto (= a year ago next
Saturday).  Going forward it might be preferable for threads not to
hibernate.

You didn't link the veto, so I had to go grep for it.  It is,
presumably, this one:

>>>> # Archived-At: 
>>>> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3C904aded6-5ef0-4123-ade0-e23a3bb56726%40app.fastmail.com%3E
>>>> Date: Fri, 20 Jan 2023 12:15:24 +
>>>> From: Daniel Shahaf
>>>> To: dev@subversion.apache.org
>>>> Subject: Re: Switching from SHA1 to a checksum type without known 
>>>> collisions in 1.15 working copy format
>>>> Message-Id: <904aded6-5ef0-4123-ade0-e23a3bb56...@app.fastmail.com>
>>>> 
>>>> Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
>>>> > I can complete the work on this branch and bring it to a production-ready
>>>> > state, assuming there are no objections.
>>>> 
>>>> Your assumption is counterfactual:
>>>> 
>>>> https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>>>> 
>>>> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
>>>> 
>>>> Objections have been raised, been left unanswered, and now
>>>> implementation work has commenced following the original design.  That's
>>>> not acceptable.  I'm vetoing the change until a non-rubber-stamp design
>>>> discussion has been completed on the public dev@ list.

So, this veto being in front of me, let me reply to the request that
I withdraw it:

> So AIUI, Evgeny is asking you to withdraw your veto, Daniel. Evgeny would
> like to merge this into trunk -- on the grounds, I believe, that it is
> strictly an improvement over what we have now, and it opens the door to
> further future improvements (each of which would go through the usual
> discussion & consensus process, of course).
> 
> Evgeny's work is on this branch...
> 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt
> 
> ...which in turn branched from
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.
> 
> I used this command to get an overview of the work:
> 
> $ svn cat 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README

As far as I can tell, the request for veto withdrawal is grounded only
in the fact that the veto, whilst in force, prevents the feature branch
from being merged/released.  The request does not allege the veto was
invalid or unfounded in the first place; nor that the veto has /become/
invalid or unfounded due to time having passed; nor that modifications
or alterations to the code [or, in this case, to the decision-making
process] have been made and are believed to have addressed the veto's
grounds.

In summary, the request only deals with the fact of a veto and its
formal/procedural implications, but does not deal with the substantive
justification for the veto at all.

That being the case, I have no reason to believe the original grounds of
the veto have been addressed.

That being the case, I have considered whether merging the feature
branch o

Re: mbox archives

2024-01-12 Thread Daniel Shahaf
Daniel Sahlberg wrote on Mon, 08 Jan 2024 12:21 +00:00:
> On Mon, 8 Jan 2024 at 10:08, Daniel Shahaf wrote:
>
>> How is an interested community member supposed to get this list's archives
>> in mbox format?
>>
>> Those on svn.haxx.se can be obtained from there, but what about the
>> others?  gmane is down, lists.a.o has a download feature that seems to
>> require either downloading one month at a time manually or using browser
>> debug tools, and the other online archives we link to don't have obvious
>> links to download mbox (or Maildir or whatever else) archives.
>>
>> Reconstructing the old mod_mbox links, such as <
>> https://mail-archives.apache.org/mod_mbox/subversion-dev/202312.mbox>,
>> actually works, but how's a user supposed to know to do that?
>>
>> Cheers,
>>
>> Daniel
>> (who's in the data gathering phase for /^k.{5}/g's recent query ;-))
>>
>
> Hi,
>
> Does this work?
> 
> $ curl "https://lists.apache.org/api/mbox.lua?list=dev&domain=subversion.apache.org&d=2022-12" \
>      -o dev_subversion_apache_org_2022-12.mbox
> 

It does, yes. Thanks.
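
[For fetching more than one month, the same request can be looped.  The following is a sketch only: the endpoint is the one from the curl command above, but the query parameter names (list, domain, d) are my assumption about the API, and mbox_url, months and download are hypothetical helper names.]

```python
import urllib.parse
import urllib.request

API = "https://lists.apache.org/api/mbox.lua"  # endpoint from the curl command


def mbox_url(list_name: str, domain: str, month: str) -> str:
    """Build the download URL; the parameter names are an assumption."""
    query = urllib.parse.urlencode(
        {"list": list_name, "domain": domain, "d": month})
    return f"{API}?{query}"


def months(first: str, last: str):
    """Yield 'YYYY-MM' strings from first to last, inclusive."""
    year, month = map(int, first.split("-"))
    end = tuple(map(int, last.split("-")))
    while (year, month) <= end:
        yield f"{year:04d}-{month:02d}"
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def download(list_name: str, domain: str, first: str, last: str) -> None:
    """Save one mbox file per month, named like the curl -o target."""
    for m in months(first, last):
        target = f"{list_name}_{domain.replace('.', '_')}_{m}.mbox"
        urllib.request.urlretrieve(mbox_url(list_name, domain, m), target)


# Example (performs network requests, so it is left commented out):
# download("dev", "subversion.apache.org", "2022-12", "2023-02")
```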

I think it should be clear to members of our community how to access our
Collective Memory (= the list archives) in the preferred form for
reading, to borrow ALv2's definition of "Source".  I would expect that
to be self-evident from ponymail's UI, and failing that, documented on
our /mailing-lists.html page.

> You might get better replies at us...@infra.apache.org or
> us...@ponymail.apache.org.

Thanks for the suggestion, Daniel.

Daniel


mbox archives

2024-01-08 Thread Daniel Shahaf
How is an interested community member supposed to get this list's archives in 
mbox format?

Those on svn.haxx.se can be obtained from there, but what about the others?  
gmane is down, lists.a.o has a download feature that seems to require either 
downloading one month at a time manually or using browser debug tools, and the 
other online archives we link to don't have obvious links to download mbox (or 
Maildir or whatever else) archives.

Reconstructing the old mod_mbox links, such as 
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202312.mbox>, 
actually works, but how's a user supposed to know to do that?

Cheers,

Daniel
(who's in the data gathering phase for /^k.{5}/g's recent query ;-))


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-04 Thread Daniel Shahaf
Karl Fogel wrote on Wed, 03 Jan 2024 22:13 +00:00:
> On 01 Apr 2023, Evgeny Kotkov via dev wrote:
>>Daniel Shahaf  writes:
>>
>>> What's the question or action item to/for me?  Thanks.
>>
>>I'm afraid I don't fully understand your question.  As you
>>probably remember, the change is blocked by your veto.  To my
>>knowledge, this veto hasn't been revoked as of now, and I simply
>>mentioned that in my email.  It is entirely your decision
>>whether or not to take any action regarding this matter.
>
> So AIUI, Evgeny is asking you to withdraw your veto, Daniel. 
> Evgeny would like to merge this into trunk -- on the grounds, I 
> believe, that it is strictly an improvement over what we have now, 
> and it opens the door to further future improvements (each of 
> which would go through the usual discussion & consensus process, 
> of course).
>

Acknowledging receipt.  I'll reply substantively when I have the time to swap 
in the context.

Daniel

> Evgeny's work is on this branch...
>
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt
>
> ...which in turn branched from 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.
>
> I used this command to get an overview of the work:
>
> $ svn cat 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README
>
> (The work is several months old now, but for the sake of 
> discussion let's assume it's mergeable, passes all tests, etc. 
> Obviously, Evgeny's only going to merge it when all of those 
> conditions are true -- maybe some minor tweaks will be needed to 
> get it there, I don't know.)
>
> Best regards,
> -Karl


Re: Backport bot not running?

2023-12-18 Thread Daniel Shahaf
Daniel Sahlberg wrote on Mon, 18 Dec 2023 10:44 +00:00:
> On Mon, 18 Dec 2023 at 09:40, Daniel Shahaf wrote:
>> To prevent recurrence, options include (1) make the cron job use the .py
>> implementation; (2) add a regression test to backport_tests.py [sic] and
>> then fix backport.pl's parsing.
>>
>> Glad to see backport.py being used :-)
>>
>
> I'm considering switching to backport.py after the release of 1.14.5
> (should be discussed on dev@ first of course) for the reasons mentioned in
> the readme ("written in a language that many more active developers are
> comfortable with"). If we could add the missing functions (Reviewing STATUS
> and Adding new entries) or decide that those are not needed anymore we
> could then remove backport.pl and have one way of doing stuff.

And even if we don't implement F3 and F4 in backport.py, deploying the
already-existing backport.py implementation of F1 in production would
still be a step in the right direction.

It's not clear to me whether F2 runs in production currently or not.

Cheers,

Daniel

[Terms from backport.py:
> F1. Auto-merge bot; the nightly svn-role commits.
> F2. Conflicts detector bot; the svn-backport-conflicts-1.9.x buildbot task.
> F3. Reviewing STATUS nominations and casting votes.
> F4. Adding new entries to STATUS.
]


Re: Backport bot not running?

2023-12-18 Thread Daniel Shahaf
Daniel Sahlberg wrote on Thu, 30 Nov 2023 07:00 +00:00:
> On Wed, 29 Nov 2023 at 17:25, Nathan Hartman wrote:
>
>> On Wed, Nov 29, 2023 at 8:40 AM Daniel Sahlberg
>>  wrote:
>> >
>> > On Wed, 29 Nov 2023 at 06:55, Daniel Sahlberg <daniel.l.sahlb...@gmail.com> wrote:
>> >>
>> >>
>> >> On Wed, 29 Nov 2023 at 05:57, Nathan Hartman <hartman.nat...@gmail.com> wrote:
>> >>>
>> >>> The backport bot (svn-role) normally runs nightly but the most recent
>> >>> backport approval has been waiting in 1.14.x/STATUS for a couple of
>> >>> days now.
>> >>>
>> >>> I went ahead and merged it manually (with
>> >>> tools/dist/merge-approved-backports.py). This did the right thing, so
>> >>> I assume there wasn't any syntax error in STATUS.
>> >>>

That was r1914201, I take it.

>> >>> I don't have access to svn-qavm1 so I can't check why it didn't happen
>> >>> automatically. Maybe someone with access could check if the machine is
>> >>> at least running...
>> >>>
>> >>> Thanks,
>> >>> Nathan
>> >>
>> >>
>> >> I’ll check later today
>> >>
>> >
>> > Now I've spent some time looking.
>> >
>> > The backport job is a cron job running at 04.00 UTC, so it isn't really a
>> > bot that is running in the background. As far as I could see it was started
>> > successfully every day for the last week, but there were no real logs
>> > around what happened. It SHOULD have succeeded as far as I can tell.
>> >
>> > One difference is that the backport "bot" is using backport.pl instead
>> > of the Python backport implementation. I don't know if there was a subtle
>> > difference in STATUS that caused backport.pl to barf while backport.py
>> > succeeded.
>> >
>> > Let's keep our eyes open for the next backport.
>> >
>> > Kind regards,
>> > Daniel
>>
>>
>> Thanks for checking!
>>
>> Based on when upcoming.part.html was last updated, I assume
>> site/tools/upcoming.py is run by another cron job at 04.15 UTC; it
>> looks like I manually merged the backport a little bit after it ran
>> last night, so I'll watch to see if it shows up in upcoming.part.html
>> tonight...
>>
>> Thanks again,
>> Nathan
>>
>
> Upcoming worked well tonight so I guess there might have been something in the
> STATUS file that prevented automated backport. If it fails again tonight
> (with the new nominations), I'd like to check running backport.pl manually.

It was probably the «*» at the start of line 2.

To prevent recurrence, options include (1) make the cron job use the .py
implementation; (2) add a regression test to backport_tests.py [sic] and
then fix backport.pl's parsing.

Glad to see backport.py being used :-)

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-03-31 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Wed, 22 Mar 2023 15:23 +00:00:
> This change is still being blocked by a veto, but if danielsh changes his
> mind and if there won't be other objections, I'm ready to complete the few
> remaining bits and merge it to trunk.

What's the question or action item to/for me?  Thanks.

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-02-06 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Sun, Jan 29, 2023 at 16:37:20 +0300:
> Daniel Shahaf  writes:
> 
> > > (I'm not saying that the above rules have to be used in this particular case
> > >  and that a veto is invalid, but still thought it’s worth mentioning.)
> > >
> >
> > I vetoed the change because it hadn't been designed on the dev@ list,
> > had not garnered dev@'s consensus, and was being railroaded through.
> > (as far as I could tell)
> 
> I have *absolutely* no idea where "being railroaded through" comes from.
> Really, it's a wrong way of portraying and thinking about the events that have
> happened so far.
> 
> Reiterating over those events: I wrote an email containing my thoughts
> and explaining the motivation for such change.  I didn't reply to some of
> the questions (including some tricky questions, such as the one featuring
> a theoretical hash function), because they have been at least partly
> answered by others in the thread, and I didn't have anything valuable
> to add at that time.
> 
> During that time, I was actively coding the core part of the change,
> to check if it's possible technically.  Which is important, as far as
> I believe, because not all theoretically possible solutions can be implemented
> without facing significant practical or implementation-related issues, and
> it seems to me that you significantly undervalue such an approach.
> 

Quoting myself from elsethread: [3]

- If the branch is seen and presented as a PoC for furthering discussion
  and for discovering practical considerations (e.g., that
  PRISTINE.MD5_CHECKSUM docstring I found yesterday during discussion,
  or the ra_serf sha1 optimization that anyone implementing the branch
  would run into), it's likely a good thing.

> I do not say my actions were exemplary, but as far as I can tell, they're
> pretty much in line with how svn-dev has been operating so far.  But, it all
> resulted in an unclear veto without any _technical_ arguments, where what's
> being vetoed is unclear as well, because the change was not ready at the
> moment veto got casted.
> 

Look, it's pretty simple.  You said "We should do Y because it
addresses X".  You didn't explain why X needs to be addressed, didn't
consider what alternatives there are to Y, didn't consider any cons that
Y may have… and when people had questions, you just began to
implement Y, without responding to or even acknowledging those
questions.

That's not how design discussions work.  A design discussion doesn't go
"state decision; state pros; implement"; it goes "state problem; discuss
potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

That's why I called veto: not because I considered any particular
proposal then on the table unreasonable, but because I considered /the
decision process being used/ unreasonable (cf. [7]).

> And because your veto goes in favor of a specific process

Yes, I'm arguing in favour of first defining a problem, then considering
solutions to it, both their pros and cons, and only then deciding what
to implement.  This process isn't unique, novel, or singular; it's
standard in multiple disciplines [4–7].

>   (considering that
> no other arguments were given), the only thing that's *actually* being
> railroaded is an odd form of an RTC (review-then-commit) process that is
> against our usual CTR (commit-then-review) [1,2].  That's railroading,
> because it hasn't been explicitly discussed anywhere and a consensus
> on it has not been reached.

This thread was started on 2022-12-20 [1], with the idiomatic
"Thoughts?" sign-off.  The first relevant code was committed on
2023-01-19 [2].

That is: the change followed RTC to begin with.  Considering that both
[1] and [2] were authored by you personally, I find it difficult to
charitably interpret your claim that "an odd form of [RTC]" was being
"railroaded", as RTC rather than "our usual CTR [process]" was being
followed at your own decision.

It's perhaps worth pointing out the veto followed the branch creation
because that was the point when I gave up on waiting for someone to
respond to the objections that had been made by then.  It wasn't a veto
on using a branch, as I have clarified: [3]

I didn't object to the use of a branch /per se/.  I objected to the
treating of objections that *had already been posted* as though they had
never been posted.  *That's* not acceptable.

So, no, I wasn't advocating /either/ RTC or CTR; I was advocating that
the "R" step happen at all.  A branch may take place before, during, or
after discussion — see [3] for more — but the important thing is that
discussion happen.  The OP doesn't have t

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-02-06 Thread Daniel Shahaf
Karl Fogel wrote on Mon, Jan 30, 2023 at 17:26:03 -0600:
> On 29 Jan 2023, Evgeny Kotkov via dev wrote:
> > I have *absolutely* no idea where "being railroaded through" comes
> > from.  Really, it's a wrong way of portraying and thinking about the
> > events that have happened so far.
> > 
> > Reiterating over those events: I wrote an email containing my
> > thoughts and explaining the motivation for such change.  I didn't
> > reply to some of the questions (including some tricky questions,
> > such as the one featuring a theoretical hash function), because they
> > have been at least partly answered by others in the thread, and I
> > didn't have anything valuable to add at that time.
> > 
> > During that time, I was actively coding the core part of the change,
> > to check if it's possible technically.  Which is important, as far
> > as I believe, because not all theoretically possible solutions can
> > be implemented without facing significant practical or
> > implementation-related issues, and it seems to me that you
> > significantly undervalue such an approach.
> > 
> > I do not say my actions were exemplary, but as far as I can tell,
> > they're pretty much in line with how svn-dev has been operating so
> > far. But, it all resulted in an unclear veto without any _technical_
> > arguments, where what's being vetoed is unclear as well, because the
> > change was not ready at the moment veto got casted.
> > 
> > And because your veto goes in favor of a specific process
> > (considering that no other arguments were given), the only thing
> > that's *actually* being railroaded is an odd form of an RTC
> > (review-then-commit) process that is against our usual CTR
> > (commit-then-review) [1,2].  That's railroading, because it hasn't
> > been explicitly discussed anywhere and a consensus on it has not
> > been reached.
> 
> Daniel, given what's in Evgeny's branch now, could you summarize your
> current technical objections if any?
> 
> If they are something like "This code is solving the wrong problem(s)" or
> "I'm not sure what problem(s) it's supposed to solve", those count as
> technical objections.  It's just that it would be useful to have the
> objection(s) gathered in one place. This thread has been long and somewhat
> digressive -- I'm not saying that's due to you -- and I at least have found
> it a bit difficult to keep track of the concrete objections versus various
> interesting but ultimately theoretical points.
> 

Quoting my other reply just now:

[…] it's pretty simple.  [The OP] said "We should do Y because it
addresses X".  [The OP] didn't explain why X needs to be addressed, didn't
consider what alternatives there are to Y, didn't consider any cons that
Y may have… and when people had questions, [the OP] just began to
implement Y, without responding to or even acknowledging those
questions.

That's not how design discussions work.  A design discussion doesn't go
"state decision; state pros; implement"; it goes "state problem; discuss
potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

That's why I called veto: not because I considered any particular
proposal then on the table unreasonable, but because I considered /the
decision process being used/ unreasonable (cf. [7]).

Concretely: Why would migrating away from SHA-1 be a good thing in the
first place?  Assuming that it /would/ be a good thing, what alternative
ways are there to achieve whatever the goodness may be (new feature /
bugfix / resilience to some attack vector / etc.)?  What are the
potential *downsides* of migrating away from SHA-1?

The same, restated at a higher level of abstraction: "Migrate
away from SHA-1" is a means, not an end.  Define the ends and have
a non-predetermined-outcome discussion on how to achieve them.

"Reduce the security impact to our users of second-preimage attacks
against SHA-1" would be an end.  I don't know whether it's the only one
or whether there are additional ones.

[As to the branch, I'm not sure whether to restate my position on it or
not — so I'll restate it, erring on the side of including too much
rather than too little, but feel free to ignore the following paragraph
at will:]

Was the branch commenced as a PoC / smoke test, to explore one proposed
direction and to be discarded if the consensus compass should end up
pointing towards another cardinal direction?  Or was it commenced on the
assumption that consensus on migrating from SHA-1 to SHA-256 went without
saying, had already formed, or would necessarily have formed by 1.15.0-rc1?

> The reason I'm supportive of Evgeny's direction is that his changes, if
> completed, would offer a solution to the (admittedly still somewhat distant)
> security concern I raised early on. Essentially, I'm worried that
> second-preimage attacks on SHA-1 are coming eventually (maybe I'm wrong
> about this -- they are after all significantly harder than mere collision
> attacks).  

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-02-06 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Sun, Jan 29, 2023 at 16:36:12 +0300:
> Daniel Shahaf  writes:
> 
> > > That could happen after a public disclosure of a pair of executable
> > > files/scripts where the forged version allows for remote code execution.
> > > Or maybe something similar with a file format that is often stored in
> > > repositories and that can be executed or used by a build script, etc.
> > >
> >
> > Err, hang on.  Your reference described a chosen-prefix attack, while
> > this scenario concerns a single public collision.  These are two
> > different things.
> 
> A chosen-prefix attack allows finding more meaningful collisions such as
> working executables/scripts.  When such collisions are made public, they
> would have a greater exploitation potential than just a random collision.
> 

Right.  So we're assuming Mallory generates a chosen-prefix collision,
and then somehow pulls off steps #1 and #2-as-amended [both quoted
below], with Alice noticing none of that.

That still sounds like something we should assume Mallory can pull off.

> > Disclosure of a pair of executable files/scripts isn't by itself
> > a problem unless one of the pair ("file A") is in a repository
> > somewhere.  Now, was the colliding file ("file B") generated _before_ or
> > _after_ file A was committed?
> >
> > - If _before_, then it would seem Mallory had somehow managed to:
> >
> >   1. get a file of his choosing committed to Alice's repository; and
> >
> >   2. get a wc of Alice's repository into one of the codepaths that
> >      assume SHA-1 is one-to-one / collision-free (currently that's the
> >      ra_serf optimization and the 1.15 wc status).
> 
> Not only.  There are cases when the working copy itself installs the working
> file with a hash lookup in the pristine store.  This is more true for 1.14
> than trunk, because in trunk we have the streamy checkout/update that avoids
> such lookups by writing straight to the working file.  However, some of
> the code paths still install the contents from the pristine store by hash.
> Examples include reverting a file, copying an unmodified file, switching
> a file with keywords, the mentioned ra_serf optimization, etc.
> 

Thanks.  In terms of that step #2, all these are also candidates for
"one of the codepaths", then.

> >   Now, step #1 seems plausible enough.  As to step #2, it's not clear to
> >   me how file B would reach the wc in step #2…
> 
> If Mallory has write access, she could commit both files, thus arranging for
> a possible content change if both files are checked out to a single working
> copy.  This isn't the same as just directly modifying the target file, because
> file content isn't expected to change due to changes in other files (that can
> be of any type), so this attack has much better chances of being unnoticed.
> 

Well, yes, but the write access requirement lowers severity.

> If Mallory doesn't have write access, there should be other vectors, such
> as distributing a pair of files (harmless in the context of their respective
> file formats) separately via two upstream channels.  Then, if both of the
> upstream distributions are committed into a repository and their files are
> checked out together, the content will change, allowing for a malicious
> action.

I take it we're still under the assumption that someone's repository has
rep-sharing disabled (or unsupported, i.e., pre-1.6 format) despite the
recommendation in security/sha1-advisory.txt, since otherwise the commit
would be rejected.

So, back to my question which you have snipped:

> >   So, I agree it's a scenario we should address.  What options do we
> >   have to address it?  (I grant that migrating away from SHA-1 is one
> >   option.)

Care to address that?

Daniel

> 
> Regards,
> Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-31 Thread Daniel Shahaf
Karl Fogel wrote on Mon, 30 Jan 2023 23:26 +00:00:
> Daniel, given what's in Evgeny's branch now, could you summarize 
> your current technical objections if any?

Certainly, but I won't have time to do so today.


Glossary of attacks (was: Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format)

2023-01-26 Thread Daniel Shahaf
Definitions of attacks:

1. Collision attack:
   Given h(),
   find x₁, x₂ such that h(x₁) == h(x₂).

2. Second preimage attack:
   Given h() and x,
   find x′ such that h(x) == h(x′).

3. First preimage attack:
   Given h() and y,
   find x such that h(x) == y.

4. Chosen prefix attack:
   Given h(), p₁, and p₂,
   find m₁, m₂ such that h(m₁) == h(m₂)
   and m₁.startswith(p₁) and m₂.startswith(p₂).
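
[The four definitions can be written as predicates and exercised against a deliberately weak hash.  This is a sketch for illustration: h() here is SHA-1 truncated to 16 bits (my assumption, chosen so that brute force finishes in moments), the function names are hypothetical, and the distinctness checks (x₁ != x₂ and so on) are implied by the definitions rather than spelled out in them.]

```python
import hashlib
import itertools


def h(x: bytes, bits: int = 16) -> int:
    """Toy hash: the top `bits` bits of SHA-1, weak on purpose."""
    return int.from_bytes(hashlib.sha1(x).digest(), "big") >> (160 - bits)


def is_collision(x1, x2):                        # definition 1
    return x1 != x2 and h(x1) == h(x2)


def is_second_preimage(x, x_prime):              # definition 2
    return x_prime != x and h(x_prime) == h(x)


def is_first_preimage(y, x):                     # definition 3
    return h(x) == y


def is_chosen_prefix_collision(p1, p2, m1, m2):  # definition 4
    return (m1 != m2 and h(m1) == h(m2)
            and m1.startswith(p1) and m2.startswith(p2))


def find_collision():
    """Attacker picks both inputs: birthday search, ~2**(bits/2) trials."""
    seen = {}
    for n in itertools.count():
        x = str(n).encode()
        if h(x) in seen:
            return seen[h(x)], x
        seen[h(x)] = x


def find_second_preimage(x):
    """One input is fixed in advance: brute force, ~2**bits trials."""
    for n in itertools.count():
        candidate = b"candidate-" + str(n).encode()
        if is_second_preimage(x, candidate):
            return candidate


def find_chosen_prefix_collision(p1, p2):
    """Birthday search over suffixes appended to two given prefixes."""
    seen = {}
    for n in itertools.count():
        suffix = str(n).encode()
        m1, m2 = p1 + suffix, p2 + suffix
        seen.setdefault(h(m1), m1)
        if h(m2) in seen and seen[h(m2)] != m2:
            return seen[h(m2)], m2
```

[The gap in trial counts between find_collision and find_second_preimage is the reason the two attacks are kept apart in the discussion: for a 160-bit hash the generic costs are about 2^80 versus 2^160.]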

Daniel Shahaf wrote on Thu, Jan 26, 2023 at 09:33:59 +:
> Evgeny Kotkov via dev wrote on Mon, Jan 23, 2023 at 02:28:50 +0300:
> > However, with the feasibility of chosen-prefix attacks on SHA-1 [2], it's
> > probably only a matter of time until the situation becomes worse.
> > 
> 
> Quoting the third hunk of 
> <https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3C20221220201300.GH32332%40tarpaulin.shahaf.local2%3E>:
> 
> What's the acceptance test we use for candidate checksum algorithms?
> 
> You say we should switch to a checksum algorithm that doesn't have known
> collisions, but, why should we require that?  Consider the following
> 160-bit checksum algorithm:
> .
> 1. If the input consists of 40 ASCII lowercase hex digits and
>    nothing else, return the input.
> 2. Else, return the SHA-1 of the input.
> 
> This algorithm has a trivial first preimage attack.  If a wc used this
> identity-then-sha1 algorithm instead of SHA-1, then… what?
> 
> > That could happen after a public disclosure of a pair of executable
> > files/scripts where the forged version allows for remote code execution.
> > Or maybe something similar with a file format that is often stored in
> > repositories and that can be executed or used by a build script, etc.
> > 
> 
> Err, hang on.  Your reference described a chosen-prefix attack, while
> this scenario concerns a single public collision.  These are two
> different things.
> 
> Disclosure of a pair of executable files/scripts isn't by itself
> a problem unless one of the pair ("file A") is in a repository
> somewhere.  Now, was the colliding file ("file B") generated _before_ or
> _after_ file A was committed?
> 
> - If _before_, then it would seem Mallory had somehow managed to:
> 
>   1. get a file of his choosing committed to Alice's repository; and
> 
>   2. get a wc of Alice's repository into one of the codepaths that
>      assume SHA-1 is one-to-one / collision-free (currently that's the
>      ra_serf optimization and the 1.15 wc status).
> 
>   Now, step #1 seems plausible enough.  As to step #2, it's not clear to
>   me how file B would reach the wc in step #2… but insofar as security
>   assumptions go, it seems reasonable to assume Mallory can make this
>   happen.
> 
>   So, I agree it's a scenario we should address.  What options do we
>   have to address it?  (I grant that migrating away from SHA-1 is one
>   option.)
> 
> - If _after_, then you're presuming not simply a collision attack but
>   a second preimage attack.  Should we assume Mallory to be able to
>   mount a second preimage attack?
> 
> Chosen-prefix collision attacks can help Mallory in a variant of the
> "before" case: Mallory computes a collision, sends file A to Alice (who
> commits it), and invokes his assumed ability to inject file B into
> Alice's wc.  This would work for file formats that ignore the unchosen
> suffix.


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-26 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Mon, Jan 23, 2023 at 02:28:50 +0300:
> Daniel Shahaf  writes:
> 
> > > I can complete the work on this branch and bring it to a production-ready
> > > state, assuming there are no objections.
> >
> > Your assumption is counterfactual:
> >
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
> >
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
> 
> I don't see any explicit objections in these two emails (here I assume that
> if something is not clear to a PMC member, it doesn't automatically become
> an objection).  If the "why?" question is indeed an objection, then I would
> say it has already been discussed and responded to in the thread.
> 

The "Why?" was sent _after_ the post you're quoting, and in any case was
just an elevator pitch summary of something I had explained more verbosely.

The first post in this thread asserts X is a problem and Y is a solution
to it, and argues that Y is a good thing.  However, that post does not
explain /why/ X is a problem, does not consider alternatives to Y, and
does not consider possible cons of Y.  That's what's missing.

> Now, returning to the problem:
> 
> As described in the advisory [1], we have a supported configuration that
> makes data forgery possible:
> 
> - A repository with disabled rep-sharing allows storing different files with
>   colliding SHA-1 values.
> - Having a repository with disabled rep-sharing is a supported configuration.
>   There may be a certain number of such repositories in the wild
>   (for example, created with SVN < 1.6 and not upgraded afterwards).
> - A working copy uses an assumption that the pristine contents are equal if
>   their SHA-1 hashes are equal.
> - So committing different files with colliding SHA-1 values makes it possible
>   to forge the contents of a file that will be checked-out and used by the
>   client.
> 
> I would say that this state is worrying just by itself.
> 

I assume this situation could happen accidentally, say, if someone adds
shattered-1.pdf and shattered-2.pdf to the same wc in a particular way.
That is, I'm not assuming "forgery" (which implies Mallory is involved).

Still, this is a potential data integrity issue with the new-in-1.15 wc
format, so we should address it before the release.  What are our
options to address that?  Switching to another checksum is an option,
yes, but we [as in, dev@] don't seem to have considered any alternatives
to that.

Just off the top of my head, we could:

- Encourage or require use of rep-sharing
  [the advisory already recommends this]

- Encourage or require use of
  tools/hook-scripts/reject-detected-sha1-collisions.sh
  [the advisory already recommends this]

- Have f32 wc's refuse to talk to servers that don't detect SHA-1
  collisions.  (1.15 users will still be able to interoperate with old
  servers by using f31.)

And there may be more options.  (Lurkers are invited to speak up!)

> However, with the feasibility of chosen-prefix attacks on SHA-1 [2], it's
> probably only a matter of time until the situation becomes worse.
> 

Quoting the third hunk of 
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3C20221220201300.GH32332%40tarpaulin.shahaf.local2%3E>:

What's the acceptance test we use for candidate checksum algorithms?

You say we should switch to a checksum algorithm that doesn't have known
collisions, but, why should we require that?  Consider the following
160-bit checksum algorithm:
.
1. If the input consists of 40 ASCII lowercase hex digits and
   nothing else, return the input.
2. Else, return the SHA-1 of the input.

This algorithm has a trivial first preimage attack.  If a wc used this
identity-then-sha1 algorithm instead of SHA-1, then… what?
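
For concreteness, the identity-then-sha1 algorithm described above can be
sketched as follows.  This is only an illustration of the thought
experiment; the names identity_then_sha1() and first_preimage() are made
up for this sketch.

```python
import hashlib
import re

# Exactly 40 lowercase ASCII hex digits and nothing else.
_HEX40 = re.compile(rb"[0-9a-f]{40}")

def identity_then_sha1(data: bytes) -> str:
    """Return the input itself if it is 40 lowercase hex digits,
    otherwise the SHA-1 of the input, as a 40-character hex string."""
    if _HEX40.fullmatch(data):
        return data.decode("ascii")
    return hashlib.sha1(data).hexdigest()

def first_preimage(checksum: str) -> bytes:
    """Trivial first preimage: the checksum's own hex text maps to itself."""
    return checksum.encode("ascii")
```

Note that the same trick also yields collisions: any content and the hex
text of its SHA-1 share a checksum under this function.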

> That could happen after a public disclosure of a pair of executable
> files/scripts where the forged version allows for remote code execution.
> Or maybe something similar with a file format that is often stored in
> repositories and that can be executed or used by a build script, etc.
> 

Err, hang on.  Your reference described a chosen-prefix attack, while
this scenario concerns a single public collision.  These are two
different things.

Disclosure of a pair of executable files/scripts isn't by itself
a problem unless one of the pair ("file A") is in a repository
somewhere.  Now, was the colliding file ("file B") generated _before_ or
_after_ file A was committed?

- If _before_, then it would seem Mallory had somehow managed to:

  1. get a file of his choosing committed to Alice's repository; a

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-22 Thread Daniel Shahaf
[ tl;dr: See last paragraph for a concrete question about ra_serf. ]

Karl Fogel wrote on Fri, 20 Jan 2023 17:18 +00:00:
> Yes.  A hash is considered "broken" the moment security researchers 
> can generate a collision.

Consider the following uses of hash functions in our code:

- FSFS rep-cache uses SHA-1.

- The ra_serf download optimization uses SHA-1.

- The commit editor uses MD5 in apply_textdelta() and close_file().

The first one is fine, because FSFS rejects collisions in new commits
(as pointed out upthread).

The second one is not necessarily fine: a variation of the attack you (kfogel)
described could make a client wrongly trigger the optimization and end
up with the wrong fulltext.

The third one is fine, because the delta and its resulting fulltext's
checksum don't travel separately.

So, there you have it: a use of SHA-1 which can stay as-is, a use of SHA-1
which may need attention, and a use of MD5 which can stay as-is — all
in the same codebase.

Thus, whether a hash function is "broken" or not depends on the context
in which it is used.



To be clear, the ra_serf thing which "may need attention" is the use
of «final_sha1_checksum» in subversion/libsvn_ra_serf/update.c.  That's
a place where we assume SHA-1 is one-to-one.

Cheers,

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-22 Thread Daniel Shahaf
[See below a proposal that libsvn_wc not use any fixed hash function.]

Martin Edgar Furter Rathod wrote on Sat, 21 Jan 2023 05:22 +00:00:
> On 20.01.23 22:48, Karl Fogel wrote:
>> On 20 Jan 2023, Nathan Hartman wrote:
>>> We already can't store files with identical SHA1 hashes, but AFAIK the
>>> only meaningful impact we've ever heard is that security researchers
>>> cannot track files they generate with deliberate collisions. The same
>>> would be true with any hash type, for collisions within that hash
>>> type.
>> 
>> Yes.  A hash is considered "broken" the moment security researchers can 
>> generate a collision.
>
> No matter what hash function you choose now, sooner or later it will be 
> broken.
>
> But a broken hash function can still be good enough for use in tools 
> like subversion if it is used correctly. Instead of just storing the 
> hash value subversion should also store a sequence number. Whenever a 
> collision happens subversion has to compare the two (or more) files 
> which have the same hash value.

So, basically, just do what the implementation of hashes (the data
structure mapping keys to values) does?

I think this would work in most of our uses of checksums, and make it
possible to have collisions in both the repository and the wc.
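
A minimal sketch of that idea, assuming a toy in-memory store (the class
name PristineStore and its methods are hypothetical, not libsvn_wc API):
contents are keyed by (checksum, sequence number), and the full texts are
compared only when the checksum matches an existing entry.

```python
import hashlib

class PristineStore:
    """Toy content store keyed by (checksum, sequence number)."""

    def __init__(self, hash_func=lambda data: hashlib.sha1(data).hexdigest()):
        self._hash = hash_func
        self._buckets = {}   # checksum -> list of distinct contents

    def add(self, content: bytes):
        """Store content and return its (checksum, sequence number) key."""
        checksum = self._hash(content)
        bucket = self._buckets.setdefault(checksum, [])
        for seq, existing in enumerate(bucket):
            if existing == content:      # identical text: reuse the old key
                return checksum, seq
        bucket.append(content)           # first entry or a collision: new number
        return checksum, len(bucket) - 1

    def get(self, key):
        """Look up content by its (checksum, sequence number) key."""
        checksum, seq = key
        return self._buckets[checksum][seq]
```

Passing a weak hash_func (for example, the content length) forces
collisions and shows that both texts remain retrievable.  The comparison
step needs both fulltexts to be available, which is the limitation
raised in the next paragraph.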

However, what about running `svn status` when there's an unhydrated file
that has been modified in a way that changes the fulltext but doesn't
change the checksum value?  In this case the BASE fulltext isn't
available locally to compare with.



I think there is actually something we can do about this: stop
hardcoding any particular hash function in libsvn_wc's internals.

The server is aware of what algorithm the wc uses on the wire, which is
SHA-1 in ra_serf's download optimization and MD5 in
svn_delta_editor_t::apply_textdelta() and
svn_delta_editor_t::close_file().  However, the algorithm(s) used by
the wc for naming pristines and, in f32, for detecting local mods are
implementation details of the wc.

So, suppose the wc didn't hardcode _any particular_ hash function for
naming pristines and for status walks — not md5, not sha1, not sha256 —
but had each «svn checkout» run pick a hash function uniformly at random
out of a large enough family of hash functions[1].  (Intuitively, think
of a family of hash functions as a hash function with a random salt,
similar to [2].)

This way, even if someone tried to deliberately create a collision, they
wouldn't be able to pick a collision "off the shelf", as with
shattered.io; they'd need to compute a collision for the specific hash
function ("salt") used by that particular wc.  That's more difficult than
creating a collision in a well-known hash function, regardless of
whether we treat the salt's value as a secret of the wc (as in, stored
in a mode-0400 file under the .svn directory and not disclosed to the
server) or as a value the attacker is assumed to know.
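
To illustrate why an "off the shelf" collision stops working, here is
a minimal sketch on a deliberately weak hash.  It assumes a toy function
that keeps only the top 24 bits of SHA-1 so that a collision is cheap to
find, and it prepends the salt (a choice made for this sketch); the names
toy_hash(), off_the_shelf_collision() and new_wc_salt() are hypothetical.

```python
import hashlib
import os
from itertools import count

def toy_hash(data: bytes, salt: bytes = b"") -> int:
    """Top 24 bits of SHA-1(salt + data); weak on purpose."""
    digest = hashlib.sha1(salt + data).digest()
    return int.from_bytes(digest[:3], "big")

def off_the_shelf_collision():
    """Find two distinct inputs that collide under the unsalted toy hash."""
    seen = {}
    for i in count():
        x = b"file-%d" % i
        y = toy_hash(x)
        if y in seen:
            return seen[y], x
        seen[y] = x

def new_wc_salt() -> bytes:
    """Pick the salt once, e.g. when a checkout creates the wc."""
    return os.urandom(32)
```

A pair returned by off_the_shelf_collision() collides when no salt is
used, but with a freshly generated salt the two inputs will, with
overwhelming probability, hash to different values.  An attacker would
have to redo the collision search for each wc's salt.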

So, that's one way to address the scenario kfogel described.

Thanks for speaking up, Martin.

Daniel

[1] I'm not making this term up; see, for instance, page 143 of
https://cseweb.ucsd.edu/~mihir/papers/gb.pdf.  "풦" is keyspace,
"D" is domain, "R" is range.  A random element K ∈ 풦 is chosen and the
hash function H_K [aka H with currying of the first parameter] is
used thereafter.

[2]
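import hashlib, os
def f(foo):
    # Hash the input together with the salt chosen for this wc.
    return hashlib.sha1((str(foo) + f.salt).encode()).hexdigest()
f.salt = os.urandom(32).hex()  # stands in for str(random_thing())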

> If the files are identical the old 
> hash+number pair is stored. If they differ the new file gets a new 
> sequence number and that hash+number pair is stored. Since collisions 
> almost never happen even if md5 is used the performance penalty will be 
> almost zero.
>
> The same thing has been discussed earlier and changing the hash function 
> will just solve the problem for a few years...
>
> Best regards,
> Martin


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-22 Thread Daniel Shahaf
To be clear, I wasn't vetoing changing the hash algorithm.  I was
vetoing making a change without discussion.  If there is discussion and
it results in consensus to change the algorithm, that'll be absolutely
fine by me.

Daniel

Karl Fogel wrote on Sat, 21 Jan 2023 17:58 +00:00:
> *nod* This issue isn't important enough to me to continue the 
> conversation -- I'd like for new hash algorithms to be possible, 
> and I think Evgeny's work on it is worthwhile, but I don't feel 
> nearly as strongly about this as I feel about making the new 
> pristineless working copies available in an official release as 
> soon as we can.
>
> Best regards,
> -Karl
>
> On 21 Jan 2023, Daniel Shahaf wrote:
>>Karl Fogel wrote on Fri, Jan 20, 2023 at 11:09:11 -0600:
>>> On 20 Jan 2023, Daniel Shahaf wrote:
>>> > Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
>>> > > I can complete the work on this branch and bring it to a
>>> > > production-ready state, assuming there are no objections.
>>> > 
>>> > Your assumption is counterfactual:
>>> > 
>>> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>>> > 
>>> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
>>> > 
>>> > Objections have been raised, been left unanswered, and now
>>> > implementation work has commenced following the original design.
>>> > That's not acceptable.
>>> 
>>> I'm a little surprised by your reaction.
>>> 
>>> It is never "not acceptable" for someone to do implementation work on a
>>> branch while a discussion is happening, even if that discussion contains
>>> objections to or questions about the premise of the branch work.
>>> 
>>> It's a branch.  He didn't merge it to trunk, and he posted it as an
>>> explicit invitation for discussion.
>>> 
>>
>>I didn't object to the use of a branch /per se/.  I objected to the
>>treating of objections that *had already been posted* as though they had
>>never been posted.  *That's* not acceptable.
>>
>>However, since you ask, I don't think implementing a proposal on
>>a branch is necessarily a good idea:
>>
>>- If the branch is seen and presented as a PoC for furthering discussion
>>  and for discovering practical considerations (e.g., that
>>  PRISTINE.MD5_CHECKSUM docstring I found yesterday during discussion,
>>  or the ra_serf sha1 optimization that anyone implementing the branch
>>  would run into), it's likely a good thing.
>>
>>- On the other hand, when the branch implements the original proposal,
>>  whilst outstanding questions were not only not answered but also not
>>  acknowledged, that's quite another thing.  It can result in:
>>
>>  + The branch maintainer being biased in favour of the approach they
>>    have implemented.  (People tend not to argue against what they have
>>    expended resources on.  Cf. plan continuation bias, sunk cost
>>    fallacy.)
>>
>>  + dev@ being biased towards the approach that has been implemented
>>    (because it's a known entity; because no one is volunteering to
>>    implement another approach; because there's a desire to cut
>>    a minor release soon…).  This, in turn, can result in…
>>
>>  + …an incentive for participants *not* to hold open design
>>    discussions on dev@ in the first place.
>>
>>> > I'm vetoing the change until a non-rubber-stamp design
>>> > discussion has been completed on the public dev@ list.
>>> 
>>> Starting an implementation on a branch is a valuable contribution to a
>>> design discussion -- it's exactly the kind of "non-rubber-stamp"
>>> contribution one would want.
>>> 
>>
>>You're just repeating what you said above.
>>
>>> If you want to re-iterate points you've made that have been left unanswered,
>>> that would be a useful contribution -- perhaps some of those points will be
>>> updated now that there's actual code, or perhaps they won't.  Either way,
>>> what Evgeny is doing here seems very constructive to me, and entirely within
>>> the normal range of how we do things.
>>
>>Posting a paragraph such as the one I'm replying to is not "entirely
>>within the normal range of how we do things".  As to my points, see
>><https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E>.
>>They boil down to this:
>>
>> We should migrate away from SHA-1.
>> Why?
>>
>>Daniel
>>
>>> Best regards,
>>> -Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-21 Thread Daniel Shahaf
Karl Fogel wrote on Fri, Jan 20, 2023 at 11:18:56 -0600:
> On 20 Jan 2023, Nathan Hartman wrote:
> > Taking a step back, this discussion started because pristine-free WCs
> > are IIUC more dependent on comparing hashes than pristineful WCs, and
> > therefore a hash collision could have more impact in a pristine-free
> > WC. "Guarantees" were mentioned, but I think it's important to state
> > that there's only a guarantee of probability, since as mentioned above
> > all hashes will have collisions.
> 
> Sure, in a literal mathematical sense, but not in a sense that matters for
> our purposes here.
> 
> In the absence of an intentionally caused collision, a good hash function
> has *far* less chance of accidental collision than, say, the chance that
> your CPU will malfunction due to a stray cosmic ray, or the chance of us
> getting hit by a planet-destroying meteorite tomorrow.
> 
> For our purposes, "guarantee" is accurate.  No guarantee we make can be
> stronger than the inverse probability of a CPU/memory malfunction anyway.
> 

The probability of an accidental collision in a "good" N-bit hash
function is on the order of 1/√(2^N), which for sufficiently large N is
considered an acceptable risk.  That's invariant over time; however,
intentionally causing collisions becomes easier over time.
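
One common way to make the square-root relationship concrete is the
birthday approximation: among k random values of an ideal N-bit hash, the
chance that any two collide is roughly 1 - exp(-k(k-1)/2^(N+1)), so
a collision only becomes likely once k approaches √(2^N).  The sketch
below assumes an ideal hash with uniformly random outputs; the function
names are made up for this illustration.

```python
import math

def accidental_collision_probability(n_items: int, n_bits: int) -> float:
    """Birthday approximation: 1 - exp(-k(k-1) / 2^(N+1))."""
    exponent = -n_items * (n_items - 1) / 2.0 ** (n_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

def items_for_even_odds(n_bits: int) -> float:
    """Number of items at which a collision is about 50% likely,
    roughly 1.18 * 2^(N/2)."""
    return math.sqrt(2 * math.log(2)) * 2.0 ** (n_bits / 2)
```

For a 160-bit hash such as SHA-1, even a billion files give a probability
far below 1e-29, whereas even odds would need on the order of 2^80 files.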

> > We already can't store files with identical SHA1 hashes, but AFAIK the
> > only meaningful impact we've ever heard is that security researchers
> > cannot track files they generate with deliberate collisions. The same
> > would be true with any hash type, for collisions within that hash
> > type.
> 
> Yes.  A hash is considered "broken" the moment security researchers can
> generate a collision.
> 

To be clear, is this what you're saying? —
.
Premise: There is a collision attack against SHA-1.
Conclusion: Subversion should stop using SHA-1.

This conclusion does not follow from this premise.  For instance, FSFS
checks for collisions, so it can actually use "File length in bytes" as
a checksum and everything would work; the only thing that would change
is that it would not be possible to commit a file that's the same
expanded_size as any other node-rev (including directories).
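
A toy model of that argument, assuming a cache that uses the content
length as its "checksum" and rejects any commit whose checksum matches an
existing entry with different content.  ToyRepCache is a hypothetical
name and this is not how FSFS is actually implemented; it only shows that
correctness comes from the collision check rather than from the strength
of the checksum.

```python
class ToyRepCache:
    """Toy rep-cache that uses the content length as its 'checksum'."""

    def __init__(self):
        self._by_checksum = {}   # length in bytes -> stored content

    def commit(self, content: bytes) -> None:
        checksum = len(content)
        existing = self._by_checksum.get(checksum)
        if existing is None:
            # No entry with this checksum yet: store a new representation.
            self._by_checksum[checksum] = content
        elif existing != content:
            # Same checksum, different text: reject the commit.
            raise ValueError("checksum collision; commit rejected")
        # Otherwise the text is identical and the representation is shared.
```

Committing b"abc" twice succeeds, committing b"abcd" succeeds, and
committing b"xyz" afterwards raises, because it has the same length as
b"abc" but different content.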

And, anyway, the burden is not on me to disprove your claim, but on
you to prove it.

> FWIW, in one of my previous posts, I described a real-life scenario in which
> the ability to generate a chosen-plaintext collision in an SVN working copy
> would have security implications.

Yes, and as I have already asked: What other counters to that attack,
besides migrating away from SHA-1, have you considered?  Have you
considered the downsides of migrating away from SHA-1?

Also, /if/ we changed checksums, would that address the attack?  Put
differently, why is a similar attack impossible if we change the
checksum algorithm?  Why is use of SHA-1 a /sine qua non/ of your
scenario?

For example, if we used another checksum algorithm, the attacker from
your scenario might opt to edit the base checksums in .svn/wc.db and
rename the .svn/pristine/ files accordingly.  That's much easier to pull
off, and will be easy to adapt if we change the algorithm again, but on
the other hand, requires write access to the .svn directory and is
easier to discover.

Daniel

> Best regards,
> -Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-21 Thread Daniel Shahaf
Karl Fogel wrote on Fri, Jan 20, 2023 at 11:09:11 -0600:
> On 20 Jan 2023, Daniel Shahaf wrote:
> > Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> > > I can complete the work on this branch and bring it to a
> > > production-ready
> > > state, assuming there are no objections.
> > 
> > Your assumption is counterfactual:
> > 
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
> > 
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
> > 
> > Objections have been raised, been left unanswered, and now
> > implementation work has commenced following the original design. That's
> > not acceptable.
> 
> I'm a little surprised by your reaction.
> 
> It is never "not acceptable" for someone to do implementation work on a
> branch while a discussion is happening, even if that discussion contains
> objections to or questions about the premise of the branch work.
> 
> It's a branch.  He didn't merge it to trunk, and he posted it as an explicit
> invitation for discussion.
> 

I didn't object to the use of a branch /per se/.  I objected to the
treating of objections that *had already been posted* as though they had
never been posted.  *That's* not acceptable.

However, since you ask, I don't think implementing a proposal on
a branch is necessarily a good idea:

- If the branch is seen and presented as a PoC for furthering discussion
  and for discovering practical considerations (e.g., that
  PRISTINE.MD5_CHECKSUM docstring I found yesterday during discussion,
  or the ra_serf sha1 optimization that anyone implementing the branch
  would run into), it's likely a good thing.
  
- On the other hand, when the branch implements the original proposal,
  whilst outstanding questions were not only not answered but also not
  acknowledged, that's quite another thing.  It can result in:

  + The branch maintainer being biased in favour of the approach they
    have implemented.  (People tend not to argue against what they have
    expended resources on.  Cf. plan continuation bias, sunk cost
    fallacy.)

  + dev@ being biased towards the approach that has been implemented
    (because it's a known entity; because no one is volunteering to
    implement another approach; because there's a desire to cut
    a minor release soon…).  This, in turn, can result in…

  + …an incentive for participants *not* to hold open design
    discussions on dev@ in the first place.

> > I'm vetoing the change until a non-rubber-stamp design
> > discussion has been completed on the public dev@ list.
> 
> Starting an implementation on a branch is a valuable contribution to a
> design discussion -- it's exactly the kind of "non-rubber-stamp"
> contribution one would want.
> 

You're just repeating what you said above.

> If you want to re-iterate points you've made that have been left unanswered,
> that would be a useful contribution -- perhaps some of those points will be
> updated now that there's actual code, or perhaps they won't.  Either way,
> what Evgeny is doing here seems very constructive to me, and entirely within
> the normal range of how we do things.

Posting a paragraph such as the one I'm replying to is not "entirely
within the normal range of how we do things".  As to my points, see
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E>.
They boil down to this:

 We should migrate away from SHA-1.
 Why?

Daniel

> Best regards,
> -Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-20 Thread Daniel Shahaf
Nathan Hartman wrote on Fri, 20 Jan 2023 14:51 +00:00:
> 1. Pros/cons of switching from SHA1 to another hash.
⋮
> Do we need to switch from SHA1 to another hash? One con that was
> already mentioned [1] is that we'll never really be able to switch
> away from SHA1, as there are existing clients, servers, and working
> copies out there. Not only will we have to support SHA1 forever for
> backwards compatibility,

Actually, I think it's MD5, not SHA-1, that we have to support
indefinitely, since our uses of SHA-1 fall into two categories:

- Accompanied by MD5.  (wc.db PRISTINE table, FSFS node-rev headers,
  dumpfiles' Text-content-* headers)

- An optional optimization.  (ra_serf, rep-cache.db)

>  but any new hash that is ever added will need
> to be supported forever as well. If we accumulate many of those, it
> might become a burden,

Good point.  Then perhaps we should continue to record two checksums, as
both wc.db and FSFS do?  If we record, say, both «(svn_checksum_kind_t)42»
checksums and «(svn_checksum_kind_t)value_of_the_month» checksums, then
we'll only need to be able to upgrade from the former.

> but perhaps there will be only one new hash and
> it will be the "blessed" one for the next 20 years.

Cheers,

Daniel

P.S.  wc-metadata.sql implies that having MD5 collisions in a wc is supported:

  /* wc-metadata.sql -- schema used in the wc-metadata SQLite database
   * This is intended for use with SQLite 3
  ⋮
  CREATE TABLE PRISTINE (
    /* The SHA-1 checksum of the pristine text. This is a unique key. The
       SHA-1 checksum of a pristine text is assumed to be unique among all
       pristine texts referenced from this database. */
    checksum  TEXT NOT NULL PRIMARY KEY,

    ⋮
    /* Alternative MD5 checksum used for communicating with older
       repositories. Not strictly guaranteed to be unique among table rows. */
    md5_checksum  TEXT NOT NULL
    );

  CREATE INDEX I_PRISTINE_MD5 ON PRISTINE (md5_checksum);


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-20 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> I can complete the work on this branch and bring it to a production-ready
> state, assuming there are no objections.

Your assumption is counterfactual:

https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E

https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E

Objections have been raised, been left unanswered, and now
implementation work has commenced following the original design.  That's
not acceptable.  I'm vetoing the change until a non-rubber-stamp design
discussion has been completed on the public dev@ list.

Daniel


Re: Escape sequences in log messages [etc]

2023-01-19 Thread Daniel Shahaf
Nathan Hartman wrote on Wed, Jan 18, 2023 at 01:10:47 -0500:
> On Tue, Jan 17, 2023 at 3:02 PM Doug Robinson 
> wrote:
> 
> > Daniel, et. al.:
> >
> > On Mon, Jan 2, 2023 at 5:14 PM Daniel Sahlberg <
> > daniel.l.sahlb...@gmail.com> wrote:
> >
> >> In a thread started by Vincent Lefevre in October [1] it was noted that
> >> Subversion prints several pieces of information from the repository to the
> >> terminal (including log messages and author names) without considering if
> >> they may affect terminal behaviour.
> >>
> >> As demonstrated by DanielSh [2] a user may inject escape sequences into a
> >> log message and when running svn log, these affect terminal color. Git
> >> behaves the same way, as demonstrated by me [3].
> >>
> >
> > Any idea what Git is going to do with this?
> >
> 
> 
> Unless someone reports (reported?) it to the Git devs, it's possible they
> aren't aware of it.
> 
> If we want to do something about it on our end, it might make sense to
> coordinate with the Git devs so that both systems could have similar
> behavior.
> 
> But... I'm not sure whether we want to do anything yet, partly because...
> 
> 
> >> Can we reach consensus if this behaviour is intended, unintended but
> >> desirable or unintended and undesirable? I would value the opinions of the
> >> oldtimers who might have background information if this was ever discussed
> >> or considered in the early days.
> >>
> >> In the original thread there were several arguments both pro and con
> >> regarding filtering/quoting escape sequences.
> >>
> >
> > From my perspective trying to do anything about this is opening up a huge
> > investigation that may result in incompatible-with-history choices.
> >
> > 1. What about "svn diff" ?  (any modifications here could break "patch",
> > et. al.)
> > 2. What about "svn cat" ?
> > 3. What about properties?  (I just verified you can place escape sequences
> > in them).
> > ...
> >
> > (I doubt my list above is complete.)
> >
> 
> 
> ...of concerns that doing so will break stuff.
> 

We have precedents for making breaking changes: we make them in an A.B.0
release if possible, and document them in the release notes and/or in
notes/api-errata/.

> > A "complete" implementation of a "feature" to mask/protect-against escape
> > sequences is also going to need an option to enable the raw output
> > (including the escape sequences) for every command/context where they could
> > be coming out today.

Not necessarily: If the lack of escaping is considered a bug, then the fix
doesn't have to offer a backwards-compatible upgrade path.

The values of log messages are required to be UTF-8 strings.  When
ViewVC renders them, it escapes characters that are special to HTML
(angle brackets and ampersand).  If someone out there has a C source
code generator that takes svn log messages as input, that generator
should escape double quotes, backslashes, and so on when it emits C
string literals derived from log messages.  And when the cmdline client
emits svn:log property values to a terminal, it should escape the
sequences as appropriate for that terminal.
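
The point that each output context needs its own escaping can be
sketched as follows.  The three helpers are hypothetical and simplified
(the C escaper handles only backslash, double quote and newline, and the
terminal escaper simply makes control characters visible); they are not
what ViewVC or the svn cmdline client actually do.

```python
import html

def escape_for_html(log_message: str) -> str:
    """Escape the characters that are special to HTML (&, <, >)."""
    return html.escape(log_message, quote=False)

def escape_for_c_literal(log_message: str) -> str:
    """Emit a C string literal, escaping backslashes, quotes and newlines."""
    body = (log_message.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
    return '"' + body + '"'

def escape_for_terminal(log_message: str) -> str:
    """Show control characters (except newline and tab) in \\xNN form."""
    out = []
    for ch in log_message:
        code = ord(ch)
        is_c0 = code < 0x20 and ch not in "\n\t"
        is_c1_or_del = code == 0x7F or 0x80 <= code <= 0x9F
        if is_c0 or is_c1_or_del:
            out.append("\\x%02x" % code)   # e.g. ESC becomes \x1b
        else:
            out.append(ch)
    return "".join(out)
```

The same log message is stored unchanged in the repository in all three
cases; only the rendering differs per consumer.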

And we needn't support an option to emit escape sequences in raw form
for the same reason that ViewVC doesn't have an option to emit log
messages into the HTTP response stream without HTML escaping.

Cheers,

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-19 Thread Daniel Shahaf
Karl Fogel wrote on Thu, Dec 29, 2022 at 17:35:44 -0600:
> On 29 Dec 2022, Evgeny Kotkov wrote:
> > Karl Fogel  writes:
> > 
> > > Now, how hard would this be to actually implement?
> > 
> > I plan to take a more detailed look at that, but I'm currently on
> > vacation for the New Year holidays.
> 
> That's great to hear, Evgeny.  In the meantime, enjoy your vacation!

Any news on this?  Over here it's still not clear to me what problem
would be solved by switching away from SHA-1, what alternative solutions
to that problem have been considered, and whether anyone has actually
stopped to consider /both/ the pros and cons of switching away from SHA-1.

Karl Fogel wrote on Wed, Dec 28, 2022 at 09:10:31 -0400:
> On 28 Dec 2022, Daniel Sahlberg wrote:
> > Since we need to be backwards compatible with older v1 clients, can
> > this check ever be removed (before Subversion 2)?
> > 
> > So, while I believe f32 is a good opportunity to switch to a new
> > hash, what is the problem we would like to solve with a new hash?
> 
> As I said before, even if we couldn't think of a concrete problem right now,
> the mere fact that a former guarantee [1] has become a non-guarantee is
> enough motivation.  We can't anticipate all the problems that might arise
> from people being able to craft local content that looks unmodified to
> Subversion.  (As you implied, r1794611 has no effect for content that is
> never committed to the repository.)
> 
> Of course, my saying "This matters just through reasoning from first
> principles, therefore we should fix it" would count for a lot more if I were
> volunteering to fix it, which I'm not alas. But I do think we don't need to
> search further for justifications. What we already know is enough: our hash
> algorithm is known to be collidable, yet what we're using it for depends on
> non-collidability; therefore, switching to a better algorithm is a good
> idea.
> 

Agreed that we shouldn't limit ourselves to problems/attacks we can
imagine.

However, it does not follow from "the mere fact that a former guarantee
has become a non-guarantee" that we should switch the checksum
algorithm.  What does follow from that is that we should review our
design, identify the places that depend on the no-longer-valid
guarantee, assess the implications for each of them, and then determine
what sort of changes may be needed.

In other words, we should do what we do whenever we write an advisory.

Which reminds me:

https://subversion.apache.org/security/sha1-advisory.txt

Daniel

> However, it needn't be a blocker for the next release, for the reason Brane
> gave.
> 
> Best regards,
> -Karl
> 
> [1] "Former guarantee" meaning "former guarantee for all practical
> purposes", of course, since in the past there weren't ways to make
> collisions happen.


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Tue, Dec 20, 2022 at 11:14:00 +0300:
> [Moving discussion to a new thread]
> 
> We currently have a problem that a working copy relies on the checksum type
> with known collisions (SHA1).  A solution to that problem

Why is libsvn_wc's use of SHA-1 a problem?  What's the scenario wherein
Subversion will behave differently than it should?

> is to switch to a different checksum type without known collisions in
> one of the newer working copy formats.

Such as SHA-1 salted by NODES.LOCAL_RELPATH and NODES.WC_ID (or a per-wc UUID)?
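
For illustration, salting could look roughly like the sketch below.  It
assumes the salt is simply prepended to the content before hashing; the
function name and the use of a random per-wc UUID as the salt are
hypothetical and are not taken from libsvn_wc.

```python
import hashlib
import uuid


def salted_sha1(salt: bytes, content: bytes) -> str:
    """Return the hex SHA-1 of salt followed by content (hypothetical scheme)."""
    h = hashlib.sha1()
    h.update(salt)      # the salt is hashed first, as if prepended
    h.update(content)
    return h.hexdigest()


# Two working copies, each with its own (hypothetical) per-wc salt.
wc_a_salt = uuid.uuid4().bytes
wc_b_salt = uuid.uuid4().bytes
content = b"hello\n"
print(salted_sha1(wc_a_salt, content) != salted_sha1(wc_b_salt, content))
```

The same content then hashes differently in each working copy, so a pair
of files crafted to collide under plain SHA-1 is not expected to collide
under a salt that was unknown when the files were crafted.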

> Since we plan on shipping a new working copy format in 1.15, this seems to
> be an appropriate moment of time to decide whether we'd also want to switch
> to a checksum type without known collisions in that new format.
> 

What's the acceptance test we use for candidate checksum algorithms?

You say we should switch to a checksum algorithm that doesn't have known
collisions, but, why should we require that?  Consider the following
160-bit checksum algorithm:
.
1. If the input consists of 40 ASCII lowercase hex digits and
   nothing else, return the input.
2. Else, return the SHA-1 of the input.

This algorithm has a trivial first preimage attack.  If a wc used this
identity-then-sha1 algorithm instead of SHA-1, then… what?
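
The two-step algorithm above can be sketched as follows.  This is only an
illustration of the thought experiment; the function name is made up for
this example.

```python
import hashlib
import re

# Exactly 40 ASCII lowercase hex digits and nothing else.
_HEX40 = re.compile(rb"[0-9a-f]{40}")


def identity_then_sha1(data: bytes) -> str:
    """Return data itself if it is 40 lowercase hex digits, else its SHA-1."""
    if _HEX40.fullmatch(data):
        return data.decode("ascii")
    return hashlib.sha1(data).hexdigest()


# First preimage "attack": any 40-hex-digit target is its own preimage.
target = "a" * 40
print(identity_then_sha1(target.encode("ascii")) == target)
```

Every other input still gets its ordinary SHA-1, which is what makes the
question of what would actually break a useful one.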

> Below are the arguments for including a switch to a different checksum type
> in the working copy format for 1.15:
> 
> 1) Since the "is the file modified?" check now compares checksums, leaving
>    everything as-is may be considered a regression, because it would
>    introduce additional cases where a working copy currently relies on
>    comparing checksums with known collisions.
> 

Well, SHA-1 is still collision-free so long as one is not deliberately
trying to use collisions, so this would only be a regression if we
consider "Deliberately store files that have the same checksum" to be
a use-case.  Do we?

I recall we discussed this when shattered.io was announced, and we
didn't rush to upgrade the checksums we use everywhere, so I guess back
then we came to the conclusion that wasn't a use-case.  (Of course we
can change our opinion; that's just a datapoint, and there may be more,
on both sides, in the old thread.)

I looked for the old thread and didn't find it.  (I looked in the
private@ archives too in case the thread was there.)

> 2) We already need a working copy format bump for the pristines-on-demand
>    feature.  So using that format bump to solve the SHA1 issue might reduce
>    the overall number of required bumps for users (assuming that we'll still
>    need to switch from SHA1 at some point later).
> 

Considering that 1.15 will support reading and writing both f31 and f32,
the "overall number of required bumps" between 1.8 and trunk@HEAD is
zero, meaning the proposed change can't reduce that number.

> 3) While the pristines-on-demand feature is not released, upgrading
>    with a switch to the new checksum type seems to be possible without
>    requiring a network fetch.

I infer the scenario in question here is upgrading a (say) pristineless
wc to a newer format that supports a new checksum algorithm.

>    But if some of the pristines are optional, we lose the possibility
>    to rehash all contents in place.  So we might find ourselves having
>    to choose between two worse alternatives of either requiring
>    a network fetch during upgrade or entirely prohibiting an upgrade
>    of working copies with optional pristines.

Why would we want to rehash everything in place?  The 1.15→1.16 upgrade
could simply leave pristineless files' checksums as SHA-1 until the next
«svn up», just like «svnadmin upgrade» of FSFS doesn't retroactively add
SHA-1 checksums to node-rev headers or "-file" or "-dir" indicators in
the changed-paths section.

There may be yet other alternatives.

> Thoughts?

I'm not voting either -0 or +0 at this time.

Cheers,

Daniel


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-10 Thread Daniel Shahaf
Nathan Hartman wrote on Wed, Dec 07, 2022 at 20:29:11 -0500:
> On Wed, Dec 7, 2022 at 12:11 PM Evgeny Kotkov via dev <
> dev@subversion.apache.org> wrote:
> 
> >
> > I think that the `pristines-on-demand-on-mwf` branch is now ready for a
> > merge to trunk.  I could do that, assuming there are no objections.
> 
> 
> 
> I'd like to echo what others have already said by saying a great big THANK
> YOU, to all who have worked on this cool new feature so far!
> 
> I used an earlier incarnation of this branch some months ago in real usage
> scenarios with good results and looking at the recent commit emails as
> they've happened everything looks sensible to me.
> 
> I will try to run the full test suite in the next couple of days and
> assuming the tests pass for me I'll use it as my daily driver to test the
> real usage. Obviously I'll post here if I find anything...
> 
> Meanwhile I'd like to say that on further thought and after reading Johan's
> and Karl's feedback regarding the feature switch naming, I've come around
> to the point of view that --store-pristine={yes|no} is a perfectly fine UI.
> 

Well, if we're bikeshedding anyway, how about 
--backend-tweaks=without-pristines?
We can support just two values for starters ("without pristines" and
"with pristines"), and have the room to extend this in 1.16, similar to
--trust-server-cert/--trust-server-cert-failures and
--pre-1.4-compatible/--compatible-version.

Similarly, a new config file section with one valid option might make
sense if we anticipate adding more options to that section in the
future.  This way we avoid having the configuration split across two
places.

> Given that this is now the command line switch name, and since users are
> given direct control over the pristinefulness of a WC, and we've been
> calling this feature Pristines On Demand since its inception, I think we
> should finally bless this as the official name of the feature.
> 
> In the next couple of days I plan to update the staged 1.15 release notes,
> which until now tentatively called it Bare Working Copies, to call it
> Pristines On Demand and to complete the description there.
> 
> Regarding the SHA hash question:
> 
> > While here, I would like to raise a topic of incorporating a switch from
> > SHA1 to a different checksum type (without known collisions) for the new
> > working copy format.  This topic is relevant to the pristines-on-demand
> > branch, because the new "is the file modified?" check relies on the
> > checksum comparison, instead of comparing the contents of working and
> > pristine files.
> >
> > And so while I consider it to be out of the scope of the
> > pristines-on-demand branch, I think that we might want to evaluate if
> > this is something that should be a part of the next release.
> 
> 
> Is it feasible and would it be beneficial to somehow decouple the hash code
> type from the wc format version? Asking because IIRC the need for a format
> bump to change hashes was one of the reasons it wasn't done a few years ago.

Maybe if we teach f32 to read /two/ new checksum kinds?  E.g., if we
teach f32 to read both SHA-512 and SHA-3, then even if 1.15 f32 writes
SHA-512 by default, it will nevertheless be able to read f32 wc's with
SHA-3 rows that 1.16 might create.
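
A rough sketch of that idea: the reader accepts several checksum kinds
while the writer emits only one default.  The "kind:hexdigest" text form,
the names, and the choice of SHA3-256 as the SHA-3 variant are assumptions
made for this example; none of this is wc.db's actual representation.

```python
import hashlib

# Kinds this hypothetical reader understands.
READABLE_KINDS = {
    "sha1": hashlib.sha1,
    "sha512": hashlib.sha512,
    "sha3_256": hashlib.sha3_256,
}
# The one kind this hypothetical writer produces by default.
DEFAULT_WRITE_KIND = "sha512"


def write_checksum(content: bytes, kind: str = DEFAULT_WRITE_KIND) -> str:
    """Serialize a checksum as 'kind:hexdigest' (hypothetical text form)."""
    return "{}:{}".format(kind, READABLE_KINDS[kind](content).hexdigest())


def verify_checksum(stored: str, content: bytes) -> bool:
    """Check content against a stored checksum of any readable kind."""
    kind, _, digest = stored.partition(":")
    if kind not in READABLE_KINDS:
        raise ValueError("unsupported checksum kind: {!r}".format(kind))
    return READABLE_KINDS[kind](content).hexdigest() == digest


# A row written by a "later" client with SHA-3 is still readable here.
row = write_checksum(b"hello\n", kind="sha3_256")
print(verify_checksum(row, b"hello\n"))
```

The design point is that the set of kinds a format can read is decoupled
from the single kind a given release chooses to write.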

svn_checksum_kind_t's possible values include svn_checksum_fnv1a_32, so
I guess we already support reading wc.db's that use FNV-1a checksums?
(Incidentally, f31 is new in 1.8 whereas svn_checksum_fnv1a_32 is new
in 1.9.)

Cheers,

Daniel


Re: [BUG] svn tries to read a directory on a different filesystem and hangs

2022-11-11 Thread Daniel Shahaf
Daniel Shahaf wrote on Mon, Oct 31, 2022 at 10:02:14 +:
> Vincent Lefevre wrote on Mon, 24 Oct 2022 13:57 +00:00:
> > "svn" goes up in the directory hierarchy to look for a .svn directory.
> > The issue is that it doesn't stop at filesystem and/or owner change.
> 
> Why should the upwards scan stop at mount points?  Because accessing
> /home/.svn on a random machine in your lab hangs?  That's insufficient
> justification.

Because if the .svn directory were on a different mount point,
a subsequent «svn update» might attempt to atomically rename(2) something
from .svn/ into the ACTUAL tree, and fail because they're not on the
same device?

Example: If /h/home is a mountpoint and jrandom does 'svn up' in
/h/home/jrandom, then even if /h/.svn exists, atomic renames from
/h/.svn/tmp/foo to /h/home/jrandom/path/to/wc/bar wouldn't be possible.
Is this a good reason not to look for /h/.svn or /.svn at all (i.e., to
recurse upwards no further than to /h/home/.svn)?
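
To make the question concrete, here is a minimal sketch of an upwards scan
that optionally refuses to cross onto another device.  It is not how
libsvn_wc does its lookup; the function and parameter names are
hypothetical, and comparing st_dev is just one possible way to detect
a mount point.

```python
import os


def find_admin_dir(start, stay_on_device=True):
    """Walk upwards from start looking for a directory that contains .svn.

    If stay_on_device is true, stop as soon as an ancestor lives on
    a different device than start, i.e. do not cross a mount point.
    Returns the directory containing .svn, or None if none was found.
    """
    current = os.path.abspath(start)
    start_dev = os.stat(current).st_dev
    while True:
        if stay_on_device and os.stat(current).st_dev != start_dev:
            return None  # this ancestor is on another filesystem
        if os.path.isdir(os.path.join(current, ".svn")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent
```

With stay_on_device=True and /h/home as a mount point, a scan started in
/h/home/jrandom would look at /h/home/.svn but never at /h/.svn or /.svn.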


Re: [BUG] svn tries to read a directory on a different filesystem and hangs

2022-10-31 Thread Daniel Shahaf
Vincent Lefevre wrote on Mon, 24 Oct 2022 13:57 +00:00:
> "svn" goes up in the directory hierarchy to look for a .svn directory.
> The issue is that it doesn't stop at filesystem and/or owner change.

Why should the upwards scan stop at mount points?  Because accessing
/home/.svn on a random machine in your lab hangs?  That's insufficient
justification.

Why should the upwards scan stop at owner change?  What are the facts of
the setup (a concrete example with relevant ownerships and permissions
specified) and what could Mallory do that he shouldn't be able to?  Feel
free to reply on security@ if the matter isn't suitable for public
discussion.

> This has several consequences:
>
> * A potential security issue, because some .svn directory may be
>   under control of another user.
>
> * On some machine at my lab (Debian/stable), this makes svn hang
>   when trying to open "/home/.svn", which is the home dir of the
>   user ".svn" (FYI, emacs tries to get the svn status of a file
>   when opening it).
>
> This is reproducible with
>
> svn, version 1.14.2 (r1899510)
>compiled Oct 20 2022, 08:12:24 on x86_64-pc-linux-gnu
>
> under Debian/unstable.
>
> On the Debian/stable machine, this issue is made worse by the fact
> that svn still goes up after a svn working copy has been reached:
>
> patate:~/private/backup> svn info
>
> hangs, but not
>
> patate:~/private> svn info
> svn: E155036: Please see the 'svn upgrade' command
> svn: E155036: The working copy at '/home/vlefevre/private'
> is too old (format 9) to work with client version '1.14.1 (r1886195)' 
> (expects format 31). You need to upgrade the working copy first.
>
> which fails immediately (this was probably a very old svn working copy,
> which I no longer use).

Not everyone uses Debian, so saying "the version of svn in Debian
stable" is farther right on the https://xkcd.com/1343/ scale than it
could be.  Distro version number and codename and package version
number is what I'd recommend.

Cheers,

Daniel

> -- 
> Vincent Lefèvre  - Web: 
> 100% accessible validated (X)HTML - Blog: 
> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: svn commit: r1902590 - /subversion/trunk/tools/client-side/store-plaintext-password.py

2022-07-15 Thread Daniel Shahaf
Nathan Hartman wrote on Thu, Jul 14, 2022 at 10:45:07 -0400:
> On Thu, Jul 14, 2022 at 10:02 AM Daniel Sahlberg
>  wrote:
> >
> > Den tors 14 juli 2022 kl 15:52 skrev Daniel Shahaf 
> > :
> >>
> >> Nathan Hartman wrote on Wed, 13 Jul 2022 15:29 +00:00:
> >> > On Wed, Jul 13, 2022 at 10:55 AM Daniel Shahaf  
> >> > wrote:
> >> >> Should the entry link to the zsh script
> >> >> (https://mail-archives.apache.org/mod_mbox/subversion-dev/202008.mbox/%3C20200816130713.6abca815%40tarpaulin.shahaf.local2%3E)
> >> >> as well, as an alternative?  It might be useful for someone if their
> >> >> environment doesn't have Python installed or if they find the zsh script
> >> >> easier to audit.
> >> >
> >> > I think it would be useful, and...
> >> >
> >> >> (Well, I suppose it might make more sense to copy the script
> >> >> somewhere than to link to an immutable archives message with that
> >> >> subject line.)
> >> >
> >> > ...the place to put it is probably tools/client-side/ just like the
> >> > Python script.
> >>
> >> Being in tools/ would imply dev@ accepts responsibility for bug reports
> >> against the zsh script.  Is dev@ happy to do that?  I'm concerned about
> >> the bus factor.
> >
> >
> > I was just about to say the same thing (and with no intention to
> > discredit the zsh version). If it is desirable to list all available
> > realms and let the user choose interactively, I could add that to
> > the Python script.

Adding such functionality would reduce the amount of legwork for users
(= would move the Python script leftwards on <https://xkcd.com/1343/>).

> > I was also going to add that I think it is better to provide one
> > tool and make sure that tool is working well instead of having two
> > tools that differ only in tiny details, since they might bit-rot in
> > different ways over time and it might be hard for a newcomer to
> > understand the motivation of having different tools.
> 

Agreed: knobs have a cost both to maintainers and to users.  However, we
should balance this downside with potential upsides, such as the ones I
offered above:

> >> >> [...]  It might be useful for someone if their environment
> >> >> doesn't have Python installed or if they find the zsh script
> >> >> easier to audit.

I'm not implying those points outweigh Daniel's; I'm just saying we
have identified pros and cons but haven't tallied them up yet.

For instance, perhaps we should link to both implementations but make it
clear that the Python one is preferred, community supported, "Use it
unless you know you need the other one", etc.

> 
> These are all good points.
> 
> I admit that zsh is a bit of a mystery to me, as is the script, so I
> couldn't provide support for it, at least not with my current
> knowledge. I am impressed that zsh can do so much with so little.
> 

zsh syntax can be terse, but the script is pretty much translatable
line-for-line into Python, except for the 'select' loop:

https://zsh.sourceforge.io/Doc/Release/Shell-Grammar.html#index-select

… which would be this:

def select(choices):
    for i_and_element in enumerate(choices):
        print("{}: {}".format(*i_and_element))
    n = int(input("Choice number: "))
    if not (0 <= n < len(choices)):
        raise ...
    return choices[n]

(plus a few more lines for the argv and loop support)

> It's in the list archives, but as DanielSh points out, is in a thread
> with a not-so-nice subject. That could be addressed by re-mailing it
> to dev@ with a new subject, e.g., "Prototype zsh script to store svn
> password in plaintext" in case anyone ever asks or searches for a
> non-Python way to do it. We could even link to it from the same FAQ,
> e.g., "An example of how to store svn plaintext credentials was
> implemented as a zsh script. It is unsupported by the SVN maintainers
> but can be found at [link] for pedagogical purposes."

If we give the script a new URL, perhaps we could make that URL identify
a _mutable_ resource, so if we ever have to update the script all its
users won't have to update their bookmarks?  Just a nice-to-have.

Cheers,

Daniel


Re: svn commit: r1902723 - /subversion/site/staging/docs/community-guide/releasing.part.html

2022-07-15 Thread Daniel Shahaf
dsahlb...@apache.org wrote on Thu, 14 Jul 2022 19:51 +00:00:
> +++ subversion/site/staging/docs/community-guide/releasing.part.html Thu Jul 
> 14 19:51:28 2022
> @@ -827,8 +827,7 @@ time pass.
> To run this script, you'll need a Subversion trunk working
> -copy (or a shallow trunk working copy containing the tools/dist and
> -build/generator directories).
> +copy.

How about keeping the parenthesized list and adding COMMITTERS to it?
This would make it easier to minimize dependencies on the trunk tree,
which would be a good thing (e.g., it would be harder to accidentally
use trunk's svn_version.h instead of a branch's).


Re: svn commit: r1902590 - /subversion/trunk/tools/client-side/store-plaintext-password.py

2022-07-14 Thread Daniel Shahaf
Nathan Hartman wrote on Wed, 13 Jul 2022 15:29 +00:00:
> On Wed, Jul 13, 2022 at 10:55 AM Daniel Shahaf  
> wrote:
>> Should the entry link to the zsh script
>> (https://mail-archives.apache.org/mod_mbox/subversion-dev/202008.mbox/%3C20200816130713.6abca815%40tarpaulin.shahaf.local2%3E)
>> as well, as an alternative?  It might be useful for someone if their
>> environment doesn't have Python installed or if they find the zsh script
>> easier to audit.
>
> I think it would be useful, and...
>
>> (Well, I suppose it might make more sense to copy the script
>> somewhere than to link to an immutable archives message with that
>> subject line.)
>
> ...the place to put it is probably tools/client-side/ just like the
> Python script.

Being in tools/ would imply dev@ accepts responsibility for bug reports
against the zsh script.  Is dev@ happy to do that?  I'm concerned about
the bus factor.

Cheers,

Daniel


Re: svn commit: r1902590 - /subversion/trunk/tools/client-side/store-plaintext-password.py

2022-07-13 Thread Daniel Shahaf
Daniel Shahaf wrote on Wed, 13 Jul 2022 14:54 +00:00:
> Nathan Hartman wrote on Wed, 13 Jul 2022 13:43 +00:00:
>> On Wed, Jul 13, 2022 at 9:33 AM Daniel Shahaf 
>> wrote:
>>
>>> dsahlb...@apache.org wrote on Fri, Jul 08, 2022 at 23:39:14 -:
>>> > A new script to store/update a password in the plain text password store
>>> >
>>> > * tools/client-side/store-plaintext-password.py
>>> >   As above
>>> >
>>> > Discussed on dev@:
>>> https://lists.apache.org/thread/jfd0f5n2qpgnyc30dst6ycnkphcwf6mm
>>> >
>>> > Added:
>>> > subversion/trunk/tools/client-side/store-plaintext-password.py
>>>  (with props)
>>>
>>> Presumably, now that it's been added, we should link it from somewhere
>>> to make it discoverable by users?
>>
>>
>>
>> Ah yes, it is on my todo list to link to it from the FAQ [1]. :-)
>>
>> [1] https://subversion.apache.org/faq.html#plaintext-passwords
>
> Added to staging in r1902704.  Hope you don't mind :)  Please take it
> from here if you have time.
>
> Should the entry link to the zsh script
> (https://mail-archives.apache.org/mod_mbox/subversion-dev/202008.mbox/%3C20200816130713.6abca815%40tarpaulin.shahaf.local2%3E)
> as well, as an alternative?  It might be useful for someone if their
> environment doesn't have Python installed or if they find the zsh script
> easier to audit.
>

Also, the zsh script lets the user select a realm from a list,
whereas the Python script asks the user to pass the realm in argv[].
I.e., the zsh script may be easier to use.

Incidentally, Daniel, r1902590 needs s/real'/realm'/.

Cheers,

Daniel

> (Well, I suppose it might make more sense to copy the script
> somewhere than to link to an immutable archives message with that
> subject line.)
>
> Cheers,
>
> Daniel


Re: svn commit: r1902590 - /subversion/trunk/tools/client-side/store-plaintext-password.py

2022-07-13 Thread Daniel Shahaf
Nathan Hartman wrote on Wed, 13 Jul 2022 13:43 +00:00:
> On Wed, Jul 13, 2022 at 9:33 AM Daniel Shahaf 
> wrote:
>
>> dsahlb...@apache.org wrote on Fri, Jul 08, 2022 at 23:39:14 -:
>> > A new script to store/update a password in the plain text password store
>> >
>> > * tools/client-side/store-plaintext-password.py
>> >   As above
>> >
>> > Discussed on dev@:
>> https://lists.apache.org/thread/jfd0f5n2qpgnyc30dst6ycnkphcwf6mm
>> >
>> > Added:
>> > subversion/trunk/tools/client-side/store-plaintext-password.py
>>  (with props)
>>
>> Presumably, now that it's been added, we should link it from somewhere
>> to make it discoverable by users?
>
>
>
> Ah yes, it is on my todo list to link to it from the FAQ [1]. :-)
>
> [1] https://subversion.apache.org/faq.html#plaintext-passwords

Added to staging in r1902704.  Hope you don't mind :)  Please take it
from here if you have time.

Should the entry link to the zsh script
(https://mail-archives.apache.org/mod_mbox/subversion-dev/202008.mbox/%3C20200816130713.6abca815%40tarpaulin.shahaf.local2%3E)
as well, as an alternative?  It might be useful for someone if their
environment doesn't have Python installed or if they find the zsh script
easier to audit.

(Well, I suppose it might make more sense to copy the script
somewhere than to link to an immutable archives message with that
subject line.)

Cheers,

Daniel


Re: svn commit: r1902582 - /subversion/trunk/tools/dist/release.py

2022-07-13 Thread Daniel Shahaf
Daniel Sahlberg wrote on Fri, Jul 08, 2022 at 23:07:08 +0200:
> Den fre 8 juli 2022 kl 22:47 skrev :
> 
> > Author: dsahlberg
> > Date: Fri Jul  8 20:47:42 2022
> > New Revision: 1902582
> >
> > URL: http://svn.apache.org/viewvc?rev=1902582=rev
> > Log:
> > ASF no longer provide a aggregated KEYS file, so we need to construct it
> > ourselves using the make-keys.sh script.
> >
> > * tools/dist/release.py
> >   (roll_tarballs): Call make-keys.sh to create the KEYS file
> >   (get_keys): Call make-keys.sh to create the KEYS file
> >
> > Modified:
> > subversion/trunk/tools/dist/release.py
> >
> > Modified: subversion/trunk/tools/dist/release.py
> > URL:
> > http://svn.apache.org/viewvc/subversion/trunk/tools/dist/release.py?rev=1902582=1902581=1902582=diff
> >
> > ==
> > --- subversion/trunk/tools/dist/release.py (original)
> > +++ subversion/trunk/tools/dist/release.py Fri Jul  8 20:47:42 2022
> > @@ -98,7 +98,6 @@ dist_release_url = dist_repos + '/releas
> >  dist_archive_url = 'https://archive.apache.org/dist/subversion'
> >  buildbot_repos = os.getenv('SVN_RELEASE_BUILDBOT_REPOS',
> > '
> > https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster
> > ')
> > -KEYS = 'https://people.apache.org/keys/group/subversion.asc'
> >  extns = ['zip', 'tar.gz', 'tar.bz2']
> >
> >
> > @@ -980,7 +979,12 @@ def roll_tarballs(args):
> >  # from a committer's LDAP profile down the road)
> >  basename = 'subversion-%s.KEYS' % (str(args.version),)
> >  filepath = os.path.join(get_tempdir(args.base_dir), basename)
> > -download_file(KEYS, filepath, None)
> > +# The following code require release.py to be executed within a
> > +# complete wc, not a shallow wc as indicated in HACKING as one
> > option.
> > +# We /could/ download COMMITTERS from /trunk if it doesn't
> > exist...
> > +subprocess.check_call([os.path.dirname(__file__) +
> > '/make-keys.sh',
> > +   '-c', os.path.dirname(__file__) + '/../..',
> > +   '-o', filepath])
> >  shutil.move(filepath, get_target(args))
> >
> 
> I have tested the above part but NOT within the full roll_tarballs codepath
> since I'm not sure if I might cause changes in the repository. I believe
> the change is correct and I don't think things will be worse than trying to
> download a non-existing URL but I would appreciate the help from someone
> experienced in the release process to review or at least give me the
> confidence to roll a tarball locally.

IIRC, rolling the tarballs in itself just creates the foo.tar.gz files
locally; it doesn't create the tag or do the post-tagging housekeeping
commits.

To be sure it doesn't commit, you can invalidate or delete any caches of
your svn.apache.org password.  Or you could create another local user on
your OS and test from that.  The test user should have its own UID,
homedir, and environment, so it doesn't have access to your regular
user's cached usernames/passwords.


Re: svn commit: r1902582 - /subversion/trunk/tools/dist/release.py

2022-07-13 Thread Daniel Shahaf
dsahlb...@apache.org wrote on Fri, Jul 08, 2022 at 20:47:42 -:
> +++ subversion/trunk/tools/dist/release.py Fri Jul  8 20:47:42 2022
> @@ -980,7 +979,12 @@ def roll_tarballs(args):
>  # from a committer's LDAP profile down the road)
>  basename = 'subversion-%s.KEYS' % (str(args.version),)
>  filepath = os.path.join(get_tempdir(args.base_dir), basename)
> -download_file(KEYS, filepath, None)
> +# The following code require release.py to be executed within a
> +# complete wc, not a shallow wc as indicated in HACKING as one 
> option.
> +# We /could/ download COMMITTERS from /trunk if it doesn't exist...

Well, could you please either change HACKING or download COMMITTERS?
The code for the latter is basically the tempfile+urlopen mechanics from
the next hunk of this very diff.

> +subprocess.check_call([os.path.dirname(__file__) + '/make-keys.sh',
> +   '-c', os.path.dirname(__file__) + '/../..',
> +   '-o', filepath])
>  shutil.move(filepath, get_target(args))
>  
>  # And we're done!
> @@ -1465,12 +1469,11 @@ def check_sigs(args):
>  
>  def get_keys(args):
>  'Import the LDAP-based KEYS file to gpg'
> -# We use a tempfile because urlopen() objects don't have a .fileno()
> -with tempfile.SpooledTemporaryFile() as fd:
> -fd.write(urlopen(KEYS).read())
> -fd.flush()
> -fd.seek(0)
> -subprocess.check_call(['gpg', '--import'], stdin=fd)
> +with tempfile.NamedTemporaryFile(delete=False) as tmpfile:
> +  keyspath = tmpfile.name
> +subprocess.check_call([os.path.dirname(__file__) + '/make-keys.sh', 
> '-c', os.path.dirname(__file__) + '/../..', '-o', keyspath])
> +subprocess.check_call(['gpg', '--import', keyspath])
> +os.remove(keyspath)

That's not how one uses NamedTemporaryFile().

Generally, all uses of the file should be inside the «with» block, and
unlinking the file should be left to the block's implicit handling
(tmpfile.__exit__()).

As written, however, NamedTemporaryFile() is used as though it were
a "generate a safe temporary name" API.  That means the file is not
created atomically and won't be cleaned up if subprocess.check_call()
raises an exception.

Could you rewrite so the file isn't used outside its «with» block?
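
One possible shape for such a rewrite, as a sketch only.  The helper name
and its two argv parameters are hypothetical; in release.py the first
command would be the make-keys.sh invocation and the second the gpg
import.  Note that reopening a NamedTemporaryFile by name while it is
still open works on Unix but not on Windows.

```python
import subprocess
import tempfile


def generate_then_consume(generate_argv, consume_argv):
    """Run generate_argv and then consume_argv, each with a temp path appended.

    Both commands run inside the with block, so the file is removed by the
    context manager on exit, including when check_call() raises.
    Returns the (by then deleted) path, purely for demonstration.
    """
    with tempfile.NamedTemporaryFile() as tmpfile:
        subprocess.check_call(list(generate_argv) + [tmpfile.name])
        subprocess.check_call(list(consume_argv) + [tmpfile.name])
        return tmpfile.name
```

Because nothing touches the path after the block ends, no explicit
os.remove() and no delete=False are needed.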

>  def add_to_changes_dict(changes_dict, audience, section, change, revision):
>  # Normalize arguments
> 
> 


Re: svn commit: r1902590 - /subversion/trunk/tools/client-side/store-plaintext-password.py

2022-07-13 Thread Daniel Shahaf
dsahlb...@apache.org wrote on Fri, Jul 08, 2022 at 23:39:14 -:
> A new script to store/update a password in the plain text password store
> 
> * tools/client-side/store-plaintext-password.py
>   As above
> 
> Discussed on dev@: 
> https://lists.apache.org/thread/jfd0f5n2qpgnyc30dst6ycnkphcwf6mm
> 
> Added:
> subversion/trunk/tools/client-side/store-plaintext-password.py   (with 
> props)

Presumably, now that it's been added, we should link it from somewhere
to make it discoverable by users?

Cheers,

Daniel
(I have reviewed the changes you mentioned on dev@ and have no comments.)


Command-line tool for applying deltas? (was: Re: svnadmin: E16004: Invalid r4422 footer. How to investigate deeper?)

2022-06-28 Thread Daniel Shahaf
Good morning dev@,

Does anyone have a script that takes as input a file and an svndiff and
emits to stdout the result of applying the latter to the former?  This came
up on users@ in the context of reconstructing a truncated rev file.

I've checked tools/.

Cheers,

Daniel


Daniel Shahaf wrote on Tue, 28 Jun 2022 17:58 +00:00:
> Assuming I haven't missed any simpler solution, you'll want:
⋮
> 2. A script that takes as input a file and a delta, applies the latter
> to the former, and outputs the result.  We don't seem to have one of
> those already.  If you write one, do consider contributing it for our
> tools/ directory.


Re: Subversion 1.10.0 end-of-life

2022-05-27 Thread Daniel Shahaf
Daniel Sahlberg wrote on Fri, 27 May 2022 10:40 +00:00:
> Den tors 26 maj 2022 kl 14:14 skrev Daniel Shahaf :
>> +0.5 to post to announce@ (and users@).  Might want to post the full
>> draft here (including subject, etc.) first, though?
>
> I propose to reuse the news article, adding the tail from the release
> announcements.
>
> [[[
> Subject: Apache Subversion 1.10.x end of life
>
> The Subversion 1.10.x line is end of life (EOL). It was released 2018-04-13
> and was supported for the last four years according to the LTS release
> life-cycle (see How we plan releases[1]). We recommend everyone to update
> to the current LTS release 1.14.2 as soon as practically possible since
> we've stopped accepting bug reports against 1.10.x and will not make any
> more 1.10.x releases. The last release (1.10.8) was made 2022-04-12 and is
> available to anyone who can't update to 1.14.
>

Looks good.

Nits:

- s/last release/last 1.10.x release/
- s/ (2018|2022)/ on \1/ [two lines affected]
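
For anyone who prefers to see the two substitutions applied, here is
a small sketch.  The sample text is an abbreviated excerpt of the draft
above, and the function name is made up for this example.

```python
import re


def apply_nits(text):
    """Apply the two suggested substitutions to the announcement text."""
    # s/last release/last 1.10.x release/
    text = text.replace("last release", "last 1.10.x release")
    # s/ (2018|2022)/ on \1/
    text = re.sub(r" (2018|2022)", r" on \1", text)
    return text


draft = ("It was released 2018-04-13 and was supported for the last four "
         "years. The last release (1.10.8) was made 2022-04-12.")
print(apply_nits(draft))
```

This should print the excerpt with "released on 2018-04-13", "The last
1.10.x release" and "made on 2022-04-12".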

Should we say something about 1.15?  (I can argue either way.)

Cheers,

Daniel

> Thanks,
> - The Subversion Team
>
> [1] https://subversion.apache.org/roadmap.html#release-planning
>
> --
> To unsubscribe, please see:
>
> https://subversion.apache.org/mailing-lists.html#unsubscribing
> ]]]
>
>
> /Daniel


Re: Subversion 1.10.0 end-of-life

2022-05-26 Thread Daniel Shahaf
Daniel Sahlberg wrote on Sun, 22 May 2022 21:07 +00:00:
> Den mån 9 maj 2022 kl 14:12 skrev Nathan Hartman :
>
>> On Mon, May 9, 2022 at 7:38 AM Daniel Sahlberg <
>> daniel.l.sahlb...@gmail.com> wrote:
>>
>>> Den sön 8 maj 2022 kl 02:21 skrev Daniel Shahaf :
>>>
>>>> Daniel Sahlberg wrote on Sat, 07 May 2022 18:37 +00:00:
>>>> > Den lör 7 maj 2022 kl 14:17 skrev Daniel Shahaf <
>>>> d...@daniel.shahaf.name>:
>>>> >
>>>> >> Daniel Sahlberg wrote on Sat, 07 May 2022 09:53 +00:00:
>>>> >> > I've committed the changes in r1900649.
>>>> >>
>>>> >> I wonder if this merits a news entry on /index.html?  Just "1.10.x is
>>>> >> EOL; please upgrade to 1.14".
>>>> >>
>>>> >
>>>> > Good point. I also considered this, but I couldn't find any other
>>>> > release being announced EOL so I elected to not do this. I'm open
>>>> > to reconsider!
>>>> >
>>>>
>>>> Until today, most releases that have gone EOL did so either by virtue of
>>>> a subsequent .0 release being made (1.0 through 1.8) or at about the
>>>> same time as a subsequent .0 release being made (1.11 through 1.13
>>>> inclusive).  In either case, at about the time of a release's going EOL
>>>> there would have been a news entry (and announce@ post, and possibly
>>>> a press release) about the new release, and the new release's release
>>>> notes would have pointed out, at the very end, that previous releases
>>>> were EOL'ed by the new release. So, to someone who knew our "support two
>>>> release lines" policy, EOLings were very visible.
>>>>
>>>
>>> Good point. I saw this in the release notes but I can't find anything in
>>> announce@. Is the new release policy something we want to announce?
>>>
>>> I've added a news item in 1900735 et al.
>>>
>>
>>
>> I think we should announce it.
>>
>> The proposed news item looks good.
>>
>> One thing I might change is to suggest updating "as soon as practical," or
>> something to that effect, and point out that 1.10.8, released last month,
>> is available as a final 1.10 release.
>>
>
> Thanks, sounds like a good idea. I've added this in r1901130.
>
> Can I just copy the news item text and post to announce@

+0.5 to post to announce@ (and users@).  Might want to post the full
draft here (including subject, etc.) first, though?

> (I suppose I will have to configure my @apache.org address as sender).

Yes.  It's mail-relay.apache.org:587.

Cheers,

Daniel


Re: svn commit: r1900883 - /subversion/branches/1.14.x/STATUS

2022-05-15 Thread Daniel Shahaf
Nathan Hartman wrote on Sun, May 15, 2022 at 03:36:05 -0400:
> On Sat, May 14, 2022 at 8:57 AM  wrote:
> > +++ subversion/branches/1.14.x/STATUS Sat May 14 12:57:32 2022
> > @@ -39,6 +39,15 @@ Candidate changes:
> > votes:
> >   +1: rhuijben
> >
> > + * r1900882
> > +   Replace a call to a function deprecated upstream.
> > +   Justification:
> > + No-op on Python 3.2 and newer, but will allow Subversion 1.14.x to be
> > + built by future Python 3.12.  Note that Python 3.2 was released 9 years
> > + before Subversion 1.14.0.
> > +   Votes:
> > + +1: danielsh
> > +
> 
> Should futatuki's follow-up in r1900890 be added to this nomination?

Per ,
breaking py2 support in a 1.14.x patch release is permitted.  Thus,
we are not required to add r1900890 to this nomination.

If we do nominate r1900890, here's my +0 for it.  (Not +1 because
I haven't audited all callsites that «import gen_base».)

> Also should this reference SVN-4899?

r1900922.


Re: svn commit: r1900649 - in /subversion/site/publish: ./ docs/community-guide/releasing.part.html roadmap.html

2022-05-14 Thread Daniel Shahaf
dsahlb...@apache.org wrote on Sat, 07 May 2022 09:52 +00:00:
> Merge from site/staging: 1900404, 1900405, 1900528, 1900532, 1900561, 1900562
>
> Document the revised release policy as discussed on dev@ [1].
>
> * publish/docs/community-guide/releasing.part.html,
>   publish/roadmap.html:
>   Changes in several sections related to release process, see merged
>   revisions for details.
>
> [1] https://lists.apache.org/thread/17v36gol5vltyx3pv9z4wskftq7hn4zb

Both docs/release-notes/index.html and the download page have a
noticebox talking about a "6-month regular and 2-year LTS release
schedule".  The constant is wrong (need s/2/4/), as is the term
"schedule", I believe.

(I found these by grepping for links to roadmap.html#release-planning.)

Cheers,

Daniel


Re: Subversion 1.10.0 end-of-life

2022-05-07 Thread Daniel Shahaf
Daniel Sahlberg wrote on Sat, 07 May 2022 18:37 +00:00:
> On Sat, 7 May 2022 at 14:17, Daniel Shahaf wrote:
>
>> Daniel Sahlberg wrote on Sat, 07 May 2022 09:53 +00:00:
>> > I've committed the changes in r1900649.
>>
>> I wonder if this merits a news entry on /index.html?  Just "1.10.x is
>> EOL; please upgrade to 1.14".
>>
>
> Good point. I also considered this, but I couldn't find any other release
> being announced EOL so I elected to not do this. I'm open to reconsider!
>

Until today, most releases that have gone EOL did so either by virtue of
a subsequent .0 release being made (1.0 through 1.8) or at about the
same time as a subsequent .0 release being made (1.11 through 1.13
inclusive).  In either case, at about the time of a release's going EOL
there would have been a news entry (and announce@ post, and possibly
a press release) about the new release, and the new release's release
notes would have pointed out, at the very end, that previous releases
were EOL'ed by the new release. So, to someone who knew our "support two
release lines" policy, EOLings were very visible.

As to 1.9, I don't think we made a conscious decision _not_ to make
a news entry pointing out that 1.9 went EOL, either.

> There will be a bunch of further changes, for example the download page.
> I'll commit to staging first and encourage review (it is getting late
> here...) and will merge to publish later.

Thanks for doing the legwork :)

Daniel


Re: Subversion 1.10.0 end-of-life

2022-05-07 Thread Daniel Shahaf
Daniel Sahlberg wrote on Sat, 07 May 2022 09:53 +00:00:
> I've committed the changes in r1900649.

I wonder if this merits a news entry on /index.html?  Just "1.10.x is
EOL; please upgrade to 1.14".


Re: svn commit: r1900404 - in /subversion/site/staging: docs/community-guide/releasing.part.html roadmap.html

2022-05-02 Thread Daniel Shahaf
Daniel Sahlberg wrote on Mon, 02 May 2022 20:12 +00:00:
> Thanks to everyone for discussing this and moving it forward! I'm sorry I
> wasn't able to be more active last week but life got in the way.
>
> One small point below...
>
> On Sat, 30 Apr 2022 at 00:04  wrote:
> [...]
>
>> +LTS releases are supported for four years from the date of their
>> +initial release.  For instance, 1.15.x will be supported until four years after
>> +the announcement of 1.15.0.
>>
>
> Should we really declare 1.15 an LTS release at this stage?

No.  Deciding whether 1.15 should be LTS or Regular deserves a thread of
its own.  As far as this thread is concerned, the documentation should
reflect the status quo: that it has not been decided yet whether 1.15
will be LTS or Regular.

Good catch.

If someone could please update the text in staging/, that would be great.

> I would also suggest to remove the "Transition to LTS and Regular Releases"
> section (
> https://subversion-staging.apache.org/roadmap.html#transition-lts-regular-releases)
> since it seems to concern the fixed-time release schedule. I can do this,
> just wanting to check that I don't missread something.

The description of what we backport is "general backports and thereafter
high priority fixes" in this section, and "high priority issues such as
… and sometimes also other issues" in the section above.  We might want
to clarify the "other issues" part of the latter sentence when we delete
this section.

Also, might want to explicitly spell out that 1.10 is now EOL: someone
might think that 1.10 would be supported with security fixes until the
LTS _after 1.14_ is released, as that would have been the case under our
pre-1.11 policy if there hadn't been Regular releases at all.

Also, to answer your question in the OP, we'll want to remove 1.10 from
the download page and from dist/release/.

Cheers,

Daniel


Re: Subversion 1.10.0 end-of-life

2022-04-28 Thread Daniel Shahaf
Nathan Hartman wrote on Thu, Apr 28, 2022 at 15:25:55 -0400:
> if we start releasing more frequent LTS .0 versions, we would end up
> promising to support too many lines simultaneously. I hadn't
> considered that because I was working from the assumption that we
> aren't releasing new lines frequently enough for that to be a problem,
> but we'd better document that in HACKING so the dev community doesn't
> forget that in the future and create that situation. Maybe it'll be an
> explanation about when to call a release LTS vs Regular. E.g., if
> there exist two supported release lines then any further releases
> should be Regular until the older LTS drops out.

While there, HACKING should tell the RM to record the "Is it LTS or
regular?" decision into release-lines.yaml:lts_release_lines.


Re: Subversion 1.10.0 end-of-life

2022-04-28 Thread Daniel Shahaf
Nathan Hartman wrote on Thu, Apr 28, 2022 at 15:25:55 -0400:
> the explanation about support periods should be easy to understand.

Index: staging/roadmap.html
===
--- staging/roadmap.html(revision 1900368)
+++ staging/roadmap.html(working copy)
@@ -86,41 +86,46 @@
 title="Link to this section">
 
 
-Subversion plans to make a regular release every 6 months,
-   with a Long-Term Support (LTS) release every 2 years.
-   Regular releases are intended to deliver new features more quickly, while
-   LTS releases are intended to provide stability over longer periods.
+Subversion has two types of releases:
+   regular releases are intended to deliver new features more quickly, while
+   LTS releases are intended to provide stability over longer periods.
 
 
-
-  
-type of release
-emphasis
-release every
-support period
-release numbers
-  
-  
-LTS release
-stability
-2 years
-4 years
-1.10, 1.14, ...
-  
-  
-regular release
-features
-6 months
-6 months
-1.11, 1.12, 1.13, ...
-  
-
+The two types of releases differ in their support lifetime:
 
+
+
+Regular releases are supported for six months from the date of
+their initial release.  For instance, 1.11.x was supported until six months
+after the announcement of 1.11.0.
+
+LTS releases are supported for four years from the date of their
+initial release.  For instance, 1.15.x will be supported until four years after
+the announcement of 1.15.0.
+
+LTS releases are supported until three months after the release of
+the next LTS.
+
+The previous two guarantees cumulate: for an LTS release line to be declared
+end-of-life (EOL), it has to both have been first released over four
+years before and have been supported in parallel to a newer LTS
+release line for at least three months.
+
+For instance, assume 1.42.0 is released on 2042-07-01 and 1.42 is declared
+an LTS line.  In this case, 1.42 will be supported at least until 2046-06-30
+(with no ifs, buts, or maybes).  Furthermore, it is expected that a newer LTS
+release (1.43.0, 1.44.0, etc.) will be made before 2046-04-01, leaving three
+months for upgrading installations.  In case no newer LTS release is made
+until, say, 2048-01-01, the lifetime of 1.42 will automatically be extended
+until 2048-03-31.
+
+At any given time there will be at least one supported LTS release.
+
+
+
 During the support period, we commit to providing updates that fix high
 priority issues such as security and data loss or corruption. We may also
-sometimes fix other issues as appropriate to the emphasis of each release.
-If a release takes longer than planned, we will extend the support periods
-of the previous releases accordingly.
+sometimes fix other issues as appropriate to the emphasis of each release.
 
 In this context, "release" means an increment of the minor release
 number, which is the middle number in our three-component system.
@@ -131,6 +136,9 @@
 bugfixes have accumulated to warrant it.  Major new releases, such as
 Subversion 2.0, will probably be done much like the minor releases,
 just with more planning around the exact features.
+
+To date, every release since 1.0 has been LTS, with the exception of 1.11,
+1.12, and 1.13 which were regular.
 
 For more information about Subversion's release numbering and
 compatibility policies, see the section entitled


Re: Subversion 1.10.0 end-of-life

2022-04-28 Thread Daniel Shahaf
Stefan Sperling wrote on Thu, 28 Apr 2022 09:55 +00:00:
> I think it would be better to have such details spelled out in English
> in a manner that is easy to understand for anyone, with illustrating
> examples, instead of (or in addition to) mathematical notation that
> requires abstract thinking to figure out.

I'm unable to interpret this charitably.

More below.

> On Wed, Apr 27, 2022 at 11:43:02PM -0400, Nathan Hartman wrote:
>> On Wed, Apr 27, 2022 at 4:56 PM Daniel Sahlberg
>>  wrote:
>> >
>> > On Wed, 27 Apr 2022 at 21:02, Daniel Shahaf wrote:
>> >>
>> >> As to the general rule, I think we're missing a piece: the overlap
>> >> period.  We should say something along the lines of "Every LTS release
>> >> will be supported for at least Y years, or until M months after the
>> >> release of another LTS .0, whichever comes later.".
>> >
>> >
>> > +1 to have an overlap period.
>> >
>> > Y = 4, M = 3? Or M = 6?
>> 
>> I'm also +1 to have an overlap period, and the idea of "at least Y
>> years, or until M months after the release of another LTS .0,
>> whichever comes later" seems quite reasonable to me.
>
> Shouldn't this say "whichever comes earlier"?

No.  That would make Y meaningless: once 1.15.0 is released, admins won't
be able to rely on any "Y years" promise since a 1.16.0 might be
released sooner than M months before the Y years are up.

That's basically the equivalent of telling someone "I'll babysit your kids on
2026-04-27 unless it's a sunny day".  A promise with strings attached isn't.

> Otherwise, the M months rule would never apply in case we release more
> than one LTS line within Y years, right?

Almost.  It would never apply if we release two LTS .0's within
$Y years - M months$ of each other.

> Would we then end up fully supporting several lines of LTS releases?

Yes.

> Example with Y=4:
> release 1.15.0 in year 1 (support 1.15)
> release 1.16.0 in year 2 (support 1.15, 1.16)
> release 1.17.0 in year 3 (support 1.15, 1.16, 1.17)
> release 1.18.0 in year 4 (support 1.15, 1.16, 1.17, 1.18)
> release 1.19.0 in year 5 (support 1.16, 1.17, 1.18, 1.19)
> ...

Books whose plots involve annual Apache Subversion releases go in the
sci-fi section ;-)

And if it becomes reality, we could make 1.16 a Regular release (as
Nathan pointed out), or apply the "experimental" descriptor to new
features more liberally.
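
For what it's worth, the table above can be reproduced mechanically.  The
following is only a sketch under the assumptions floated in this thread
(Y = 4 years, M = 3 months, one hypothetical LTS .0 per year); the function
name and the year-based arithmetic are made up for illustration and are not
part of any Subversion tooling.

```python
Y_YEARS = 4        # assumed minimum LTS lifetime, in years
M_YEARS = 3 / 12   # assumed overlap period (3 months), in years


def supported_lines(releases, now):
    """Return the LTS lines that are supported at time NOW (in years).

    RELEASES maps a line name to the year in which its .0 was released.
    A line is supported until Y years after its .0, or until M months
    after the next LTS .0, whichever comes later.
    """
    ordered = sorted(releases.items(), key=lambda item: item[1])
    supported = []
    for i, (line, released) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_released = ordered[i + 1][1]
            eol = max(released + Y_YEARS, next_released + M_YEARS)
        else:
            # No newer LTS exists, so this line cannot go EOL yet.
            eol = float("inf")
        if released <= now < eol:
            supported.append(line)
    return supported


# Hypothetical annual releases, as in the quoted example.
annual = {"1.15": 1, "1.16": 2, "1.17": 3, "1.18": 4, "1.19": 5}
for year in range(1, 6):
    print(year, supported_lines(annual, year))
```

With annual LTS releases the Y-years term always dominates, so four lines
are supported in parallel from year 4 onwards, which is what the quoted
table shows.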

Daniel


Re: Subversion 1.10.0 end-of-life

2022-04-27 Thread Daniel Shahaf
Nathan Hartman wrote on Tue, 26 Apr 2022 13:58 +00:00:
> On Tue, Apr 26, 2022 at 5:57 AM Stefan Sperling  wrote:
>>
>> On Mon, Apr 25, 2022 at 10:05:58PM +0200, Daniel Sahlberg wrote:
>> > Hi,
>> >
>> > According to the Roadmap, How we plan releases[1], 1.10.0 is a LTS release
>> > that will receive support for 4 years. According to the News archive[2],
>> > 1.10.0 was released 2018-04-13.
>> >
>> > 1.10.0 was released approximately two months before the transition to the
>> > LTS support policy and I have not been able to dig out what was promised
>> > previously.
>>
>> Before LTS releases, the 2 most recent lines of releases were supported.
>> The most recent one would receive all types of bug fixes, the second one
>> would receive security or data-corruption fixes only.
⋮
>> Going back to the old policy would mean that 1.10 would be supported
>> until 1.15 comes out. At which point only 1.15 and 1.14 would be supported.
>
>
> The older policy seems simpler.
>
> One good takeaway from the new policy is that we give the expected
> lifespan of a release line to help admins plan server upgrades. It
> also helps us with our own planning. [...] For reasons like this, declaring
> the expected lifespan as we've been doing since the new policy is
> helpful.

+1

> We could keep the LTS vs Regular distinction because it standardizes
> on two possible lifespans, either 4 years or 6 months, BUT stop
> promising to make Regular releases, neither at 6 months nor any other
> frequency. It becomes optional to make a Regular release if and when
> it makes sense and the developers and community are willing and able to
> make it happen.

+1

> Back to Daniel's question: Since we aren't currently making a new LTS
> release every 2 years, I think it makes sense to go with what Stefan
> suggests, meaning EOL 1.10.x when 1.15.0 is released.
>

We actually already document (in
https://subversion.apache.org/roadmap.html#release-planning, which is
linked from the download page) that 1.9 and 1.10 will be LTS with the
four-year lifetime.  Assuming we wrote that part _before_ we released
1.10.0, that means 1.10 is EOL now and we can mark it as such.

> At this time, we might warn that 1.10.x is "deprecated" (or something
> along those lines) by which I mean to warn users that 1.10.x will be
> EOL when the next minor release is made and encourage upgrading to
> 1.14.x as soon as reasonable. This means we could still make 1.10.x
> patch releases if it makes sense to do so, but admins should not
> count on that support continuing for any definable length of time.
>
> I know that seems kind of ad-hoc. It would be very helpful if we could
> as a community decide this policy question and then update the site.

As to the general rule, I think we're missing a piece: the overlap
period.  We should say something along the lines of "Every LTS release
will be supported for at least Y years, or until M months after the
release of another LTS .0, whichever comes later.".
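
To make the "whichever comes later" rule concrete, here is a minimal
sketch in Python.  It assumes Y = 4 and M = 3 (values that are only being
discussed, not decided), and the function names are invented for this
example.  The dates are the hypothetical 1.42 ones used elsewhere in this
thread.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Return D shifted by MONTHS calendar months, clamping the day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def lts_last_supported_day(released, next_lts_released=None,
                           years=4, months=3):
    """Return the last day on which an LTS line is supported.

    The line is supported for YEARS years from its .0 release, or until
    MONTHS months after the next LTS .0, whichever comes later.  If
    NEXT_LTS_RELEASED is None, the result is only the guaranteed minimum:
    the line stays supported until a newer LTS has existed for MONTHS
    months.
    """
    eol = add_months(released, 12 * years)
    if next_lts_released is not None:
        eol = max(eol, add_months(next_lts_released, months))
    return eol - timedelta(days=1)


released = date(2042, 7, 1)
print(lts_last_supported_day(released))                    # guaranteed minimum
print(lts_last_supported_day(released, date(2046, 4, 1)))  # next LTS on time
print(lts_last_supported_day(released, date(2048, 1, 1)))  # next LTS is late
```

Note that an early successor (say, one released in 2043) does not shorten
the lifetime: the Y-years term still wins, which is the whole point of
"later" as opposed to "earlier".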

But the general principle of having only one supported LTS at this point
in time, since 1.10 has EOL'd, seems sound to me.  We have fewer hands
on deck nowadays, so we should try to support fewer lines.

Cheers,

Daniel


Re: svn commit: r1899945 - /subversion/trunk/subversion/tests/cmdline/svntest/__init__.py

2022-04-22 Thread Daniel Shahaf
Daniel Sahlberg wrote on Wed, Apr 20, 2022 at 13:20:15 +0200:
> On Wed, 20 Apr 2022 at 05:57, Branko Čibej wrote:
> 
> > On 18.04.2022 19:46, Nathan Hartman wrote:
> > > On Sun, Apr 17, 2022 at 9:30 AM  wrote:
> > >> Author: danielsh
> > >> Date: Sun Apr 17 13:30:40 2022
> > >> New Revision: 1899945
> > >>
> > >> URL: http://svn.apache.org/viewvc?rev=1899945=rev
> > >> Log:
> > >> * subversion/tests/cmdline/__init__.py
> > >>(): Rewrite a comment.
> > >>
> > >> Modified:
> > >>  subversion/trunk/subversion/tests/cmdline/svntest/__init__.py
> > >>
> > >> Modified: subversion/trunk/subversion/tests/cmdline/svntest/__init__.py
> > >> URL:
> > http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/svntest/__init__.py?rev=1899945=1899944=1899945=diff
> > >>
> > ==
> > >> --- subversion/trunk/subversion/tests/cmdline/svntest/__init__.py
> > (original)
> > >> +++ subversion/trunk/subversion/tests/cmdline/svntest/__init__.py Sun
> > Apr 17 13:30:40 2022
> > >> @@ -18,8 +18,6 @@
> > >>   # under the License.
> > >>   #
> > >>
> > >> -# any bozos that do "from svntest import *" should die. export nothing
> > >> -# to the dumbasses.
> > >>   __all__ = [ ]
> > >>
> > >>   import sys
> > >>
> > >>
> > >
> > > This removes the comment, rather than rewriting it as suggested in the log.
> > >

Sorry that wasn't clear.  It does, however, explain how I approached
that comment: I hadn't _set out_ to delete it, but to rewrite it with
the content preserved and the rest removed.  The result of the rewrite
was the empty string.

> > > I agree the comment should be rewritten. It was added (along with the
> > > __all__ = []) in r951379, the log of which reads: "Protect against bad
> > > python proggies." I couldn't find other contextual information about
> > > it, but I suppose any comment to that effect would be helpful?
> >
> >
> > 'from X import *' is considered bad practice in Python, for various
> > reasons.
> >
> > I'm not sure why the comment had to be removed, unless it's a case of
> > overly sensitive political correctness.
> >
> 
> I disagree with "overly sensitive". There are other ways to convey the same
> message that are probably equally effective in educating the reader. I
> wouldn't approve of that kind of language at $dayjob and I don't think it
> belongs here either.
> 
> What about:
> 
> [[[
> # from X import * is bad practice in Python. export nothing to those using
> # it.
> ]]]

I don't think that's what the comment should say.

The target audience of comments is someone who is bilingual in Python
and English.  Therefore, comments shouldn't simply be a _translation_ of
the code from Python to English; rather, the English should say
something _over and above_ what has already been said in Python.

"Export nothing to callsites that star-import this module" is precisely
what «__all__ = [ ]» means.  In the context of a Python file, it's
assumed knowledge, much like the meaning of #! on the first line of
a Python file or an include guard in a *.h file.
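
For anyone who wants to see that behaviour rather than take it on faith,
here is a small self-contained demonstration.  The module name "demo_mod"
and its helper() function are made up for the example; it is not svntest
code.

```python
import sys
import types

# Build a throwaway module that sets __all__ to an empty list but still
# defines a public-looking function.
demo_mod = types.ModuleType("demo_mod")
exec("__all__ = []\n"
     "def helper():\n"
     "    return 42\n", demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod

# Star-import it into a fresh namespace and see what arrived.
namespace = {}
exec("from demo_mod import *", namespace)
star_imported = sorted(name for name in namespace
                       if not name.startswith("__"))

# An explicit import is unaffected by __all__.
from demo_mod import helper

print(star_imported)   # nothing was exported to the star-import
print(helper())
```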

So, we don't need a comment that translates the assignment from Python
to English for the reader's benefit.  What we _might_ need is a comment
that explains _why_ we set __all__ to an empty sequence.  I say "might",
because if assigning «__all__ = [ ]» is standard practice, then it can
go without any comment at all, just like include guards don't get
comments; but on the other hand, if assigning «__all__ = [ ]» is _not_
standard practice, then we do need to explain why we do that (but we can
assume the reader knows what "that" is).

Makes sense?

Daniel

P.S.  «__all__» is documented at .


Re: Pristines-on-demand: printing progress notifications

2022-04-17 Thread Daniel Shahaf
Karl Fogel wrote on Wed, Mar 30, 2022 at 17:58:55 -0500:
> On 30 Mar 2022, Julian Foad wrote:
> > Karl Fogel wrote:
> > > I think printing these messages to stderr makes the most sense.
> > > There are plenty of programs out there that parse the stdout of
> > > 'svn'; we don't want to interfere with them.
> > > 
> > > As you point out, it's especially important for 'svn diff' and 'svn
> > > cat' that stdout remain untainted.  Therefore, we can either always
> > > print these messages to stderr (across all commands), or not print
> > > them for 'svn diff' and 'svn cat' (but that would be an odd
> > > inconsistency).
> > > 
> > > > Anybody want to recommend what we should do for 'cat' and
> > > > 'diff'?
> > > 
> > > As per above: I think we should print the messages to stderr for
> > > everything, including those two.
> > 
> > Printing progress notifications for data-output commands (diff, cat) to
> > stderr does however invite bikeshedding. Currently in our test suite we
> > assume any stderr output (from diff, cat) should be flagged as a test
> > failure. We can change that, but it indicates that some users may
> > consider it a failure too. We would need to agree and decide whether
> > that's going to be unconditional or if the user needs to be able to turn
> > it off for convenience and for backward compatibility.
> > 
> > Because this could be dragged out I'm filing it as a lower priority for
> > now. We can get back to it. (If someone feels able to resolve it,
> > great.)
> 
> Good points, and +1 to prioritizing it lower down relative to shipping the
> main thing!

I'm a bit hesitant about disabling notifications _entirely_ in cat-cmd.c
and diff-cmd.c.  Disabling all notifications (as opposed to only
hydration-related notifications which we focus on right now) seems like
it could easily have unintended consequences.  Do we do that elsewhere?
E.g., in --xml mode?

«diff» isn't unique in having parseable output.  On the contrary, all
our subcommands have parseable output, whether it's "unidiffs" or "seven
columns, then a column of spaces, then filenames" or "XML in 
schema" or "RFC822-esque data with localized field names".  Sure, «svn
diff»'s output may be consumed by other tools, but that also goes for
other subcommands (e.g., help/info/ls/proplist/status all have
machine-parseable output even in non-«--xml» mode).  So, I am leaning to
the opinion that «svn diff» isn't a special case, and «diff» and those
other commands should all use the same rules for their output.

What are those rules?  Good question.  stdout isn't a good place for the
hydration notifications, since those are of interest to the (human)
producer of the output but not to the consumer of the output; stderr
isn't a good place, since CLI consumers may interpret that as an error
indication (though the exit code would be zero, EXIT_SUCCESS); and
/dev/null isn't a good place either, since we added those notifications
for a reason.

«svn cat», on the other hand, _is_ unique, in that it's the only command
whose output format is not controlled by us.  That means we can't emit
anything to stdout in «svn cat» without possibly mangling user data.
So, for «svn cat» we don't have the option of emitting notifications to
stdout; we have only the other options discussed above.
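
The stderr trade-off can be seen from the consumer's side.  The sketch
below uses a stand-in child process (a Python one-liner, not svn) that
writes a made-up progress message to stderr and its payload to stdout;
the message text is an assumption for illustration only.

```python
import subprocess
import sys

# Stand-in for a command like "svn cat": payload on stdout, progress
# notification on stderr, exit code zero.
CHILD = (
    "import sys\n"
    "sys.stderr.write('fetching text base...\\n')\n"
    "sys.stdout.write('file contents\\n')\n"
)

result = subprocess.run([sys.executable, "-c", CHILD],
                        capture_output=True, text=True)

# A consumer that reads only stdout sees the payload untouched...
print(repr(result.stdout))
# ...but a consumer that treats any stderr output as failure would
# misreport this run, even though the exit code says success.
print(repr(result.stderr), result.returncode)
```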

Cheers,

Daniel


Re: Pristines-on-demand: authz denied during textbase sync (#4888)

2022-04-17 Thread Daniel Shahaf
Julian Foad wrote on Wed, Apr 06, 2022 at 13:07:43 +0100:
> > Filed as issue #4888, https://subversion.apache.org/issue/4888
> 
> I have just been looking back over this issue. Clearly there is more to
> it than a quick fix. Summary, based on reviewing the email thread:
> 
> - FAIL: authz_tests.py 31 remove_access_after_commit
> 
> - Patched in : in "text base sync"
> phase, ignore auth error while fetching any text base; continue with
> trying to fetch the rest.
> 
> - While that change fixes that particular test case, it also seems to
> have regressed the failure mode of «svn cat iota@BASE» when iota is
> locally-modified and has no read access. On trunk that command displayed
> the base text; on this branch that command now errors out.
> 
> - We started thinking about what other failure modes could now occur
> because of failures (including unauthorised, redirect, and others) at
> the hydrate phase. This is somewhat open ended; we don't have a simple
> answer to how this all should be taken care of consistently.
> 
> I suggest:
> 
> - revert the patch I applied, as it's papering over the problem in an
> incomplete way and so possibly causes more confusion than it fixes.
> 
> - leave this issue open and come back to it later; it's an edge case not
> part of common work flows.

Shouldn't SVN-4888 be marked a blocker, though?  At least until we are
satisfied that the code as it stands is a solid base on which to
implement, at a later date, an answer to the "open ended" questions?

It's also a regression, but as you say it's also an edge case, and these
two aspects may cancel each other out.

Cheers,

Daniel


Re: Pristines-on-demand: OK to merge to trunk?

2022-04-17 Thread Daniel Shahaf
Julian Foad wrote on Thu, Apr 07, 2022 at 12:43:03 +0100:
> TL;DR: are we OK to merge the pristines feature
> ('pristines-on-demand-on-mwf' branch) to trunk soon, like early next week?
> 
> As said in "A status review" [1] in the long thread "A two-part vision
> for Subversion and large binary objects.", next steps are reviewing and
> handling the outstanding issues, and proposing merge to trunk. I think
> these can be done in parallel as I don't see any that would block a
> merge to trunk. So here is the proposal to merge to trunk, and then
> complete the remaining work on trunk.
> 
> It feels to me like there is general consensus that this feature is
> taking a form that will be acceptable for a first release of it (while
> not perfect), and consensus for proceeding to get it into trunk and
> subsequently including it in the next release. I'm too close to it to
> make an independent assessment. Can anybody else comment?
> 

Although I've done some work on the branch, and did at points diff to
trunk for a specific thing I was working on at the time, at no point did
I do a complete start-to-finish review, as would be needed before
a merge.  So, please do *not* count me as an implicit +1.

Also, I'd be wary of merging the branch to trunk so long as there are
blockers, unless whoever does the merge is certain they will have
sufficient (round) tuits to fix those blockers in a timely manner.

Cheers,

Daniel

> If no objections, I plan to merge to trunk early next week.
> 
> [1] on dev@, 2022-04-05, 
> https://lists.apache.org/thread/lm98og8jqonffcs250q5y3ft5r5qlmk5
> 


Re: svn commit: r1899276 - /subversion/site/publish/upcoming.part.html

2022-04-10 Thread Daniel Shahaf
Daniel Sahlberg wrote on Mon, Mar 28, 2022 at 21:55:36 +0200:
> On Mon, 28 Mar 2022 at 09:55, Daniel Sahlberg <daniel.l.sahlb...@gmail.com> wrote:
> 
> > This commit doesn't look correct.
> >
> > I executed the generate-upcoming-changes-log.sh manually yesterday and it
> > created r1899244, removing a lot of log entries belonging to 1.14.1. This
> > commit (which was executed by cron) restores them.
> >
> > The crontab entry is:
> > [[[
> > # Puppet Name: Update our upcoming changes list
> > SVN=svn
> > 15 4 * * * chronic ~/src/svn/site/tools/generate-upcoming-changes-log.sh
> > ]]]
> >
> > The script begins with a comment
> > [[[
> > # This should be run from the root of a branches/1.{9,10,11}.x working copy.
> >
> > I suppose the crontab command should be changed to:
> > cd ~/src/svn/1.14.x && chronic ~/src/svn/site/tools/generate-upcoming-changes-log.sh
> >
> 
> So I've investigated this further and my initial analysis seems correct.
> 
> This is what happens:
> * generate-upcoming-changes-log.sh determines last patch release number and
> generates upcoming.part.html based on all commits since that patch release
> was tagged.
> * It has two different ways to determine "the last patch release":
>    * If `cwd` is a WC it looks in the subversion/include/svn_version.h
>* Else look in https://dist.apache.org/repos/dist/release/subversion/
> * The logic looking at dists.a.o selected the oldest patch release
> available on dists.a.o, in case there are more than one.
> * The crontab entry executed generate-upcoming-changes-log.sh from ~, thus
> forcing a lookup from dists.a.o
> 
> Currently, both 1.14.0 and 1.14.1 are available on dists.a.o. Thus it
> emitted all merged changes since 1.14.0, while expected behaviour (of
> upcoming.part.html) would be to only display merged changes since 1.14.1.
> 

dist/ should only contain the latest release from each supported minor
line, except when a release is being staged.  I.e., today it should
contain 1.14.2 and 1.14.1 since we're in the process of staging 1.14.2;
but by Friday it should contain 1.14.2 and 1.10.8 and only those two.

The script relies on this.  Thus, if we'd deleted 1.14.0's artifacts
when we released 1.14.1 and kept the cron job as it was, the output
would then have been correct (and we wouldn't have had an extra manual
step in our create-a-A.B.x-branch workflow).
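
If we'd rather not rely on dist/ being pruned, the fallback could pick the
newest patch release instead.  The real script is a shell script; the
following is just an illustration of the selection logic in Python, with
invented function names and a hard-coded list standing in for the
directory listing.

```python
def parse_version(version):
    """Turn '1.14.1' into (1, 14, 1) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_patch_release(available, minor_line):
    """Return the newest release of MINOR_LINE (e.g. '1.14') in AVAILABLE."""
    candidates = [v for v in available if v.startswith(minor_line + ".")]
    return max(candidates, key=parse_version)


# Stand-in for what dist/release/ contained when the problem showed up.
available = ["1.10.8", "1.14.0", "1.14.1"]

print(latest_patch_release(available, "1.14"))
# Comparing as tuples matters: as plain strings, "1.10.8" sorts before
# "1.9.7", which is the wrong way round.
print(parse_version("1.10.8") > parse_version("1.9.7"))
```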

> I've pushed a change in puppet to the crontab entry as suggested above.
> This should solve the problem for now. When branching 1.15.x, we need to
> update this crontab entry.

Then please update 

accordingly :)

Cheers,

Daniel


Re: Question on release announcement mail

2022-04-10 Thread Daniel Shahaf
Mark Phippard wrote on Sun, Apr 10, 2022 at 16:02:07 -0400:
> On Sun, Apr 10, 2022 at 3:27 PM Daniel Shahaf  wrote:
> >
> > Mark Phippard wrote on Sun, Apr 10, 2022 at 15:16:58 -0400:
> > > So I was wondering how, using the gpg command. I can get the other
> > > elements we include .. such as: Stefan Sperling
> > > [2048R/4F7DBAA99A59B973]
> >
> > They're generated by release.py:get_siginfo() which is called by
> > write_announcement(), so, «release.py write-announcement» is the right
> > answer.  (I just grepped for "with fingerprint:".)
> >
> > > A problem I am having is with my key. I have to run the
> > > write-announcement in my Docker image but that has an old version of
> > > GPG that does not know what to do with my key.
> >
> > Install gpg from backports, or run write-announcement elsewhere?
> > I don't see why you couldn't run it anywhere you have a wc of
> > /dist/release.
> 
> Even on a system with a GnuPG that understands my key the Python
> script does not:
> 
> Traceback (most recent call last):
>   File "/Users/markphip/projects/svn-trunk/tools/dist/release.py", line 1917, in <module>
>     main()
>   File "/Users/markphip/projects/svn-trunk/tools/dist/release.py", line 1913, in main
>     args.func(args)
>   File "/Users/markphip/projects/svn-trunk/tools/dist/release.py", line 1272, in write_announcement
>     siginfo = get_siginfo(args, True)
>   File "/Users/markphip/projects/svn-trunk/tools/dist/release.py", line 1421, in get_siginfo
>     formatter = PUBLIC_KEY_ALGORITHMS[keytype]
> KeyError: 22
> 
> 
> So I was going to remove my key from the signature file, run the
> script to get the email announcement, and then put my key back. But
> then I was looking for how I could manually construct what my entry
> should look like in the email.
> 

Perhaps something like this:

Index: release.py
===
--- release.py  (revision 1899017)
+++ release.py  (working copy)
@@ -1417,7 +1402,7 @@ def get_siginfo(args, quiet=False):
 if parts[0] == 'pub':
 keylen = int(parts[2])
 keytype = int(parts[3])
-formatter = PUBLIC_KEY_ALGORITHMS[keytype]
+formatter = PUBLIC_KEY_ALGORITHMS.get(keytype, lambda keylen: "?".format(keytype, keylen))
 long_key_id = parts[4]
 length_and_type = formatter(keylen) + '/' + long_key_id
 del keylen, keytype, formatter, long_key_id

Or this:

Index: release.py
===
--- release.py  (revision 1899017)
+++ release.py  (working copy)
@@ -1326,6 +1311,7 @@ PUBLIC_KEY_ALGORITHMS = {
 # The values are callables that produce gpg1-like key length and type
 # indications, e.g., "4096R" for a 4096-bit RSA key.
 1: (lambda keylen: str(keylen) + 'R'), # RSA
+22: (lambda keylen: "ed25519"), # according to gpg2; this value is not in the IANA registry above
 }
 
 def _make_human_readable_fingerprint(fingerprint):
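
The two patches combine naturally: a table entry for algorithm 22 plus a
fallback for anything else that isn't in the table.  Below is a standalone
sketch of that lookup, not the actual release.py code.  The function name,
the "algoN?" fallback text, and the sample records are made up; the field
positions follow gpg's colon-separated listing format (key length, public
key algorithm, and key ID in the third to fifth fields of a "pub" record).

```python
PUBLIC_KEY_ALGORITHMS = {
    # Values are callables producing a gpg1-like length-and-type string.
    1: (lambda keylen: str(keylen) + "R"),   # RSA, e.g. "4096R"
    22: (lambda keylen: "ed25519"),          # per the patch above (gpg2)
}


def length_and_type(colon_record):
    """Format a gpg 'pub' colon record as '<length+type>/<long key id>'."""
    parts = colon_record.split(":")
    if parts[0] != "pub":
        raise ValueError("not a 'pub' record: %r" % colon_record)
    keylen = int(parts[2])
    keytype = int(parts[3])
    # Unknown algorithms get a visible placeholder instead of a KeyError.
    formatter = PUBLIC_KEY_ALGORITHMS.get(
        keytype, lambda keylen: "algo%d?" % keytype)
    return formatter(keylen) + "/" + parts[4]


print(length_and_type("pub:-:2048:1:4F7DBAA99A59B973:"))
print(length_and_type("pub:-:256:22:0123456789ABCDEF:"))   # made-up key ID
print(length_and_type("pub:-:256:99:0123456789ABCDEF:"))   # unknown algorithm
```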

> I could just leave my signature out of the release too so as not to
> have downstream users need to deal with this problem.

Please don't.  Anyone with an OpenPGP implementation who doesn't know
what public key algorithm 22 is should be able to ignore your signature
and only verify the others.

I suppose you could move your own signature to be last in the files, but
even for this you might want to wait until someone actually complains
about the files failing to verify.


Re: Question on release announcement mail

2022-04-10 Thread Daniel Shahaf
Mark Phippard wrote on Sun, Apr 10, 2022 at 15:16:58 -0400:
> So I was wondering how, using the gpg command, I can get the other
> elements we include .. such as: Stefan Sperling
> [2048R/4F7DBAA99A59B973]

They're generated by release.py:get_siginfo() which is called by
write_announcement(), so, «release.py write-announcement» is the right
answer.  (I just grepped for "with fingerprint:".)
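
For anyone who wants to sanity-check an entry by hand, here is a rough sketch
of the formatting involved.  It is not taken from release.py; the function
names are made up for this example, and the layout is inferred from the past
announcements quoted in this thread.

```python
def human_readable_fingerprint(fingerprint):
    """Group a hex fingerprint in fours, with a double space at the midpoint."""
    groups = [fingerprint[i:i + 4] for i in range(0, len(fingerprint), 4)]
    half = len(groups) // 2
    return ' '.join(groups[:half]) + '  ' + ' '.join(groups[half:])


def announcement_entry(name, length_and_type, fingerprint):
    """Format one signer entry in the style of past announcements."""
    return '   {} [{}] with fingerprint:\n    {}'.format(
        name, length_and_type, human_readable_fingerprint(fingerprint))


if __name__ == '__main__':
    fpr = '8BC4DAE0C5A4D65F404401074F7DBAA99A59B973'
    # For a 40-hex-digit (v4) fingerprint, the long key ID is the
    # last 16 hex digits of the fingerprint.
    print(announcement_entry('Stefan Sperling', '2048R/' + fpr[-16:], fpr))
```

The key length and type prefix ("2048R") still has to come from gpg itself;
only the key ID and the spaced-out fingerprint can be derived from the
fingerprint alone.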

> A problem I am having is with my key. I have to run the
> write-announcement in my Docker image but that has an old version of
> GPG that does not know what to do with my key.

Install gpg from backports, or run write-announcement elsewhere?
I don't see why you couldn't run it anywhere you have a wc of
/dist/release.


Re: Question on release announcement mail

2022-04-10 Thread Daniel Shahaf
Mark Phippard wrote on Sun, 10 Apr 2022 16:30 +00:00:
> Looking at past release announcements, they include a section on who
> signed the release that looks like this:
>
>Stefan Sperling [2048R/4F7DBAA99A59B973] with fingerprint:
> 8BC4 DAE0 C5A4 D65F 4044  0107 4F7D BAA9 9A59 B973
>Branko Čibej [4096R/1BCA6586A347943F] with fingerprint:
> BA3C 15B1 337C F0FB 222B  D41A 1BCA 6586 A347 943F
>Johan Corveleyn [4096R/B59CE6D6010C8AAD] with fingerprint:
> 8AA2 C10E EAAD 44F9 6972  7AEA B59C E6D6 010C 8AAD
>
> I am kind of at a loss for how to produce this information. Assuming
> those three used the same keys as in the past, I would need to know
> what this should look like for:
>
> me
> Nathan
> Julian
>
> ... and possibly Daniel Sahlberg if he sends in a signature.  Our KEYS
> file only includes the fingerprint.

Our KEYS file includes the actual keys: it can be piped to
«GNUPGHOME=$(mktemp -d) gpg --import» in order to verify signatures made
by those keys.  It's the release announcement that includes just the
fingerprints.
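
The same throwaway-keyring idea can be scripted.  The following is only
a sketch, assuming gpg is on PATH and that the signature is a detached one;
the function names and the three path parameters are placeholders, not
anything release.py provides.

```python
import os
import subprocess
import tempfile


def gpg_commands(keys_path, signature_path, artifact_path):
    """Return the two gpg invocations: import KEYS, then verify a signature."""
    return [
        ['gpg', '--import', keys_path],
        ['gpg', '--verify', signature_path, artifact_path],
    ]


def verify_with_keys(keys_path, signature_path, artifact_path):
    """Run both commands against a throwaway keyring.

    Returns True only if the import and the verification both exit with 0.
    """
    with tempfile.TemporaryDirectory() as gnupghome:
        # GNUPGHOME points gpg at an empty, private directory, so the
        # caller's own keyring is neither consulted nor modified.
        env = dict(os.environ, GNUPGHOME=gnupghome)
        for command in gpg_commands(keys_path, signature_path, artifact_path):
            if subprocess.run(command, env=env).returncode != 0:
                return False
    return True
```

Using a temporary GNUPGHOME means the result depends only on the keys in the
KEYS file, not on whatever happens to be in the local keyring.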

Daniel

> I have tried a few commands that sort of give me this info but the
> output is so different I am not sure if I would reproduce it
> correctly.
>
> Thanks
>
> Mark


Re: svn commit: r1899311 - /subversion/branches/1.14.x/STATUS

2022-03-31 Thread Daniel Shahaf
Daniel Shahaf wrote on Fri, 01 Apr 2022 00:02 +00:00:
> Daniel Sahlberg wrote on Thu, Mar 31, 2022 at 17:16:49 +0200:
>> One thing to note is that merge-approved-backports.py has no interactive
>> features. But I think you are expected to run the other Python scripts
>> to get the equivalent functionality.
>
> Correct.  There isn't an implementation of the interactive features (F3
> and F4 in the docs) in backport.py.

s/.  There/, although there currently/

Cheers,

Daniel



Re: Impediments to release

2022-03-31 Thread Daniel Shahaf
Nathan Hartman wrote on Thu, Mar 31, 2022 at 14:49:36 -0400:
> In fact, the last couple of days, I have sifted through hundreds and
> hundreds of changes (basically the list of merge-eligible changes from
> trunk -- a LOT of work was committed in the past 14 months!!)

That sounds very much like what we'll need to do anyway when the time
comes to write the changelog for 1.15.0-rc1.

Incidentally, that's a task we used to split.  See, e.g.,
https://svn.red-bean.com/repos/manyhands/ and
https://cwiki.apache.org/confluence/display/SVN/Svn18Changes

Cheers,

Daniel

> and
> found at least a dozen or so that I've flagged to look more closely
> into, but I've decided to wait until after the release to do so. I've
> nominated only the really really low risk ones (which have been
> merged).
> 
> Cheers,
> Nathan


Re: svn commit: r1899311 - /subversion/branches/1.14.x/STATUS

2022-03-31 Thread Daniel Shahaf
Daniel Sahlberg wrote on Thu, Mar 31, 2022 at 17:16:49 +0200:
> Den tors 31 mars 2022 kl 16:12 skrev Daniel Sahlberg <
> daniel.l.sahlb...@gmail.com>:
> 
> > Den tors 31 mars 2022 kl 15:45 skrev Stefan Sperling :
> >
> >> On Thu, Mar 31, 2022 at 09:21:58AM -0400, Nathan Hartman wrote:
> >> > On Thu, Mar 31, 2022 at 9:09 AM Nathan Hartman <
> >> hartman.nat...@gmail.com> wrote:
> >> > > My bad. Hopefully r1899430 fixes it.
> >> > >
> >> > > How do I manually run backport.pl?
> >> >
> >> > Let me rephrase that question: How do I manually trigger it so we
> >> > don't have to wait for the cron job?
> >>
> >
> > I'm guessing you figured out how to run the script! Did you use the Perl
> > or the Python variation? I'm curious if the Python script is more powerful
> > and better at handling merge failures.
> >
> > Could you also do it in the 1.10.x branch?
> >
> 
> I did this myself, using the Python version:
> 
> dsg@daniel-2022:~/svn_1.10.x$
> ../svn_trunk/tools/dist/merge-approved-backports.py
> 

Glad to see this :-)

> (The path setups I have are slightly different from the ones on svn-qavm
> detailed below).
> 
> 
> >
> > For the benefit of the list, this is how it is executed by cron (expecting
> > to have the active branches checked out in ~/src/svn/1.10.x,
> > ~/src/svn/1.14.x etc. and also trunk checked out as ~/src/svn/trunk):
> >
> > for i in ~/src/svn/1.*.x; do cd $i && $SVN up -q --non-interactive &&
> > YES=1 MAY_COMMIT=1 ../trunk/tools/dist/backport.pl; done
> >
> >
> As far as I can see, the backports from the Perl and the Python versions
> look identical.
> 
> Does anyone have strong feeling regarding using one or the other on
> svn-qavm? Does either version have an advantage with regards to running it
> more often?
> 

.py's implementation is cleaner and is in a language more devs speak.
Personally, I have for years considered .pl deprecated in favour of .py.

> One thing to note is that merge-approved-backports.py has no interactive
> features. But I think you are expected to run the other Python scripts
> to get the equivalent functionality.

Correct.  There isn't an implementation of the interactive features (F3
and F4 in the docs) in backport.py.

Cheers,

Daniel


A New Feature[:] Film About Subversion

2022-03-31 Thread Daniel Shahaf
[ Follow-ups to dev@ only, please. ]

Hi, everybody!

As y'all may recall, issue #525 concerns implementing working copies
that need not store an unmodified copy ("pristine", formerly "text-base")
of every versioned file:

https://subversion.apache.org/issue/525

Our currently-envisioned design is documented in what will become
1.15.0's release notes:


https://subversion-staging.apache.org/docs/release-notes/1.15#bare-working-copies

Our development notes are on issue #525's feature branch:


https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf/BRANCH-README

One planned change is to add "hydrating" functions to the internal
interlibrary API:


https://github.com/apache/subversion/commit/dbfcd85cd12fe624d2fbb845da24036bb519aa28
(see the changes under subversion/include/private/)

We are also contemplating adding an «svn hydrate» command:


https://mail-archives.apache.org/mod_mbox/subversion-dev/202201.mbox/%3C877dapkri1.fsf%40red-bean.com%3E
(for further context, see )

The transition between the non-hydrated state and the hydrated state
will be done via "locks", so we'll be adding another paragraph to this
already-overlong sidebar:


https://svnbook.red-bean.com/nightly/en/svn.advanced.locking.html#svn.advanced.locking.meanings

We expect the ability to hydrate a working copy will be particularly
useful to users of the third-party client submerge(1):

https://manpages.debian.org/unstable/subcommander/submerge.1.en.html

We have prepared a short film explaining our vision.  A preliminary
Internet Movie Database™ entry is here:

imdb://tt0xC27BF2/
(spoiler: gur yvax vf erny, ohg jr nerag nssvyvngrq jvgu vg;
jr whfg pbhyqag erfvfg gur cha)

It's not yet released, of course, but if you're interested in a preview,
let us know; a small number of beta copies are available.

Cheers,

Daniel

P.S.  Sorry about that github URL.  I would have linked to

(aka ), but both of these URLs just
redirect to that github URL.  (With HTTP 301, too… ☹)  Does anyone else
see this?  Perhaps it's just the EU mirror?  svn.a.o resolves to
13.90.137.153 here.


Re: svn commit: r1899311 - /subversion/branches/1.14.x/STATUS

2022-03-31 Thread Daniel Shahaf
Daniel Sahlberg wrote on Thu, Mar 31, 2022 at 16:51:33 +0200:
> Den tors 31 mars 2022 kl 16:29 skrev Nathan Hartman <
> hartman.nat...@gmail.com>:
> 
> > On Thu, Mar 31, 2022 at 10:12 AM Daniel Sahlberg
> >  wrote:
> > >
> > > Den tors 31 mars 2022 kl 15:45 skrev Stefan Sperling :
> > >>
> > >> On Thu, Mar 31, 2022 at 09:21:58AM -0400, Nathan Hartman wrote:
> > >> > On Thu, Mar 31, 2022 at 9:09 AM Nathan Hartman <
> > hartman.nat...@gmail.com> wrote:
> > >> > > My bad. Hopefully r1899430 fixes it.
> > >> > >
> > >> > > How do I manually run backport.pl?
> > >> >
> > >> > Let me rephrase that question: How do I manually trigger it so we
> > >> > don't have to wait for the cron job?
> > >
> > >
> > > I'm guessing you figured out how to run the script! Did you use the Perl
> > or the Python variation? I'm curious if the Python script is more powerful
> > and better at handling merge failures.
> >
> > I just went ahead and ran it on my workstation, basically by checking
> > out the branch, changing to its root directory, and running YES=1
> > MAY_COMMIT=1 ../svn-trunk/tools/dist/backport.pl. Which means I ran
> > the Perl version. :-)
> >
> > It would be cool if there were a way to manually trigger svn-role to do
> > its thing, but it's not super important.
> >
> 
> Committing a "1" to STATUS.RUNNOW which is checked by a cronjob running
> every minute? But then the problem below is still there.
> 

Suppose Infra gave PMC members a way to commit with svn:author=svn-role,
would that solve the problem?

> >> Related question: Why don't we run the cron job more frequently? :)

FWIW, when backport.pl was written, I scheduled it to 04Z because that
was a time when most developers were inactive, so it was unlikely to
cause conflicts for people by merging a bunch of stuff halfway through
their workday.

That doesn't answer whether we should change the setting; just
explaining where it originates.

> > > I've also asked this question to myself. There was a discussion
> > elsewhere (in the thread regarding migration of svn-qavm (in private@),
> > quoted from memory) where someone mentioned a rare race condition with
> > backport.pl if there were simultaneous commits to STATUS from someone
> > else. This had happened some time in 2013 (?) when two cronjobs were
> > executed simultaneously (on svn-qavm1 and svn-qavm2, during another
> > migration) and they stepped on the toes of each other causing conflicts.
> > There was a comment "this could happen also because of a manual commit to
> > STATUS but we don't have so many commits at 04:00Z so the risk is quite
> > low". I can't judge the risk of this.
> >
> > Hmm, I was going to suggest to run it more frequently around release
> > time but the above sounds like a good reason not to!! Until it gets
> > sorted out...
> 
> 
> We should probably think about sorting this out. I'll dig out the original
> thread and make a summary here, but not until after the release.

If you mean
,
it was fixed in backport.py at the time.

Cheers,

Daniel


Re: What to do about PGP KEYS for release?

2022-03-31 Thread Daniel Shahaf
Mark Phippard wrote on Wed, Mar 30, 2022 at 08:01:32 -0400:
> I am still a little unsure what to do about the KEYS file when we
> produce this release.
> 
> Our release.py script no longer works for whatever it used to do and
> throws an error. I do not know if more errors will happen when I get
> to the steps of publishing the release to /dist later on.
> 
> I just updated my Apache LDAP with the fingerprint of my new key I
> will use to sign the release.
> 
> Is there anything here we can leverage?
> 
> https://people.apache.org/keys/
> 

I did post just a few days ago
()
a piece of code that assembles a KEYS-like file from this URL…

Cheers,

Daniel

> Maybe someone can manually generate a KEYS file and send it to me to
> include in the release? I imagine if I drop it in the folder with the
> tarballs after I produce them the process will think that it is the
> one it expects to have.
> 
> Mark


Re: svn commit: r1899373 - /subversion/branches/1.14.x/STATUS

2022-03-31 Thread Daniel Shahaf
Daniel Sahlberg wrote on Wed, Mar 30, 2022 at 07:45:01 +0200:
> So.. backports failed today as well. After some digging I realised
> backport.pl didn't pick up the branch in this nomination due to a
> whitespace issue in STATUS. I removed one space character on each line and
> the backport worked.
> 
> However one backport remains in STATUS:
> 
> svnsvn@svn-qavm1:~/src/svn/1.14.x$ for i in ~/src/svn/1.*.x; do cd $i &&
> $SVN up -q --non-interactive && YES=1 MAY_COMMIT=1 ../trunk/tools/dist/
> backport.pl; done
> Warning summary
> ===
> 
> 1.14.x-r1881534-no-crlf (Fix issue #4864 "build/ac-macros/macosx.m4:
> workaround AC_RUN_IFELSE"): Revisions 'r1881534' nominated but not included
> in branch
> svnsvn@svn-qavm1:~/src/svn/1.14.x$
> 
> Can someone check it? I'm ENOTIME to dig into it.
> 

The merge of r1881534 to its backport branch did not add that revision
to svn:mergeinfo.  Which is to say, the error message is correct.
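
To make the check concrete, here is a small sketch of testing whether
a revision is recorded in an svn:mergeinfo value.  It is not how backport.pl
does it; the helper names are invented, and the property text in the usage
lines is made up rather than copied from the branch.

```python
def parse_mergeinfo(text):
    """Map each merge source path to a list of (first, last) revision ranges."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<source path>:<rev>[,<rev>-<rev>...]".
        path, _, revlist = line.rpartition(':')
        ranges = []
        for item in revlist.split(','):
            item = item.rstrip('*')  # '*' marks a non-inheritable range
            first, _, last = item.partition('-')
            ranges.append((int(first), int(last or first)))
        result[path] = ranges
    return result


def is_merged(text, source, revision):
    """True if `revision` from `source` is covered by the mergeinfo text."""
    return any(first <= revision <= last
               for first, last in parse_mergeinfo(text).get(source, []))


if __name__ == '__main__':
    # Hypothetical mergeinfo on a backport branch root.
    branch_mergeinfo = '/subversion/trunk:1877310,1883355-1883360'
    print(is_merged(branch_mergeinfo, '/subversion/trunk', 1881534))
    print(is_merged(branch_mergeinfo, '/subversion/trunk', 1883357))
```

A nominated revision that fails this kind of test on the backport branch is
what produces the "nominated but not included in branch" warning.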

(I already pointed this out in
,
though admittedly not too visibly.)

Cheers,

Daniel

> Kind regards,
> Daniel
> 
> 
> Den ons 30 mars 2022 kl 07:41 skrev :
> 
> > Author: dsahlberg
> > Date: Wed Mar 30 05:41:19 2022
> > New Revision: 1899373
> >
> > URL: http://svn.apache.org/viewvc?rev=1899373&view=rev
> > Log:
> > * STATUS: Adjust whitespace to see if it resolves backport issues
> >
> > Modified:
> > subversion/branches/1.14.x/STATUS
> >
> > Modified: subversion/branches/1.14.x/STATUS
> > URL:
> > http://svn.apache.org/viewvc/subversion/branches/1.14.x/STATUS?rev=1899373&r1=1899372&r2=1899373&view=diff
> >
> > ==
> > --- subversion/branches/1.14.x/STATUS (original)
> > +++ subversion/branches/1.14.x/STATUS Wed Mar 30 05:41:19 2022
> > @@ -46,13 +46,13 @@ Approved changes:
> >  =
> >
> >   * r1899227
> > -Don't show unreadable copyfrom paths in 'svn log -v'
> > -Justification:
> > -  Makes 'svn log -v' consistent with spec.
> > -Branch:
> > -  1.14.x-r1899227
> > -Votes:
> > -  +1: hartmannathan, dsahlberg, rhuijben
> > +   Don't show unreadable copyfrom paths in 'svn log -v'
> > +   Justification:
> > + Makes 'svn log -v' consistent with spec.
> > +   Branch:
> > + 1.14.x-r1899227
> > +   Votes:
> > + +1: hartmannathan, dsahlberg, rhuijben
> >
> >   * r1898633
> > Fix sporadic testCrash_RequestChannel_nativeRead_AfterException failure
> >
> >
> >


Re: svn commit: r1899275 - /subversion/site/publish/.message-ids.tsv

2022-03-31 Thread Daniel Shahaf
Daniel Sahlberg wrote on Mon, Mar 28, 2022 at 09:19:19 +0200:
> Anyone got an idea why the URL list was sorted in a different way on the
> new svn-qavm? Not that it is a big difference, but I don't like loose ends.
> 

Probably the locale:

[[[
% <1 LC_ALL=C sort 
http://svn.haxx.se/dev/archive-2010-08/0362.shtml
https://svn.haxx.se/users/archive-2012-09/0236.shtml
https://svn.haxx.se/users/archive-2019-04/0041.shtml
https://svn.haxx.se/users/archive-2020-04/0040.shtml
% <1 LC_ALL=en_US.UTF-8 sort 
https://svn.haxx.se/users/archive-2012-09/0236.shtml
https://svn.haxx.se/users/archive-2019-04/0041.shtml
https://svn.haxx.se/users/archive-2020-04/0040.shtml
http://svn.haxx.se/dev/archive-2010-08/0362.shtml
% 
]]]
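
The effect can be reproduced without depending on which locales are
installed.  In the sketch below, plain sorted() compares code points, which
matches the LC_ALL=C result; the ignore_punctuation() key is only a crude
stand-in for how the en_US.UTF-8 collation de-emphasises punctuation, not the
real collation algorithm.

```python
URLS = [
    'https://svn.haxx.se/users/archive-2012-09/0236.shtml',
    'http://svn.haxx.se/dev/archive-2010-08/0362.shtml',
    'https://svn.haxx.se/users/archive-2020-04/0040.shtml',
    'https://svn.haxx.se/users/archive-2019-04/0041.shtml',
]

# Code-point order, as with LC_ALL=C: ':' (0x3A) sorts before 's' (0x73),
# so the "http:" URL comes before every "https:" URL.
c_order = sorted(URLS)


def ignore_punctuation(s):
    """Approximation only: compare using letters and digits alone."""
    return ''.join(ch for ch in s if ch.isalnum())


# With punctuation ignored, "httpsvn..." is compared against "httpssvn...",
# and 's' sorts before 'v', so the "http:" URL moves to the end.
approx_locale_order = sorted(URLS, key=ignore_punctuation)

if __name__ == '__main__':
    print('\n'.join(c_order))
    print()
    print('\n'.join(approx_locale_order))
```

Pinning LC_ALL=C (or LC_COLLATE=C) in the script that generates the file
would be one way to keep the order stable across hosts.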

Cheers,

Daniel


> In either case, I updated the source to use https on the offending link to
> (r1899280) so we should see another update tomorrow.
> 
> /Daniel
> 
> Den mån 28 mars 2022 kl 06:00 skrev :
> 
> > Author: svn-role
> > Date: Mon Mar 28 04:00:20 2022
> > New Revision: 1899275
> >
> > URL: http://svn.apache.org/viewvc?rev=1899275&view=rev
> > Log:
> > * publish/.message-ids.tsv: Automatically regenerated.
> >
> > Modified:
> > subversion/site/publish/.message-ids.tsv
> >
> > Modified: subversion/site/publish/.message-ids.tsv
> > URL:
> > http://svn.apache.org/viewvc/subversion/site/publish/.message-ids.tsv?rev=1899275&r1=1899274&r2=1899275&view=diff
> >
> > ==
> > --- subversion/site/publish/.message-ids.tsv (original)
> > +++ subversion/site/publish/.message-ids.tsv Mon Mar 28 04:00:20 2022
> > @@ -1,5 +1,6 @@
> >  # Message-ids of archived emails that are referenced by a svn.haxx.se
> > URL.
> > -# Generated by tools/haxx-url-to-message-id.sh on 2021-07-04
> > +# Generated by tools/haxx-url-to-message-id.sh on 2022-03-28
> > +http://svn.haxx.se/dev/archive-2010-08/0362.shtml
> > 4c65756c.8070...@collab.net
> >  https://svn.haxx.se/dev/archive-2003-01/1125.shtml
> > 20030116213052.314004c1.tt...@idsoftware.com
> >  https://svn.haxx.se/dev/archive-2003-02/0068.shtml
> > 87wuki4fpy@codematters.co.uk
> >  https://svn.haxx.se/dev/archive-2003-10/0136.shtml
> > 200310031235.h93czgiv064...@bigtex.jrv.org
> > @@ -55,4 +56,3 @@ https://svn.haxx.se/users/archive-2012-0
> >  https://svn.haxx.se/users/archive-2012-09/0236.shtml
> > 20120921085850.gg24...@ted.stsp.name
> >  https://svn.haxx.se/users/archive-2019-04/0041.shtml
> > 9739e241-f88c-8a79-11f5-783a7f119...@neuf.fr
> >  https://svn.haxx.se/users/archive-2020-04/0040.shtml
> > 20200422065424.gl81...@ted.stsp.name
> > -http://svn.haxx.se/dev/archive-2010-08/0362.shtml
> > 4c65756c.8070...@collab.net
> >
> >
> >


Re: svn commit: r1899247 - /subversion/branches/1.13.x/STATUS

2022-03-31 Thread Daniel Shahaf
Daniel Sahlberg wrote on Mon, Mar 28, 2022 at 10:01:26 +0200:
> Den mån 28 mars 2022 kl 02:33 skrev Daniel Shahaf :
> > Daniel Sahlberg wrote on Sun, 27 Mar 2022 23:30 +00:00:
> > > I also made some additional changes in roadmap.html (r1899268). This could
> > > probably be discussed a bit more but the "out of date" and <del>eted
> > > sections don't look nice.
> >
> > All I have to say is that it'd be nice to keep somewhere either
> > a working example of those CSS classes or documentation of them… even if
> > it's just "Refer to roadmap.html@r1899267" :-)
> >
> 
> #todo is used in several other places. I don't know if we have specific CSS
> for <del>.
> 
> This was a little bit of a flame bait to see if anyone else has an opinion
> on the <del>eted table.

I was referring to .divider, .task-level-1, and .in-progress, which were
used to create a table with headers within the table.  They are still
defined in our CSS, but now there's no example of their use.  Sorry for
the unclarity.

Cheers,

Daniel


Re: svn commit: r1899247 - /subversion/branches/1.13.x/STATUS

2022-03-27 Thread Daniel Shahaf
Daniel Sahlberg wrote on Sun, 27 Mar 2022 23:30 +00:00:
> Den sön 27 mars 2022 kl 19:37 skrev Daniel Shahaf :
>
>> Thanks for these two changes!  The information is duplicated in
>> <https://subversion.apache.org/docs/release-notes/#supported-versions>;
>> anyone wants to fix that, too?  (Also, 1.14.x is absent from there.)
>>
>
> Done (r1899267). This should be quite straightforward and could be merged
> right away. Thanks for pointing it out!
>

Thanks for fixing it.

I wonder whether we should add "1.15.x | Not yet supported" to that table.

> I also made some additional changes in roadmap.html (r1899268). This could
> probably be discussed a bit more but the "out of date" and <del>eted
> sections don't look nice.

All I have to say is that it'd be nice to keep somewhere either
a working example of those CSS classes or documentation of them… even if
it's just "Refer to roadmap.html@r1899267" :-)

Cheers,

Daniel


Re: Backports "bot" not running?

2022-03-27 Thread Daniel Shahaf
Mark Phippard wrote on Sun, Mar 27, 2022 at 13:56:51 -0400:
> On Sun, Mar 27, 2022 at 1:45 PM Daniel Sahlberg
>  wrote:
> >
> > Den sön 27 mars 2022 kl 19:33 skrev Daniel Shahaf :
> >>
> >> It's not mutually exclusive; someone can run the script locally.  I'd
> >> recommend running merge-approved-backports.py without arguments.
> >
> >
> > Fails for me:
> > [[[
> > $ ./tools/dist/merge-approved-backports.py
> > Failed to parse entry ' * r1881534 (without CRLF problem)\n
> > [...]
> > ]]]
> >
> > I'm not sure if this is also causing problems for backport.pl.

Apparently not:

> % ../trunk/tools/dist/backport.pl CRLF
> 
> 
> === Candidate changes:
> 
> 
> Skipping r1877310 (doesn't match pattern):
> Add a test for issue #4711 "invalid xml file produced by svn log --xml[...]
> 
> 
> Skipping r1883355 (doesn't match pattern):
> Use the APR-1.4+ API for flushing file contents to disk.
> 
> 
> Skipping the r1878379 group (doesn't match pattern):
> Distinguish configure scripts on release mode and non release mode.
> 
> 
> >>> The 1.14.x-r1881534-no-crlf branch:
> ^/subversion/branches/1.14.x-r1881534-no-crlf
> 
> r1881534 (without CRLF problem)
> Fix issue #4864 "build/ac-macros/macosx.m4: workaround AC_RUN_IFELSE"
> 
>   +1: hartmannathan, stsp
> 
> Run a merge? [y,l,v,±1,±0,q,e,a, ,N,?] y
> Would have committed:
> [[[
>  M  .
> M   build/ac-macros/macosx.m4
> M   STATUS (not shown in the diff)
> Merge the 1.14.x-r1881534-no-crlf branch:
> 
>  * r1881534 (without CRLF problem)
>Fix issue #4864 "build/ac-macros/macosx.m4: workaround AC_RUN_IFELSE"
>Justification:
>  Unblocks cross-compiling SVN.
>Notes:
>  Replacement for veto-blocked r1881534 group (see below) without the
>  inconsistent line endings that instigated said veto blockage.
>Branch:
>  1.14.x-r1881534-no-crlf
>Votes:
>  +1: hartmannathan, stsp
> ]]]
> Would remove merged '1.14.x-r1881534-no-crlf' branch
> Shall I open a subshell? [ydN?] y
> 
> % svn diff --depth=empty
> Index: .
> ===
> --- .   (revision 1899259)
> +++ .   (working copy)
> 
> Property changes on: .
> ___
> Modified: svn:mergeinfo
> ## -0,0 +0,1 ##
>Merged /subversion/branches/1.14.x-r1881534-no-crlf:r1885959-1899259

So, yeah, could do the merges with backport.pl instead.  That'd be
«YES=1 MAY_COMMIT=1 …/backport.pl».  (That's all documented in
README.backport and in the script's --help.)
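
For those who prefer to wrap that in a script, a rough sketch follows.  It
assumes an svn binary on PATH and takes the path to backport.pl as
a parameter, since the real location depends on the local checkout layout;
the function names are invented for this example.

```python
import os
import subprocess


def backport_env(base_env=None):
    """Return a copy of the environment with YES=1 and MAY_COMMIT=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env['YES'] = '1'
    env['MAY_COMMIT'] = '1'
    return env


def run_backport(branch_wc, backport_script):
    """Update the branch working copy, then run the backport script in it.

    branch_wc is the root of a release-branch working copy;
    backport_script should be an absolute path to backport.pl.
    Returns the script's exit code.
    """
    subprocess.run(['svn', 'up', '-q', '--non-interactive'],
                   cwd=branch_wc, check=True)
    result = subprocess.run([backport_script], cwd=branch_wc,
                            env=backport_env())
    return result.returncode
```

Passing the two variables through the child's environment, rather than
exporting them in the shell, keeps a later interactive run from committing by
accident.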

Note there is no "Merged /subversion/trunk:…" line in the diff.  That's
because the merge of r1881534 to its backport branch (r1885960) didn't
include a mergeinfo change to the branch root.

> Guess we should just remove " (without CRLF problem)" from that line
> to fix the problem.

Or this, as I see Nathan has done.

I guess backport.pl shouldn't complain about parentheticals in the
bullet line when there's a "Branch:" header.
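
As an illustration of the kind of leniency meant here, the sketch below
accepts a trailing parenthetical on the bullet line.  It is a hypothetical
pattern written for this mail, not the grammar that backport.pl or
backport.py actually implement, and it only handles plain "rNNN" revision
lists.

```python
import re

# Bullet line: "* r123" or "* r123, r456", optionally followed by "(note)".
BULLET = re.compile(
    r'^\s*\*\s+(?P<revs>r\d+(?:\s*,\s*r\d+)*)'
    r'(?:\s+\((?P<note>[^)]*)\))?\s*$')


def parse_bullet(line):
    """Return (revisions, note) for a STATUS bullet line, or None."""
    match = BULLET.match(line)
    if match is None:
        return None
    revisions = [int(r) for r in re.findall(r'r(\d+)', match.group('revs'))]
    return revisions, match.group('note')


if __name__ == '__main__':
    print(parse_bullet(' * r1881534 (without CRLF problem)\n'))
    print(parse_bullet(' * r1899227'))
    print(parse_bullet('   Justification:'))
```

Whether the note should be kept or discarded once a "Branch:" header is
present is a separate question; the sketch simply returns it.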

Cheers,

Daniel


Re: svn commit: r1899247 - /subversion/branches/1.13.x/STATUS

2022-03-27 Thread Daniel Shahaf
Thanks for these two changes!  The information is duplicated in
<https://subversion.apache.org/docs/release-notes/#supported-versions>;
anyone wants to fix that, too?  (Also, 1.14.x is absent from there.)

dsahlb...@apache.org wrote on Sun, 27 Mar 2022 16:14 +00:00:
> Author: dsahlberg
> Date: Sun Mar 27 16:14:27 2022
> New Revision: 1899247
>
> URL: http://svn.apache.org/viewvc?rev=1899247&view=rev
> Log:
> * STATUS: 1.13.x is end of life (2019-10-30 + 6 months < $TODAY)
>
> Modified:
> subversion/branches/1.13.x/STATUS
>
> Modified: subversion/branches/1.13.x/STATUS
> URL: 
> http://svn.apache.org/viewvc/subversion/branches/1.13.x/STATUS?rev=1899247&r1=1899246&r2=1899247&view=diff
> ==
> --- subversion/branches/1.13.x/STATUS (original)
> +++ subversion/branches/1.13.x/STATUS Sun Mar 27 16:14:27 2022
> @@ -1,6 +1,6 @@
>* * * * * * * * * * * * * * * * * * * * * * * * * * * *
>* *
> -  *  THIS RELEASE STREAM IS OPEN FOR STABILIZATION. *
> +  *  THIS RELEASE STREAM IS CLOSED: SUPERSEDED BY 1.14. *
>* *
>* * * * * * * * * * * * * * * * * * * * * * * * * * * *


Re: Backports "bot" not running?

2022-03-27 Thread Daniel Shahaf
Mark Phippard wrote on Sun, 27 Mar 2022 14:06 +00:00:
> On Sun, Mar 27, 2022 at 9:53 AM Nathan Hartman  
> wrote:
>>
>> On Sun, Mar 27, 2022 at 9:46 AM Stefan Sperling  wrote:
>>>
>>> On Sun, Mar 27, 2022 at 09:35:51AM -0400, Nathan Hartman wrote:
>>> > On Sun, Mar 27, 2022 at 9:05 AM Mark Phippard  wrote:
>>> > >
>>> > > On Sun, Mar 27, 2022 at 8:50 AM Nathan Hartman 
>>> > >  wrote:
>>> > > >
>>> > > > On Sun, Mar 27, 2022 at 7:05 AM Mark Phippard  
>>> > > > wrote:
>>> > > >>
>>> > > >> On Sun, Mar 27, 2022 at 7:00 AM Daniel Sahlberg
>>> > > >>  wrote:
>>> > > >> >
>>> > > >> > Hi,
>>> > > >> > It is due to the migration of svn-qavm to the new host as 
>>> > > >> > requested by ASF Infra. I'll look into it right away, it has been 
>>> > > >> > on my todo list since last week, sorry!
>>> > > >> > /Daniel
>>> > > >>
>>> > > >> Thanks. We will want it to get this release process completed but in
>>> > > >> the near term once this first batch of backports are merged I am
>>> > > >> hoping that will make all of the tests on the branch run successfully
>>> > > >> again.
>>> > > >
>>> > > >
>>> > > > Not sure if you're referring to the buildbots here but that's the 
>>> > > > other broken important thing. I'll try to look into it soon. I think 
>>> > > > I can get most of the buildbots running by commenting a few lines and 
>>> > > > renaming a file.
>>> > >
>>> > > I am sure it would be nice to have the buildbots running if they are
>>> > > not, but I will run the tests locally before posting any tarballs so I
>>> > > do not feel like I need these to do the release.
>>> > >
>>> > > I was only referencing our automated backport script which merges
>>> > > approved changes. That is not happening at the moment so the branches
>>> > > are not being updated with approved changes.
>>> > >
>>> > > Since the 1.14.x branch currently has test failures when I run locally
>>> > > I was just hoping to see a clean run happen so I know we are in better
>>> > > shape to start progressing towards a release.
>>> >
>>> > Last night I ran the tests on 1.14.x with the following merged and all
>>> > tests passed for me:
>>> >
>>> > r1877310 r1883355 r1878379 r1883719 r1883722 r1884610 r1881534
>>> > r1883838 r1883989 r1886460 r1886582 r1887641 r1890013 r1889629
>>> > r1892470 r1892471 r1892541 r1894734 r1897449 r1898633 r1899227
>>> >
>>> > So hopefully all those (or the subset that fixes the broken tests)
>>> > will be approved and merged soon...
>>>
>>> Anyone should feel free to merge+commit approved changes.
>>> I have often bypassed the backport merge bot while doing RM work.
>>> This bot is just a nice-to-have convenience and its absence should not
>>> prevent us from making progress.
>>>
>>> Cheers,
>>> Stefan
>>
>> Thanks for mentioning that. I secretly planned to do exactly that if we 
>> didn't get svn-role working in time. Now if it comes to that, I won't have 
>> to feel bad about it. :-)
>
> I would like to give at least this week for people to cast votes so we
> can get as much into the releases as possible. So I am fine with
> giving Daniel or anyone else a little time to get the script running
> again, but yeah if we get closer to wanting to roll a release we can
> have someone just start doing the backports.
>
> I do like the consistency the script provides as it makes it a lot
> easier to create the CHANGES file and just examine the branch history.

It's not mutually exclusive; someone can run the script locally.  I'd recommend
running merge-approved-backports.py without arguments.


Re: Questions on Release Management Process

2022-03-24 Thread Daniel Shahaf
Mark Phippard wrote on Wed, 23 Mar 2022 11:36 +00:00:
> On Tue, Mar 22, 2022 at 11:51 PM Daniel Shahaf  
> wrote:
>>
>> Mark Phippard wrote on Mon, Mar 21, 2022 at 16:46:55 -0400:
>> > On Mon, Mar 21, 2022 at 4:31 PM Stefan Sperling  wrote:
>> > > On Mon, Mar 21, 2022 at 12:44:44PM -0400, Mark Phippard wrote:
>> > > > Problem 1: Rolling the tarballs
>> > > >
>> > > > The process creates the tarballs but fails near the end. It looks GPG 
>> > > > related?
>> > >
>> > > >INFO:root:Building Unix tarballs
>> > > >INFO:root:Moving artifacts and calculating checksums
>> > > >Traceback (most recent call last):
>> > > >  File "trunk/tools/dist/release.py", line 1916, in <module>
>> > > >main()
>> > > >  File "trunk/tools/dist/release.py", line 1912, in main
>> > > >args.func(args)
>> > > >  File "trunk/tools/dist/release.py", line 983, in roll_tarballs
>> > > >download_file(KEYS, filepath, None)
>> > > >  File "trunk/tools/dist/release.py", line 289, in download_file
>> > > >response = urlopen(url)
>> > > >  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
>> > > >return opener.open(url, data, timeout)
>> > > >  File "/usr/lib/python2.7/urllib2.py", line 435, in open
>> > > >response = meth(req, response)
>> > > >  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
>> > > >'http', request, response, code, msg, hdrs)
>> > > >  File "/usr/lib/python2.7/urllib2.py", line 473, in error
>> > > >return self._call_chain(*args)
>> > > >  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
>> > > >result = func(*args)
>> > > >  File "/usr/lib/python2.7/urllib2.py", line 556, in 
>> > > > http_error_default
>> > > >raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
>> > > >urllib2.HTTPError: HTTP Error 404: Not Found
>> > >
>> > > It seems ASF have removed the KEYS file our script is trying to fetch.
>> > > See http://people.apache.org/keys/ where it says "Project group files are
>> > > no longer created."
>> > >
>> > > It looks like what the script wants to do here is obtain a copy of
>> > > the Subversion project's KEYS file and store it along with release
>> > > artifacts. If we want to keep doing this we will have to maintain
>> > > our own KEYS file on the website, I suppose. Otherwise, we could
>> > > decide to no longer provide such a file and remove relevant code
>> > > from the script. Not sure what is better.
>> >
>> > Since I do not know what this was used for, maybe someone can help get
>> > us to a decision and update the script? Daniel Shahaf, if you see
>> > this, I think of you as the resident expert on this stuff. Any thoughts?
>>
>> The design goals here are two:
>>
>> 1. When someone with commit access to N ASF projects updates their PGP
>> key, they shouldn't have to do O(N) work to update N KEYS files.  They
>> should have to do either O(1) work or, ideally, nothing at all.
>>
>> 2. Releases should snapshot the keys that are current at the time they
>> are generated, in order to remain verifiable in archive.a.o even if the
>> keys in question are later removed from LDAP (by root@ as part of
>> a manual password reset, or by the committer via id.a.o).
>>
>> The ASF-wide "generate group keys" scripts addressed #1.  They used to
>> generate two files, one with only the full committers' keys and one with
>> both full and partial committers' keys.  We used the former (to reduce
>> attack surface) until it stopped getting generated.
>>
>> Copying the file to subversion-.KEYS addressed #2.
>
> Thanks for responding and please forgive some basic questions. My
> "knowledge" of this begins and ends with running gpg -ba to sign
> releases and copying and pasting the output to an email.
>
> It sounds like the KEYS file is a list of all of the GPG Keys for ASF
> committers?
>

Yes.  It's basically the output of «gpg --armor --export $args» where $args is
developers' keys, as recently updated from public keyservers, with cosmetic
text added before 

Re: multi-wc-format: upgrading externals

2022-03-22 Thread Daniel Shahaf
Daniel Shahaf wrote on Fri, Mar 18, 2022 at 01:16:48 +0000:
> Daniel Shahaf wrote on Thu, 17 Mar 2022 23:02 +00:00:
> > Daniel Shahaf wrote on Tue, Mar 08, 2022 at 10:57:17 +0000:
> >> Julian Foad wrote on Wed, Mar 02, 2022 at 13:04:51 +0000:
> >> > Daniel Shahaf wrote:
> >> > > multi-wc-format/BRANCH-README mentioned this:
> >> > >  
> >> > >> [*] New externals working copies must inherit the format from their
> >> > >>parent working copy, because [...]
> >> > >  
> >> > > Upgrading a parent working copy upgrades external wc's too.  However,
> >> > > upgrading an external succeeds.  Judging by the quoted remark, should
> >> > > «svn upgrade --compatible-version=$N /path/to/external» error out 
> >> > > unless
> >> > > the external's parent working copy is already at version $V?
> >> > 
> >> > It isn't clear to me whether allowing it or disallowing it is more 
> >> > "right".
> >> > 
> >> > Can anyone else chime in?
> >> > 
> >> 
> >> Hmm.  Considering that «svn update» recurses into externals by default,
> >> but that nothing recurses upwards into parent wc's by default, perhaps
> >> we should design things around making sure these two cases continue to
> >> work?  I.e., disallow selective upgrades that might make another
> >> client's «svn update» of the outer wc fail because the outer wc and the
> >> external wc are different formats?
> >> 
> >> Following this train of thought, we'll forbid upgrading an external
> >> without also upgrading a parent wc, but will entertain patches to make
> >> «svn upgrade» _not_ descend into external wc's by default, should anyone
> >> submit such.  (I don't propose we add this ourselves for the MVP.)
> >> 
> >
> > Another perspective: If we aren't sure, we should make upgrading an
> > external an error in 1.15, because that leaves users a workaround
> > (upgrade the parent wc) and we can make it a non-error in the future,
> > whereas if 1.15 allows upgrading only the external wc, backwards
> > compatibility with that would be expected.
> >
> > If anyone thinks «svn upgrade /path/to/wc/path/to/external» should be
> > allowed, do speak up.

On second (or third) thought: Isn't this orthogonal to the
multi-wc-format work?  To date it has always been possible to upgrade an
external without upgrading its parent [1], and thereby make the parent
not «svn update»able by older clients (cf. points (a) and (b) from
multi-wc-format's BRANCH-README as quoted in SVN-4890's OP [2]).  The
fact that the client doing the upgrade has multi-wc-format support
doesn't affect this logic.

This would argue towards leaving SVN-4890 open, but making it not block
SVN-4883.

Cheers,

Daniel

[1] The test posted to SVN-4890 raises SVNExpectedStderr if run in trunk
against 1.14.1, implying that upgrade succeeds.

[2]
[*] New externals working copies must inherit the format from their
parent working copy, because mixed-format working copies are a) a
Bad Thing, and b) defeat the purpose of this feature, which is
support for multiple versions of the client in the same working
copy.

> How would one make «svn upgrade foo/bar» a failure if foo/bar is an
> external within something?  I guess by calling
> svn_dirent_basename("foo/bar") and then running some svn_wc_* API on
> that, but which?  The same one that «svn status --depth=empty» uses, or
> is there a better one?
> 
> Thanks,
> 
> Daniel


Re: Questions on Release Management Process

2022-03-22 Thread Daniel Shahaf
Mark Phippard wrote on Mon, Mar 21, 2022 at 16:46:55 -0400:
> On Mon, Mar 21, 2022 at 4:31 PM Stefan Sperling  wrote:
> > On Mon, Mar 21, 2022 at 12:44:44PM -0400, Mark Phippard wrote:
> > > Problem 1: Rolling the tarballs
> > >
> > > The process creates the tarballs but fails near the end. It looks GPG related?
> >
> > >INFO:root:Building Unix tarballs
> > >INFO:root:Moving artifacts and calculating checksums
> > >Traceback (most recent call last):
> > >  File "trunk/tools/dist/release.py", line 1916, in <module>
> > >main()
> > >  File "trunk/tools/dist/release.py", line 1912, in main
> > >args.func(args)
> > >  File "trunk/tools/dist/release.py", line 983, in roll_tarballs
> > >download_file(KEYS, filepath, None)
> > >  File "trunk/tools/dist/release.py", line 289, in download_file
> > >response = urlopen(url)
> > >  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
> > >return opener.open(url, data, timeout)
> > >  File "/usr/lib/python2.7/urllib2.py", line 435, in open
> > >response = meth(req, response)
> > >  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
> > >'http', request, response, code, msg, hdrs)
> > >  File "/usr/lib/python2.7/urllib2.py", line 473, in error
> > >return self._call_chain(*args)
> > >  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
> > >result = func(*args)
> > >  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
> > >raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
> > >urllib2.HTTPError: HTTP Error 404: Not Found
> >
> > It seems ASF have removed the KEYS file our script is trying to fetch.
> > See http://people.apache.org/keys/ where it says "Project group files are
> > no longer created."
> >
> > It looks like what the script wants to do here is obtain a copy of
> > the Subversion project's KEYS file and store it along with release
> > artifacts. If we want to keep doing this we will have to maintain
> > our own KEYS file on the website, I suppose. Otherwise, we could
> > decide to no longer provide such a file and remove relevant code
> > from the script. Not sure what is better.
> 
> Since I do not know what this was used for, maybe someone can help get
> us to a decision and update the script? Daniel Shahaf, if you see
> this, I think of you as the resident expert on this stuff. Any thoughts?

The design goals here are two:

1. When someone with commit access to N ASF projects updates their PGP
key, they shouldn't have to do O(N) work to update N KEYS files.  They
should have to do either O(1) work or, ideally, nothing at all.

2. Releases should snapshot the keys that are current at the time they
are generated, in order to remain verifiable in archive.a.o even if the
keys in question are later removed from LDAP (by root@ as part of
a manual password reset, or by the committer via id.a.o).

The ASF-wide "generate group keys" scripts addressed #1.  They used to
generate two files, one with only the full committers' keys and one with
both full and partial committers' keys.  We used the former (to reduce
attack surface) until it stopped getting generated.

Copying the file to subversion-.KEYS addressed #2.

So, what to do?

- We could talk to some ASF-wide list (comdev, infra, site-dev@) about
  resuming generation of group keys files.
  
  (If you do this, someone will probably ask you to comply with
  a markdown document that calls itself policy.  I don't believe that
  document is Foundation policy, and even if it is, our practice has
  been +1ed by an Officer of the Foundation.)

- We could roll our own automation, relying on the key fingerprints in
  id.a.o (LDAP) via <https://people.apache.org/keys/committer/>:

    for availid in $(
        perl -anE 'say $F[0] if (/^Blanket/../END ACTIVE FULL.*SCRIPTS LOOK FOR IT/ and /@/)' \
            < /path/to/COMMITTERS
    ) ; do
        echo curl -sSfO https://people.apache.org/keys/committer/${availid}.asc \
            || test $? -eq 22 || return 1;
    done
    cat *.asc > subversion-….keys

- We could roll our own solution without relying even on
  <https://people.apache.org/keys/committer/>, since its API has already
  been broken twice.
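
For illustration only, the per-committer approach from the second bullet might look roughly like this in Python.  This is a sketch under assumptions, not what release.py does: `build_keys_snapshot` and the `fetch` callable are hypothetical names, and skipping a committer whose key is missing mirrors the `test $? -eq 22` tolerance in the shell loop.

```python
from typing import Callable, Iterable, Optional

KEYS_URL = "https://people.apache.org/keys/committer/{availid}.asc"


def build_keys_snapshot(availids: Iterable[str],
                        fetch: Callable[[str], Optional[str]]) -> str:
    """Concatenate the armored keys of the given committers.

    `fetch` is a hypothetical callable: it takes a URL and returns the
    body as text, or None if the server has no key for that committer.
    """
    blocks = []
    for availid in availids:
        body = fetch(KEYS_URL.format(availid=availid))
        if body is None:
            continue  # no key on file for this committer; not an error
        # Make sure each key block ends with a newline before joining.
        blocks.append(body if body.endswith("\n") else body + "\n")
    return "".join(blocks)
```

Passing the fetcher in as a parameter keeps the snapshot logic independent of whichever URL scheme happens to be in force at the time.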



As to the wider problem here: I've RM'd once and could probably produce
a reference transcript of the actual rolling bit (the part that
culminates in having ../subversion-1.14.2.tar.gz ready to upload).
However, that's not at the top of my todo list.


Re: svn commit: r1899014 - /subversion/trunk/subversion/tests/cmdline/upgrade_tests.py

2022-03-17 Thread Daniel Shahaf
Jun Omae wrote on Fri, Mar 18, 2022 at 11:27:16 +0900:
> Hi,
> 
> On Fri, Mar 18, 2022 at 9:44 AM Daniel Shahaf  wrote:
> >
> > Could someone test this on Windows, please?  I suspect read_wc_formats()
> > (added in r1899012) returns paths with os.sep, but the expectations
> > added in this commit use '/', so something will need to convert.
> >
> > Thanks,
> >
> > Daniel
> 
> I got 3 failures from testing r1899017 on Windows. dav-tests.log is attached.
> See also 
> https://github.com/jun66j5/subversion/runs/5594602036?check_suite_focus=true#step:7:2816
> 

Thank you!

Does r1899019 fix this?




Re: multi-wc-format: upgrading externals

2022-03-17 Thread Daniel Shahaf
Daniel Shahaf wrote on Thu, 17 Mar 2022 23:02 +00:00:
> Daniel Shahaf wrote on Tue, Mar 08, 2022 at 10:57:17 +:
>> Julian Foad wrote on Wed, Mar 02, 2022 at 13:04:51 +0000:
>> > Daniel Shahaf wrote:
>> > > multi-wc-format/BRANCH-README mentioned this:
>> > >  
>> > >> [*] New externals working copies must inherit the format from their
>> > >>parent working copy, because [...]
>> > >  
>> > > Upgrading a parent working copy upgrades external wc's too.  However,
>> > > upgrading an external succeeds.  Judging by the quoted remark, should
>> > > «svn upgrade --compatible-version=$N /path/to/external» error out unless
>> > > the external's parent working copy is already at version $V?
>> > 
>> > It isn't clear to me whether allowing it or disallowing it is more "right".
>> > 
>> > Can anyone else chime in?
>> > 
>> 
>> Hmm.  Considering that «svn update» recurses into externals by default,
>> but that nothing recurses upwards into parent wc's by default, perhaps
>> we should design things around making sure these two cases continue to
>> work?  I.e., disallow selective upgrades that might make another
>> client's «svn update» of the outer wc fail because the outer wc and the
>> external wc are different formats?
>> 
>> Following this train of thought, we'll forbid upgrading an external
>> without also upgrading a parent wc, but will entertain patches to make
>> «svn upgrade» _not_ descend into external wc's by default, should anyone
>> submit such.  (I don't propose we add this ourselves for the MVP.)
>> 
>
> Another perspective: If we aren't sure, we should make upgrading an
> external an error in 1.15, because that leaves users a workaround
> (upgrade the parent wc) and we can make it a non-error in the future,
> whereas if 1.15 allows upgrading only the external wc, backwards
> compatibility with that would be expected.
>
> If anyone thinks «svn upgrade /path/to/wc/path/to/external» should be
> allowed, do speak up.

How would one make «svn upgrade foo/bar» a failure if foo/bar is an
external within something?  I guess by calling
svn_dirent_basename("foo/bar") and then running some svn_wc_* API on
that, but which?  The same one that «svn status --depth=empty» uses, or
is there a better one?
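
To make the question concrete, here is a filesystem-only sketch in Python of the "look upwards" step.  It is not an svn_wc_* answer: `enclosing_wc_root` is a hypothetical helper, it assumes the single `.svn` directory at the working copy root used since 1.7, and it starts from the parent directory of the target.

```python
import os
from typing import Optional


def enclosing_wc_root(path: str) -> Optional[str]:
    """Return the root of the nearest working copy strictly above `path`.

    A hit only means `path` is nested inside another working copy.
    Whether it is actually registered there as an external would still
    have to be asked of libsvn_wc.
    """
    current = os.path.dirname(os.path.abspath(path))
    while True:
        if os.path.isdir(os.path.join(current, ".svn")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent
```

A real check would then need the parent's view of the path to tell an external apart from an unrelated nested checkout.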

Thanks,

Daniel


Re: svn commit: r1899014 - /subversion/trunk/subversion/tests/cmdline/upgrade_tests.py

2022-03-17 Thread Daniel Shahaf
Could someone test this on Windows, please?  I suspect read_wc_formats()
(added in r1899012) returns paths with os.sep, but the expectations
added in this commit use '/', so something will need to convert.
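
A possible shape for the conversion, as a sketch only: `to_relpaths` is a hypothetical helper, and it assumes read_wc_formats() returns a dict keyed by wc-relative paths.

```python
import os


def to_relpaths(formats, sep=os.sep):
    """Return a copy of `formats` whose keys use '/' as the separator.

    `sep` is a parameter only so the conversion can be exercised on any
    platform; callers would normally leave it at os.sep.
    """
    if sep == '/':
        return dict(formats)
    return {path.replace(sep, '/'): fmt for path, fmt in formats.items()}
```

The other direction (converting the expectations with os.sep) would work as well; normalizing to '/' just keeps the expected dict readable.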

Thanks,

Daniel

danie...@apache.org wrote on Fri, 18 Mar 2022 00:40 +00:00:
> Author: danielsh
> Date: Fri Mar 18 00:40:29 2022
> New Revision: 1899014
>
> URL: http://svn.apache.org/viewvc?rev=1899014=rev
> Log:
> * subversion/tests/cmdline/upgrade_tests.py
>   (upgrade_with_externals): Verify format numbers of upgraded externals.
>   (check_formats): New.
>   (check_format): Verify the argument type to guard against typos.
>
> Modified:
> subversion/trunk/subversion/tests/cmdline/upgrade_tests.py
>
> Modified: subversion/trunk/subversion/tests/cmdline/upgrade_tests.py
> URL: 
> http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/upgrade_tests.py?rev=1899014=1899013=1899014=diff
> ==
> --- subversion/trunk/subversion/tests/cmdline/upgrade_tests.py (original)
> +++ subversion/trunk/subversion/tests/cmdline/upgrade_tests.py Fri Mar 18 00:40:29 2022
> @@ -102,11 +102,21 @@ def replace_sbox_repo_with_tarfile(sbox,
>shutil.move(os.path.join(extract_dir, dir), sbox.repo_dir)
> 
>  def check_format(sbox, expected_format):
> +  assert isinstance(expected_format, int)
>    formats = sbox.read_wc_formats()
>    if formats[''] != expected_format:
>      raise svntest.Failure("found format '%d'; expected '%d'; in wc '%s'" %
>                            (formats[''], expected_format, sbox.wc_dir))
> 
> +def check_formats(sbox, expected_formats):
> +  assert isinstance(expected_formats, dict)
> +  formats = sbox.read_wc_formats()
> +  ### If we ever need better error messages here, reuse run_and_verify_info().
> +  if formats != expected_formats:
> +    raise svntest.Failure("found format '%s'; expected '%s'; in wc '%s'" %
> +                          (formats, expected_formats, sbox.wc_dir))
> +
> +
>  def check_pristine(sbox, files):
>    for file in files:
>      file_path = sbox.ospath(file)
> @@ -334,7 +344,18 @@ def upgrade_with_externals(sbox):
>   'upgrade', sbox.wc_dir)
> 
>    # Actually check the format number of the upgraded working copy
> -  check_format(sbox, get_current_format())
> +  check_formats(sbox,
> +  {relpath: get_current_format()
> +   for relpath in (
> + '',
> + 'A/D/exdir_A',
> + 'A/D/exdir_A/G',
> + 'A/D/exdir_A/H',
> + 'A/D/x',
> + 'A/C/exdir_G',
> + 'A/C/exdir_H',
> +   )})
> +
>    check_pristine(sbox, ['iota', 'A/mu',
>                          'A/D/x/lambda', 'A/D/x/E/alpha'])


Re: multi-wc-format: upgrading externals

2022-03-17 Thread Daniel Shahaf
Daniel Shahaf wrote on Tue, Mar 08, 2022 at 10:57:17 +:
> Julian Foad wrote on Wed, Mar 02, 2022 at 13:04:51 +0000:
> > Daniel Shahaf wrote:
> > > multi-wc-format/BRANCH-README mentioned this:
> > >  
> > >> [*] New externals working copies must inherit the format from their
> > >>parent working copy, because [...]
> > >  
> > > Upgrading a parent working copy upgrades external wc's too.  However,
> > > upgrading an external succeeds.  Judging by the quoted remark, should
> > > «svn upgrade --compatible-version=$N /path/to/external» error out unless
> > > the external's parent working copy is already at version $V?
> > 
> > It isn't clear to me whether allowing it or disallowing it is more "right".
> > 
> > Can anyone else chime in?
> > 
> 
> Hmm.  Considering that «svn update» recurses into externals by default,
> but that nothing recurses upwards into parent wc's by default, perhaps
> we should design things around making sure these two cases continue to
> work?  I.e., disallow selective upgrades that might make another
> client's «svn update» of the outer wc fail because the outer wc and the
> external wc are different formats?
> 
> Following this train of thought, we'll forbid upgrading an external
> without also upgrading a parent wc, but will entertain patches to make
> «svn upgrade» _not_ descend into external wc's by default, should anyone
> submit such.  (I don't propose we add this ourselves for the MVP.)
> 

Another perspective: If we aren't sure, we should make upgrading an
external an error in 1.15, because that leaves users a workaround
(upgrade the parent wc) and we can make it a non-error in the future,
whereas if 1.15 allows upgrading only the external wc, backwards
compatibility with that would be expected.

If anyone thinks «svn upgrade /path/to/wc/path/to/external» should be
allowed, do speak up.

Cheers,

Daniel


> Cheers,
> 
> Daniel
> 
> > In the meantime, I filed your question as 
> > https://subversion.apache.org/issue/4890
> > 
> > - Julian
> > 


Re: multi-wc-format: svn_wc__format_from_version()

2022-03-17 Thread Daniel Shahaf
Julian Foad wrote on Thu, 17 Mar 2022 20:33 +00:00:
> That's an old function. "Characteristic" previously meant the only 
> format supported by a given client version. We should change the word 
> now. What should the function return now? The newest, I think: its 
> callers are upgrade and checkout; essentially it is used to implement 
> the --bikeshed=1.15 option.

It doesn't seem to be an old function; the docstring says (in a part I snipped) 
it's new in 1.15.  However, I don't think that changes the answer.  Done in 
r1899004.

Thanks, Julian.

Daniel


multi-wc-format: svn_wc__format_from_version()

2022-03-17 Thread Daniel Shahaf
Here's the docstring:

[[[
/**
 * Convert @a version to that version's characteristic working copy
 * format, returned in @a format.
 *
 * A NULL @a version translates to the library's default version.
 ⋮
 */
svn_error_t *
svn_wc__format_from_version(int *format,
                            const svn_version_t* version,
                            apr_pool_t *scratch_pool);
]]]

Here's part of the implementation:

[[[
  switch (version->minor)
    {
      case 0:  /* Same as 1.3.x. */
      case 1:  /* Same as 1.3.x. */
      case 2:  /* Same as 1.3.x. */
      case 3:  *format = 4; break;
      case 4:  *format = 8; break;
      case 5:  *format = 9; break;
      case 6:  *format = 10; break;
      case 7:  *format = 29; break;
      case 8:  /* Same as 1.14.x. */
      case 9:  /* Same as 1.14.x. */
      case 10: /* Same as 1.14.x. */
      case 11: /* Same as 1.14.x. */
      case 12: /* Same as 1.14.x. */
      case 13: /* Same as 1.14.x. */
      case 14: *format = 31; break;
      case 15: /* Same as the current version. */
      default: *format = SVN_WC__VERSION; break;
    }

  return SVN_NO_ERROR;
]]]

What does the term "characteristic" mean?  Is it the oldest, newest, or
default wc format supported by @a version?

The implementation uses the newest, but that may have been an oversight
(cf. r1899000), and it's not clear to me from the callers what they
expect.
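
To spell out the "newest" reading, here is the same table as a small Python sketch.  It is only a restatement of the switch statement for a non-negative minor number: `newest_format_for_minor` is a hypothetical name, and `current_format` stands in for SVN_WC__VERSION, whose value isn't shown in the excerpt.

```python
# Formats copied from the switch statement quoted above.  Each key is the
# last minor version of a run that shares one format.
_NEWEST_FORMAT_BY_MINOR = {3: 4, 4: 8, 5: 9, 6: 10, 7: 29, 14: 31}


def newest_format_for_minor(minor, current_format):
    """Return the newest wc format that Subversion 1.<minor> can use.

    `minor` is expected to be non-negative.  Minor versions beyond the
    table fall through to `current_format`, like the `default:` label.
    """
    for known_minor in sorted(_NEWEST_FORMAT_BY_MINOR):
        if minor <= known_minor:
            return _NEWEST_FORMAT_BY_MINOR[known_minor]
    return current_format
```

Under the "oldest" or "default" readings the 1.15 row is the one that would change, since it is the first version that supports more than one format.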

Cheers,

Daniel


Re: multi-wc-format review

2022-03-17 Thread Daniel Shahaf
Julian Foad wrote on Thu, Mar 17, 2022 at 11:10:32 +:
> Daniel Shahaf wrote:
> > + The upgraded working copy will be compatible with Subversion 1.8 and
> > + newer (this default may change ...
> 
> Sure, +1, a bit clearer.
> 

Committed.

> Also see Nathan's option-naming proposal at the end of this message.

Ack.  We can still rename the option, but I didn't want to block committing
the usage message patch on that.

> > Format numbers are sequential.  When upgrading from f30 to f40, there's
> > no way to skip f35.  If we wanted that, we'd need some sort of
> > capability-like mechanism.  Is that perhaps what you have in mind?  That
> > a user might want to upgrade to 1.16 but not enable pristines-on-demand?
> > If so, we'll need a way to enable/disable pristines-on-demand that isn't
> > "format >= 32", as discussed previously.
> 
> I think we will need that, yes.

OK.  This is already tracked as SVN-4889 and marked as a blocker of
SVN-525.

> >> My point is, using the running software version as a proxy for a WC
> >> format introduces this ambiguity: [...]
> >> This is why I think we should do at least one of:
> >>  
> >> - require the exact first-introduced version (1.8 or 1.15)
> >  
> > I still don't like the idea of requiring users to figure out somehow
> > they should pass 1.8 when they want compatibility with 1.9.  That's
> > a problem we should solve for them.
> 
> Maybe. I can see it both ways. Maybe best to allow the flexibility in
> input ("=1.9") and make clear in the feedback (both immediate response
> from the command, and "svn info" kind of feedback) that the format
> chosen is compatible with "1.8".

`svn info` already does that:

% svn info
⋮
Working Copy Compatible With Version: 1.8
Working Copy Format: 31

The command's output idea is tracked in SVN-4885.
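
As a rough illustration of where the "Compatible With Version" line could come from, here is a Python sketch of the format-to-version direction.  The table is inferred from the svn_wc__format_from_version() switch statement posted to this list and from the `svn info` output above; `compatible_version` is a hypothetical name, not the libsvn_client API.

```python
# First 1.x minor release to use each format, as inferred from the
# svn_wc__format_from_version() table (an assumption, not a spec).
_FIRST_MINOR_BY_FORMAT = {4: 0, 8: 4, 9: 5, 10: 6, 29: 7, 31: 8}


def compatible_version(wc_format):
    """Return e.g. '1.8' for format 31, or None for an unknown format."""
    minor = _FIRST_MINOR_BY_FORMAT.get(wc_format)
    return None if minor is None else "1.%d" % minor
```

Formats newer than 31 are deliberately left out, since their numbers aren't settled in this thread.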

> [...]
> >> > Why should we move any of that to include/private/? [except]
> >> > SVN_WC__SUPPORTED_VERSION and SVN_WC__VERSION [...]
> >>  
> >> They are all a closely related family. The minimum format numbers for
> >> old (no longer supported) features don't need to be used outside
> >> libsvn_wc upgrade code, indeed. But the minimum format numbers for new
> >> features that are within the range of supported formats DO from now on
> >> need to be known by libsvn_client. A new one of them will be introduced
> >> with format 32:
> >>  
> >>   #define SVN_WC__PRISTINES_ON_DEMAND_VERSION 32
> >>  
> >> We could split up the list... or keep it all together.
> >  
> > If it needs to be known by libsvn_client, then it should be in
> > include/private/… unless there is some reason it should be public?
> 
> Indeed. I think "include/private" is right for now. Clients linking to
> libsvn_client will also need to know something about formats... I'm not
> sure whether to make these WC APIs public right away, so that any client
> developers can get on with using them. If not, if we would be expected
> to add alternative ways to access the info through libsvn_client public
> APIs (that hide WC format number details and expose the info another way
> (equivalent to the --bikeshed=1.15 UI and/or feature flags).

I'm afraid I'm having a hard time following your train of thought here:
I'm not sure which of the svn_wc__*/SVN_WC__* APIs mentioned upthread
you think to make public, and what kind of alternative you mean.

The cmdline client wc format logic doesn't use any private APIs.  The
public libsvn_client API doesn't use format numbers either, other than
in svn_client_get_wc_formats_supported() and 
svn_client_wc_version_from_format().
What questions might a client API consumer want to ask, that can't be
answered in a straightforward manner by the existing public APIs?

In the context of this branch, I guess the questions are "What clients
can read a given format?" and "What's the minimum format that a given
client version supports?".  The former is answered by
svn_client_wc_version_from_format().  The latter can be answered when
running that version by calling svn_client_get_wc_formats_supported(),
but doesn't seem to be straightforward to answer on newer minor versions.
So, perhaps we should
teach svn_client_get_wc_formats_supported() to take an svn_version_t*
parameter and return the formats supported by that version.  Is this
useful enough to be included in the first release?

> Nathan Hartman wrote:
> > I wonder if user confusion can be mitigated / consistency could be
> > improved by calling it '--min-compatible-client=1.15' rather than
> > '--compatible-version'?
> 
> That sounds good to me. That's both more understandable (to novice
> users) and more explicit

Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-16 Thread Daniel Shahaf
Julian Foad wrote on Wed, Mar 16, 2022 at 21:03:28 +:
> Daniel Shahaf wrote:
> >Also, unrelated: have we verified that all the temporary files we create
> >are created in a crash-safe way?  I.e., that if libsvn_wc is SIGKILL'd
> >partway through hydrating something, the something will be cleaned up by
> >libsvn_wc at some point in the future?
> 
> I haven't reviewed for that. Could you perhaps record it somewhere more 
> find-able?

https://issues.apache.org/jira/browse/SVN-4896

> Not sure if it pertains to this issue thread or the whole i525.

The latter.  Sorry for the misunderstanding.


Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-16 Thread Daniel Shahaf
Julian Foad wrote on Wed, Mar 16, 2022 at 20:44:08 +:
> We're free to continue design discussions but I've limited time and
> need to focus. To me it appears we've moved far enough along this path
> of "some of our users want to do X" leading to "let's see how far we
> can implement an alternative" and now "let's consider the user's
> work-around options, and now "but the work-around has these
> consequences; mightn't that be a problem?". It seems to me we now know
> what are the two design directions, the original which is sub-optimal
> but near ready to use, and the alternative, now begun on its own
> "-issue4892" branch.

OK.

> I want to refrain from further speculation about how willing such
> a user would be to use the original design with work-arounds, and
> rather ask them first.

+1 with a caveat: User input shouldn't be the only factor in our decision
of whether to choose the workaround design.  It'd be useful information,
and it'd _support_ a particular course of action, but it wouldn't
_imply_ that particular course of action.

Cheers,

Daniel


Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-16 Thread Daniel Shahaf
Julian Foad wrote on Wed, Mar 16, 2022 at 06:52:48 +:
> Daniel Shahaf wrote:
> >Julian Foad wrote:
> >> exploration was enough to show that an initial release based on the
> >> original approach has possibilities of being improved, incrementally, in
> >> that way, as and when resources permit.
> >> 
> >> In other words I am not recommending choosing one approach and
> >> abandoning the other, but starting with one and postponing the other as
> >> possible future improvement work.
> >
> >Sorry, but could you spell out what are the "one approach" and "the
> >other"?  Are you proposing to release the code as it is, fetching in
> >advance, and saying you're confident it can in the future be taught to
> >fetch during the operation, notwithstanding kotkov@'s points about
> >RA-level timeouts?
> Yes; while uncertain how much effort it might require to overcome the 
> concerns such as RA-level timeouts.

Sounds good.

Also, unrelated: have we verified that all the temporary files we create
are created in a crash-safe way?  I.e., that if libsvn_wc is SIGKILL'd
partway through hydrating something, the something will be cleaned up by
libsvn_wc at some point in the future?
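
For reference, the general pattern I have in mind looks like the Python sketch below.  It is not a description of what libsvn_wc does: `install_atomically` and `sweep_stale_temps` are hypothetical names, and the `.tmp` suffix is just an assumption that lets a later pass recognize leftovers.

```python
import os
import tempfile


def install_atomically(data: bytes, final_path: str, tmp_dir: str) -> None:
    """Write `data` to a temp file in `tmp_dir`, then rename into place.

    `tmp_dir` must be on the same filesystem as `final_path` so that
    os.replace() is an atomic rename.
    """
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, final_path)
    except BaseException:
        # Best effort only; a SIGKILL never reaches this handler.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def sweep_stale_temps(tmp_dir: str) -> int:
    """Delete leftover '*.tmp' files; return how many were removed.

    Meant to run from a later cleanup pass while no writer is active.
    """
    removed = 0
    for name in os.listdir(tmp_dir):
        if name.endswith(".tmp"):
            os.unlink(os.path.join(tmp_dir, name))
            removed += 1
    return removed
```

The rename guarantees readers never see a half-written file; the sweep is what covers the SIGKILL case, since nothing in the writer gets a chance to run.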

Cheers,

Daniel


Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-16 Thread Daniel Shahaf
Julian Foad wrote on Wed, Mar 16, 2022 at 19:49:38 +:
> Daniel Shahaf wrote:
> > [...]I suspect I'm still missing something.
> 
> I suggest you re-read the issue 4892 use case: 
> https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-issue4892/notes/i525/i525-use-case-4892-minimal-update.txt
> 
> The request is to break the original design's invariant for this case.

By only hydrating files that have been updated repository-side.  How
will small, modified files that _haven't_ been remotely modified get
hydrated, then?  The logic is the same for small and large files, IIUC.

Also, why is this specific to «svn update»?  It seems to apply equally
well to «svn diff» without further arguments, since the "large" files
are presumed to be undiffable, but the issue, the notes, and the OP of
this thread all treat «svn update» as a _sui generis_ case.

If the issue does apply not only to 'update' but also to 'diff', that
suggests we should look for a solution that applies to both of them
(e.g., exclude "large" files from being recursed into by default, or
make it so "large" files _never_ get hydrated).

Sorry, I feel like I must be asking questions that must have already
been discussed, but I _have_ read the threads and I still don't know the
answers to these.

Cheers,

Daniel


Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-16 Thread Daniel Shahaf
Julian Foad wrote on Wed, Mar 16, 2022 at 07:03:43 +:
> Daniel Shahaf wrote:
> >This implies the wc won't be uniform revision.  This might break user
> >expectations; might [...], 
> I'm not sure how your clarification helps us progress. The point is:
> 
> It might be *absolutely fine* for the real life users in their real life 
> situations, and that's what we need to find out.

So what are you saying?  That we should stop doing design discussions
and go talk to users?  I agree we should talk to users, but I don't
think we can pass the design buck to them and go "Those users +1ed this
design so let's implement/release it".  There might be better ideas we
won't think of if we don't discuss things here on this list; the users
that talk to us are unlikely to be representative of all our users
anyway; and it's us, not them, who'll be promising to maintain that
design going forward.

Anyway, we can for starters call for testers on SVN-525 and on users@.

Cheers,

Daniel


Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-16 Thread Daniel Shahaf
> >> [...] next a similar pattern applies to the "normal" part of the
> >> update (everything it does after "restore"). Obviously we need the
> >> normal part of update
> >
> >Yes, but for the "deltas" part of update we already mostly DTRT, don't we?
> >
> >- If the file is not modified, [...]
> >
> >- If the file is locally modified, then by design, we need to end up
> >  with a pristine for it.  Right now we'll download BASE, and then
> >  [...]  What am I missing?
> 
> You're missing the case where the file is locally modified, and is in
> the tree scope of the update request, but no update is found in the
> repo. Currently we download its base before executing the business
> logic of update, so before we know that we're not going to need the
> base to complete this update request.

But that's exactly the branch's invariant, isn't it?

[[[
The core idea is that we start to maintain the following
invariant: only the modified files have their pristine text-base
files available on the disk.
]]]

So, if the file is locally modified, and we download its base, we cause
the file to meet the invariant.  I don't see how that's a problem,
unless we download a base we already have, or discard the base rather
than keep it.

I suspect I'm still missing something.

Cheers,

Daniel


Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-16 Thread Daniel Shahaf
Julian Foad wrote on Wed, Mar 16, 2022 at 07:27:49 +:
> Daniel Shahaf wrote:
> >I'll also mention asciinema.  It's basically script(1) into a video
> >hosted online.  It might be instructive for us to watch an asciinema
> >session of someone trying this branch for the first time.  It's about as
> >near as we can get to standing behind their shoulder, without actually
> >sharing a machine with them and watching a shared tmux(1) session.
> 
> Good idea. Anyone willing to try it?

asciinema is packaged in various distros, so you should be able to
install it via your package manager.  See 
https://repology.org/project/asciinema/versions



Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-15 Thread Daniel Shahaf
Daniel Shahaf wrote on Wed, Mar 16, 2022 at 04:43:19 +:
> Julian Foad wrote on Mon, Mar 14, 2022 at 20:23:29 +:
> > Daniel Sahlberg wrote:
> > >[...] I will try to build a release for myself and use it for dev work.
> > Thank you Daniel.
> > 
> > I'm wondering if I (or we) need to do more to facilitate evaluation. I'm 
> > thinking of things like adding some feedback to tell the user what it's 
> > doing ("fetching missing pristines now..."), maybe at an extra verbose 
> > level during this evaluation phase to help users understand it; finding out 
> > if any of the outstanding issues need fixing in order to be able to use it 
> > productively; maybe getting binaries built and distributed if that helps; 
> > maybe we can supply more succinct user documentation than what I wrote so 
> > far?
> 
> I think what we need now is users willing and able to test this.  Once
> we do, we can figure out what we need to do in order to make it easier
> for them to test it, whether it's write docs, or add notifications, or
> build binaries, or…
> 
> For starters, ourselves.  Is HEAD of the branch good enough that devs
> with use-cases can start to try it in their real use-case wc's?  It
> won't be possible to downgrade f32 to f31, but if we want, say, to make
> pristines-on-demand toggleable within a format,

(That's SVN-4889.)

> we can implement that in
> f33 and leave f32 as "never appeared in a release".
> 
> I'll also mention asciinema.  It's basically script(1) into a video
> hosted online.  It might be instructive for us to watch an asciinema
> session of someone trying this branch for the first time.  It's about as
> near as we can get to standing behind their shoulder, without actually
> sharing a machine with them and watching a shared tmux(1) session.
> 
> Cheers,
> 
> Daniel


Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-15 Thread Daniel Shahaf
Julian Foad wrote on Mon, Mar 14, 2022 at 20:23:29 +:
> Daniel Sahlberg wrote:
> >[...] I will try to build a release for myself and use it for dev work.
> Thank you Daniel.
> 
> I'm wondering if I (or we) need to do more to facilitate evaluation. I'm 
> thinking of things like adding some feedback to tell the user what it's doing 
> ("fetching missing pristines now..."), maybe at an extra verbose level during 
> this evaluation phase to help users understand it; finding out if any of the 
> outstanding issues need fixing in order to be able to use it productively; 
> maybe getting binaries built and distributed if that helps; maybe we can 
> supply more succinct user documentation than what I wrote so far?

I think what we need now is users willing and able to test this.  Once
we do, we can figure out what we need to do in order to make it easier
for them to test it, whether it's write docs, or add notifications, or
build binaries, or…

For starters, ourselves.  Is HEAD of the branch good enough that devs
with use-cases can start to try it in their real use-case wc's?  It
won't be possible to downgrade f32 to f31, but if we want, say, to make
pristines-on-demand toggleable within a format, we can implement that in
f33 and leave f32 as "never appeared in a release".

I'll also mention asciinema.  It's basically script(1) into a video
hosted online.  It might be instructive for us to watch an asciinema
session of someone trying this branch for the first time.  It's about as
near as we can get to standing behind their shoulder, without actually
sharing a machine with them and watching a shared tmux(1) session.

Cheers,

Daniel


Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-15 Thread Daniel Shahaf
Julian Foad wrote on Tue, Mar 15, 2022 at 20:10:24 +:
> Just an addendum, perhaps a more positive portrayal of the brief
> exploration of the alternative design approach: my assessment is that
> exploration was enough to show that an initial release based on the
> original approach has possibilities of being improved, incrementally, in
> that way, as and when resources permit.
> 
> In other words I am not recommending choosing one approach and
> abandoning the other, but starting with one and postponing the other as
> possible future improvement work.

Sorry, but could you spell out what are the "one approach" and "the
other"?  Are you proposing to release the code as it is, fetching in
advance, and saying you're confident it can in the future be taught to
fetch during the operation, notwithstanding kotkov@'s points about
RA-level timeouts?

Cheers,

Daniel


Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-15 Thread Daniel Shahaf
Julian Foad wrote on Fri, Mar 11, 2022 at 19:36:41 +:
> Stick with the idea, for now, that we do need to handle that "restore"
> part of update.

Can we deprecate it?

In the API, create an svn_client_updateN() function that's documented to
be like svn_client_updateN-1() but without reverting absent files.  In
the CLI, create an «svn update2» command with the same caveat.  Tell
people who use pristines-on-demand to use «svn update2» rather than «svn
update».  Would this work?

> The alternative approach is to add a callback to the
> lower level "get the pristine" function that it uses. Then it would be
> able to fetch what it needs when it needs it and not fetch anything it
> doesn't need. Evgeny cautions us about that alternative approach, but
> *in principle* if we could get the protocol level behaviour absolutely
> right, that would (I think) surely be better.
> 
> That "restore" happens, within update, before the server tells us
> whether any update of that particular file is actually present on the
> repository. (Perhaps it could be moved to afterwards; I haven't
> investigated that possibility.)
> 
> Then, never mind whether we care about supporting that "restore" thing;
> because next a similar pattern applies to the "normal" part of the
> update (everything it does after "restore"). Obviously we need the
> normal part of update

Yes, but for the "deltas" part of update we already mostly DTRT, don't we?

- If the file is not modified, the WORKING file doubles as an on-demand
  BASE (it gets detranslated when BASE is called for) so hydrating it is
  a no-op.

- If the file is locally modified, then by design, we need to end up
  with a pristine for it.  Right now we'll download BASE, and then
  download a delta to the new BASE and do a 3-way merge.  Can we avoid
  downloading either the old BASE or the delta?
  
  There are three possible cases (cf. merge_file_trivial()):

  + the old and new BASEs are equal, in which case the delta download
    and application is O(1), and the BASE download is fine because
    a modified file is _supposed_ to have a pristine; or

  + WORKING and the new BASE are equal, in which case we will
    download stuff we don't need to (but this is an edge case); or

  + we need to run a three-way merge, which means we need the
    .rN .rM .mine all handy.  We have only one file, so we need the
    server to send us the other two, and the question is just whether
    the delta from .rN to .rM would be applied client-side or
    server-side.  The .rN .rM .mine files might be merged by us with
    a 'G' notification, or we might give up and throw a 'C' notification,
    but we can't avoid downloading two of the three files, unless
    someone knows an rsync-like way to do diff3 on three files when two
    of them are not locally available.

    Users can avoid this case by using svn:needs-lock on these large
    files, to ensure the old and new BASEs will be equal.  Cf.

    https://mail-archives.apache.org/mod_mbox/subversion-dev/202201.mbox/%3C20220131115758.GA14771%40tarpaulin.shahaf.local2%3E

So, it seems to me we're doing what we can except in the case of an
update that would make the new BASE identical to WORKING; and should
recommend that users consider svn:needs-lock.  What am I missing?
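
For illustration, the three cases for a locally modified file can be
written down as a small classifier.  This is only a sketch of the
reasoning above, not what merge_file_trivial() does internally; the
function name, the return labels, and the use of in-memory byte strings
for file contents are all assumptions made for the example.

```python
def classify_update(old_base: bytes, new_base: bytes, working: bytes) -> str:
    """Classify an update of a locally modified file (working != old_base).

    Hypothetical helper: the labels are illustrative, not Subversion names.
    """
    if old_base == new_base:
        # The repository did not change this file: the delta is empty and
        # the only real cost is having a pristine for a modified file.
        return "bases-equal"
    if working == new_base:
        # The local edit already matches the incoming revision, so anything
        # downloaded for this file is wasted (the edge case).
        return "working-equals-new-base"
    # Otherwise .rN, .rM and .mine are all needed for a three-way merge,
    # and two of the three have to come from the server.
    return "three-way-merge"
```

With svn:needs-lock in effect on the large files, only the first label
should normally be reachable for them.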

> > [...] From my layman's point of view, we have a database in
> > the WC that says what we have. I assumed we largely would be using
> > that information when talking to the server about what it needs to
> > send us to do an update.
> 
> Well, yes and no in the current two-phase (hydration, then operation) 
> approach.
> 
> > So I am just not getting why the server needs
> > to send us a file that WC already has.
> 
> It doesn't ever send us a file that the WC already has. The issue we're
> concerned about is it sending a pristine that is (knowingly) absent from
> the pristine store, but for a file whose pristine is not going to be
> looked at during the current update. It might be needed by some future
> operation, and the current approach fetches it pre-emptively (by design),
> but for this use case we would rather not do that.
> 
> > I know your answer will be the "restore situation" [...]
> 
> That's not the essential part, no; the main part of "update" is
> obviously the essential part, and it has similar characteristics and
> options, and isn't optional.
> 
> - Julian
> 


Re: Issue #525/#4892: on only fetching the pristines we really need

2022-03-15 Thread Daniel Shahaf
Julian Foad wrote on Mon, Mar 14, 2022 at 10:47:57 +:
> I wonder if we are missing some perspective.
> 
> We are worried that the current design won't be acceptable because it
> has poor behaviour in a particular use case.
> 
> The use case involved running "svn update" at the root of the WC. (It
> didn't explicitly say that. More precisely, it implied the update target
> tree contains the huge locally modified file.)
> 
> Using this new feature necessarily requires some adjustments to user
> expectations and work flow.
> 
> What if we ask the user to limit their "svn update" to target the
> particular files/paths that they need to update, keeping their huge
> locally modified file out of its scope? Examples:
> 
> svn update readme.txt
> svn update small-docs/
> # BUT NOT: svn update the-whole-wc/
> 
> Then we side-step the issue. It only fetches pristines for modified
> files that are within the tree scope of the specified targets. (This is
> how it works already, not a proposal.)
> 
> OK that's not optimal but it might be sufficient.

This implies the wc won't be uniform revision.  This might break user
expectations; might prevent the user from running «svn merge»; and might
get in the way of querying the repository for the wc's history.  E.g.,
even a simple «svn diff -r BASE» on an unmodified wc will show
differences, because BASE is resolved to a revision number once at the
start, not once per versioned file.
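
A script could detect this non-uniform state from the output of
svnversion(1), which prints a single revision for a uniform wc and
a LOW:HIGH range for a mixed-revision one, optionally followed by the
M, S and P flags.  A minimal sketch, assuming that output format; the
helper name is hypothetical, and it only parses a string rather than
running svnversion itself:

```python
import re


def is_mixed_revision(svnversion_output: str) -> bool:
    """Return True if svnversion output such as '4123:4168MS' shows a
    mixed-revision working copy, False for a single revision like '4168M'."""
    match = re.fullmatch(r"(\d+)(?::(\d+))?[MSP]*", svnversion_output.strip())
    if match is None:
        # Anything else (e.g. an unversioned directory) is not handled here.
        raise ValueError(f"unrecognised svnversion output: {svnversion_output!r}")
    # The second group is only present when a LOW:HIGH range was printed.
    return match.group(2) is not None
```

A wrapper script could refuse to run «svn merge» or «svn diff -r BASE»
while this returns True.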

One way in which I can see this possibly working is if the user is
willing to restructure their tree so all the large files are in one
subdirectory that's an immediate child of the wc root, and all other
files are in another immediate child directory of the wc root.  Then
they can use the latter child directory as their cwd and run svn
operations normally.  The cwd won't be the wc root, but that's
manageable.  However, requiring a tree restructure would increase the
cost to the user of starting to use this feature.

Also, at this point they can basically leave the large files in an
svn:external wc, and enable pristines-on-demand only for that wc, rather
than have one wc with two subdirs.

Cheers,

Daniel


> (Of course there are further concerns, such as what happens if the user
> starts an update at the WC root, then cancels it as it's taking too
> long: can we gracefully recover? Fine, we can look at those concerns.)
> 
> I can go ahead with further work on changing the design if required, but
> I am concerned that might not be the best use of resources. Also I don't
> know how to evaluate the balance of Evgeny's concerns about protocol
> level complexity of the alternative design, against the concerns about
> the present design. In other words pursuing that alternative seems
> riskier, while accepting the known down-sides of the current design is
> sub-optimal but seems less risky.
> 
> Should we first test the current design and see if we can work with it,
> before going full steam ahead into changing the design?
> 
> The current design/implementation (on branch
> 'pristines-on-demand-on-mwf') is in a working state. There are open
> issues that still need to be resolved, but it's complete enough to be
> ready for this level of testing.
> 
> - Julian
> 


Re: http URLs should be updated to https

2022-03-15 Thread Daniel Shahaf
Vincent Lefevre wrote on Fri, Mar 11, 2022 at 12:09:59 +0100:
> On 2022-03-11 10:29:12 +, Julian Foad wrote:
> > Julian Foad wrote:
> > > +1. Can you send a patch?
> > 
> > By the way, the reason I ask if you would be willing, rather than "just
> > quickly doing it" myself, is even a small "obvious" fix like this tends
> > to require more than it initially looks like: checking if it's already
> > done in head of trunk,
> 
> This is what I'm looking at.
> 
> > finding other similar places,
> 
> I typically do a recursive grep to find the potential updates to be
> done.
> 
> > deciding if any of them shouldn't be updated for whatever reason,
> > running the test suite, adjusting test results to match...
> 
> Yes, this requires some time to do it well (I've already done such
> kinds of changes in another project), mainly for checking. I suppose
> that everything the end user can see needs to be updated, but also
> for the readers of the source code, e.g. comments like
> 
>   #  See http://subversion.apache.org for more information.

+1

> I have a question for 2 kinds of files. I think that the URLs should
> be updated too in these files, but I need confirmation:
>   * The "CHANGES" file.

Most instances can be changed, yes, but some instances of "http://" are
shorthand for "any http://* or https://*" URL and should probably be
left alone.  And naturally, http://third-party.example.com/ stuff
shouldn't be changed unless the new value is verified to work.

>   * The .po files. The advantage is that this would avoid the need of
> the update by the translators. A potential drawback is that the
> additional character might need to adjust the formatting, e.g. to
> fit on 80 columns where applicable (there aren't many concerned
> lines, so that I could report any such formatting issue).

+1 and thank you.

Instead of reporting them, it might be easier for you to just leave the
entire msgid/msgstr pair untouched.  We could then generate a list of
the remaining issues by grepping again, and the translator would notice
the out-of-date message when they next update the translation.

Cheers,

Daniel


Re: Windows build

2022-03-15 Thread Daniel Shahaf
Daniel Sahlberg wrote on Tue, Mar 15, 2022 at 16:09:51 +0100:
> Hi,
> 
> I'm once again restarting my efforts on a solid Windows build environment.
> I've been checking the scripts in build/win32 and believe these might be a
> reasonable starting point.
> 
> Does anyone have recent (or not so recent...) notes on what you did to set
> it up?

tools/dev/windows-build/

tools/dev/build-svn-deps-win.pl

https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/subversion.conf?p=1078449
(peg revision because the directory was deleted in the peg revision; I don't 
know the new home, if any)

Cheers,

Daniel


> If you feel the notes are not in shape to share publicly, send them
> offlist and I can edit them.
> 
> Thanks,
> Daniel Sahlberg


multi-wc-format: release notes

2022-03-15 Thread Daniel Shahaf
Shall we write release notes for multi-wc-format now?  That
branch/feature seems merged and finished, other than the release notes
not being updated yet.  And it's better to write them now while it's
fresh in our memories, and so the notes will be available to anyone who
might try trunk@HEAD builds.

There's a start here:

https://subversion-staging.apache.org/docs/release-notes/1.15#wc-upgrade

Any volunteers?

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-15 Thread Daniel Shahaf
Karl Fogel wrote on Tue, Mar 08, 2022 at 17:59:20 -0600:
> There are reasonable arguments both ways for
> shipping MVP with/without x-hydrate functionality.
> 
> What do others think?

Just bumping Karl's question.

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-15 Thread Daniel Shahaf
Karl Fogel wrote on Tue, Mar 08, 2022 at 17:59:20 -0600:
> On 08 Mar 2022, Daniel Shahaf wrote:
> > Sure.  I was asking whether by "once the user has a local pristine" you
> > meant a pristine — as in, a file under .svn/pristine/ that .svn/wc.db
> > knows about and uses — or Alice making a local copy of the contents of
> > file@BASE somewhere libsvn doesn't know about.
> 
> Well, depending on the context, I may be using the word "pristine" flexibly.
> Sometimes I mean a literal integrated-into-wc-metadata pristine, and
> sometimes I just mean "an extra copy of the file, that the user has made
> locally".
> 

I see.

> (It's possible that the degree of precision you would like in this
> sub-discussion is not one I'm willing to adhere to consistently :-).  I
> can't always predict what will matter to a given interlocutor.  But I'll try
> to be sufficiently precise in my responses below at least.)

Thanks, Karl.  I hope I'm not frustrating you.  I do try to be
interoperable with as many interlocutors as possible, but using "foo" to
sometimes mean "bar" and sometimes mean "poor man's alternative to bar"
does in fact create ambiguities.

> > A manual copy of the BASE revision would "serve for local diffs and
> > reverts", indeed, but I would hesitate to recommend this, because diff
> > and revert are both core operations.  If users need to reinvent these
> > two wheels, then:
> > 
> > - All the advantages of having just that one well-known «svn revert»
> >   button that all the users' GUI clients and scripts can press are lost
> > 
> > - The local disk storage cost will be paid, but without all the
> >   benefits: e.g., commit will use a self-delta rather than a delta
> >   against BASE even if the file format does lend itself to binary diffs;
> >   ra_serf's ability to not download a file if the wc has another file
> >   with the same sha1 won't be used; the keyword-contraction and
> >   diff-ignore-content-type features of «svn diff» will need to be
> >   reimplemented; etc.
> > 
> > - We might leave a bad impression on potential users
> > 
> > As an MVP alternative, some sort of command to hydrate a single file,
> > perhaps, as you have proposed?  CLI-wise, I'll just say we might want to
> > mark such a command as experimental (name it "x-foo" and document it has
> > reduced forward compatibility promises).  Backend-wise, we'll want to
> > ensure a manually-hydrated file doesn't get dehydrated too soon.
> > 
> > What's "too soon"?  Until the user explicitly requests or permits
> > dehydration.  If hydration was manual, so should dehydration be.
> > 
> > Makes sense?
> 
> Yes, thanks for the suggestion, and I agree.  I would love for MVP or MVP+1
> to have an explicit "rehydrate" UI.  I think there *might* be some value to
> shipping MVP without such a feature, in order to first get some real-world
> experience with how people use pristine-less working copies, before we make
> long-lasting UI decisions.
> 
> But anyway, +1 to the general idea.

Filed: https://issues.apache.org/jira/browse/SVN-4894

> > The context of all this is whether 'update' should fetch pristines for
> > modified files.  I guess it should not do so by default (there's no
> > reason to incur the costs, and the user has opted in to
> > pristines-on-demand),
> > but I don't think we should tell users to keep pristines _and not tell
> > libsvn_wc about them_.  The cost of implementing «svn x-hydrate»
> > (however named) is smaller than the cost of asking users to reimplement
> > core version control functionality.
> 
> Users can already copy files behind Subversion's back, of course.
> 
> I'm worried that implementing 'svn x-hydrate' commands now would be
> premature -- we don't know enough about real-world usage yet. I'd feel more
> comfortable putting out one release (of x-hydrate-less MVP) to get feedback
> on pristine-less working copies.  We could even say that we're considering
> adding x-hydrate commands but that we're waiting until the next release so
> we can make sure our UI ideas match people's actual needs.
> 
> Anyone else have thoughts on this?
> 

Just to make sure you noticed I'm proposing this as an x-* command,
i.e., without promising it'll behave in 1.16 as it does in 1.15, or even
exist at all in 1.16?

We could write a Python script to explicitly hydrate something, even
after 1.15.0-GA, to let people experiment with that to some degree.  (It
won't preserve hydration through commits, of course.)

> > This way, by default «commit» will send self-deltas, but if the user
> > wants a pri

Re: multi-wc-format review

2022-03-15 Thread Daniel Shahaf
Julian Foad wrote on Wed, Mar 09, 2022 at 19:53:12 +:
> On Mar 8 2022, Daniel Shahaf wrote:
> >>   By default Subversion will upgrade the working copy to a version
> >>   compatible with Subversion 1.8 and newer.
> >  
> > Are we assuming that future minor versions (1.16, 1.17, etc.) will all
> > continue to be able to read/write f31?  This seems to be implied by
> > the language [...]
> 
> It seems desirable at this time, now that we have the mechanism to do
> so, that we would continue supporting that format for a good long while;
> but of course I would not promise indefinite support.
> 
> What I was thinking is we can use language like that for 1.15, and then
> in a future version if and when we drop support, we can at the same time
> change the language (and if necessary the format reporting APIs) to
> report more specific lists of support.
> 
> Specifically, I felt it was OK to use language like "and newer" in a
> given release of the tool, without being construed as a promise about
> versions newer than this version. If anyone thinks anything here could
> seriously mislead, point it out and let's change it.
> 

The existing language, "By default Subversion will create a WC format
compatible with Subversion 1.8 and newer", sounds to me like we're
promising this will be the default for the remainder of 1.x.

That might be just me… but on the other hand, it might actually take
less time to disambiguate this than to discuss this:

[[[
Index: subversion/svn/svn.c
===================================================================
--- subversion/svn/svn.c	(revision 1898952)
+++ subversion/svn/svn.c	(working copy)
@@ -514,9 +514,9 @@ svn_cl__cmd_table_main[] =
  "Check out a working copy from a repository.\n"
  "usage: checkout URL[@REV]... [PATH]\n"
  "\n"), N_(
- "  By default Subversion will create a WC format compatible with\n"
- "  Subversion 1.8 and newer. To create a different WC format,\n"
- "  use an option such as '--compatible-version=1.15'.\n"
+ "  The new working copy (WC) will be compatible with Subversion 1.8 and\n"
+ "  newer (this default may change in the future). To create a different\n"
+ "  WC format, use an option such as '--compatible-version=1.15'.\n"
  "  The versions available are the same as in the 'upgrade' command.\n"
  "  Use 'svn --version' to see the compatible versions supported.\n"
  "\n"), N_(
@@ -1915,8 +1915,8 @@ svn_cl__cmd_table_main[] =
  "Upgrade the metadata storage format for a working copy.\n"
  "usage: upgrade [WCPATH...]\n"
  "\n"), N_(
- "  By default Subversion will upgrade the working copy to a version\n"
- "  compatible with Subversion 1.8 and newer. To upgrade to a different\n"
+ "  The upgraded working copy will be compatible with Subversion 1.8 and\n"
+ "  newer (this default may change in the future). To upgrade to a different\n"
  "  version, use an option such as '--compatible-version=1.15'.\n"
  "  The versions available are the same as in the 'checkout' command.\n"
  "  Use 'svn --version' to see the compatible versions supported.\n"
]]]

WDYT?

> Daniel Shahaf wrote:
> > Julian Foad wrote on Thu, Mar 03, 2022 at 10:53:13 +:
> >> [...] it seems clear to me now that we need to expose
> >> [WC format numbers]. [...]
> >  
> > Would you elaborate? [...]
> > What are the awkward semantics?  What are the inconsistencies?  What
> > questions would API users be able to answer for themselves if we hand
> > them format numbers, that they can't easily answer with trunk@HEAD?
> 
> Some examples below.
> 
> >> If we're going to have version numbers as the 'compatible version' UI
> >> option, perhaps we should eliminate these issues by requiring to
> >> specify the exact first-introduced version, MAJOR.MINOR format (with
> >> no .PATCH, no -TAG).
> >  
> > _Prima facie_ I would -0 this, because it should be possible to do
> > «/opt/foo/bin/svn upgrade --compatible-version=$(/opt/bar/bin/svn
> > --version --quiet)» even when bar is v1.9.
> 
> I suppose in that example the intention is "I want it to be compatible
> with version , without me having to learn what formats 
> supports." The current option parsing seems to have a design intent that
> reflects this usage. And it is a reasonable use case at first sight.
> 
> Suppose '/opt/v1.19/bin/svn --version' outputs:
> * WC format 31, compatible with Subversion v1.8 and newer
> * WC format 32, compatible with Subvers

Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Daniel Shahaf
Karl Fogel wrote on Tue, Mar 08, 2022 at 14:01:22 -0600:
> On 08 Mar 2022, Daniel Shahaf wrote:
> > Karl Fogel:
> > > Hmm, I don't see where I was assuming that the pristine would be
> > > needed exactly once, though.  Once the user has a local pristine
> > > (by whatever means),
> > 
> > To be clear, we're only talking about pristines that libsvn_wc knows
> > about, right?  As opposed to Alice running «svn cat iota@BASE» and
> > saving the output somewhere.
> 
> Hmm, I don't think I understand the question here.  Can you ask it with more
> details / context?

Sure.  I was asking whether by "once the user has a local pristine" you
meant a pristine — as in, a file under .svn/pristine/ that .svn/wc.db
knows about and uses — or Alice making a local copy of the contents of
file@BASE somewhere libsvn doesn't know about.

> > > if she wants to keep that local pristine after committing its
> > > corresponding working file, then she could do so or not do so,
> > > depending on
> > > whether she wants to continue paying the local storage cost for it.
> > 
> > How would Alice keep iota's pristine after committing iota?  «svn commit
> > iota» deletes iota's pristine.
> 
> Like I said, I wasn't going into UI details.

Sure.  Neither was I.

> But if Subversion wants to offer a way for commit to keep the
> post-commit pristine around (in circumstances where that file would
> otherwise be pristine-less), it can do so.  This wouldn't be for MVP,
> of course; I'm just saying it's a conceivable feature and maybe some
> day we'll offer it.

+1

> For now, the way Alice would keep an "informal pristine" would be simply
> manually copy the file.  That's not a pristine in the full sense of the
> word, but it will serve for local diffs and reverts of course.

A manual copy of the BASE revision would "serve for local diffs and
reverts", indeed, but I would hesitate to recommend this, because diff
and revert are both core operations.  If users need to reinvent these
two wheels, then:

- All the advantages of having just that one well-known «svn revert»
  button that all the users' GUI clients and scripts can press are lost

- The local disk storage cost will be paid, but without all the
  benefits: e.g., commit will use a self-delta rather than a delta
  against BASE even if the file format does lend itself to binary diffs;
  ra_serf's ability to not download a file if the wc has another file
  with the same sha1 won't be used; the keyword-contraction and
  diff-ignore-content-type features of «svn diff» will need to be
  reimplemented; etc.

- We might leave a bad impression on potential users

As an MVP alternative, some sort of command to hydrate a single file,
perhaps, as you have proposed?  CLI-wise, I'll just say we might want to
mark such a command as experimental (name it "x-foo" and document it has
reduced forward compatibility promises).  Backend-wise, we'll want to
ensure a manually-hydrated file doesn't get dehydrated too soon.

What's "too soon"?  Until the user explicitly requests or permits
dehydration.  If hydration was manual, so should dehydration be.

Makes sense?



The context of all this is whether 'update' should fetch pristines for
modified files.  I guess it should not do so by default (there's no
reason to incur the costs, and the user has opted in to pristines-on-demand),
but I don't think we should tell users to keep pristines _and not tell
libsvn_wc about them_.  The cost of implementing «svn x-hydrate»
(however named) is smaller than the cost of asking users to reimplement
core version control functionality.

If we think there are use-cases in which users will want to have
a pristine for a modified file, whether those use-cases involve «commit»
or «diff» or «revert» or whatever else, then that pristine shouldn't be
just the user's private copy of BASE; it should be a real pristine, live
in .svn/pristine/ and be known to wc.db, and used for all svn operations,
not just those the user has reimplemented.

This way, by default «commit» will send self-deltas, but if the user
wants a pristine for diffs or reverts, then reverts, diffs, and commits
will all use the pristine.  There shouldn't be any need for the user to
reimplement their own pristine store and their own diff and revert
operations.

And yes, commit might not want to use pristines this way, but that's
actually a separate feature request: a request to change the "When
committing a change to a pristineful file, send a delta against BASE or
a self-delta, whichever is smaller" logic, which IIRC works by computing
a delta against BASE and comparing its length to the repository-normal
filesize, to something that doesn't compute a delta against BASE in the
first place.
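
To make the "whichever is smaller" rule concrete, here is a minimal
sketch.  It does not use svndiff or any libsvn API: as a stand-in,
a "delta against BASE" is modelled as zlib compression with BASE as the
preset dictionary, and a "self-delta" as plain zlib compression.  All
function names are hypothetical, and zlib's 32 KiB window means this
only behaves like a delta for small inputs.

```python
import zlib


def self_delta(new: bytes) -> bytes:
    """Stand-in for a self-delta: compress the new text without BASE."""
    return zlib.compress(new)


def delta_against_base(base: bytes, new: bytes) -> bytes:
    """Stand-in for a delta against BASE: BASE is the preset dictionary."""
    comp = zlib.compressobj(zdict=base)
    return comp.compress(new) + comp.flush()


def apply_delta_against_base(base: bytes, delta: bytes) -> bytes:
    """Reconstruct the new text from BASE and the stand-in delta."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(delta) + decomp.flush()


def choose_payload(base, new):
    """Return (kind, payload) for the smaller encoding.

    When base is None (a pristineless file) only the self-delta is
    considered, so no delta against BASE is ever computed.
    """
    candidates = [("self-delta", self_delta(new))]
    if base is not None:
        candidates.append(("base-delta", delta_against_base(base, new)))
    return min(candidates, key=lambda item: len(item[1]))
```

The feature request described above amounts to taking the
"base is None" path even when a pristine happens to be present.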

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Daniel Shahaf
Karl Fogel wrote on Tue, Mar 08, 2022 at 12:32:38 -0600:
> On 08 Mar 2022, Daniel Shahaf wrote:
> > Karl Fogel wrote on Mon, Mar 07, 2022 at 13:44:03 -0600:
> > > And in the absence of fancy cross-network common-prefix detection
> > > code that we're not going to write, this would just be
> > > cost-shifting anyway.  Whatever commit-time improvement one would
> > > gain from having the pristine locally would be offset by the extra
> > > time spent fetching the pristine to make that commit-time
> > > improvement possible.
> > 
> > What assumptions is this conclusion valid under?  It seems to me that
> > this conclusion assumes, at least, that the uplink and downlink
> > bandwidths are equal and that the pristine will be needed exactly once
> > (i.e., a hydrate-commit-dehydrate sequence).
> 
> I was assuming up and down speeds are roughly the same, yes.
> 
> Hmm, I don't see where I was assuming that the pristine would be needed
> exactly once, though.  Once the user has a local pristine (by whatever
> means),

To be clear, we're only talking about pristines that libsvn_wc knows
about, right?  As opposed to Alice running «svn cat iota@BASE» and
saving the output somewhere.

> if she wants to keep that local pristine after committing its
> corresponding working file, then she could do so or not do so, depending on
> whether she wants to continue paying the local storage cost for it.

How would Alice keep iota's pristine after committing iota?  «svn commit
iota» deletes iota's pristine.

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Daniel Shahaf
Daniel Sahlberg wrote on Tue, Mar 08, 2022 at 14:34:06 +0100:
> On Tue, 8 Mar 2022 at 14:17, Daniel Shahaf wrote:
> 
> > An alternative is to require the user to let svn know before they're
> > starting to edit a file, so we can create a pristine off the on-disk
> > file.  This way we won't have pristineless modified files in the first
> > place.
> >
> 
> Not "require". It might be interesting for some use-cases to have "svn
> create-pristine-from-wc" as a manual step, but not adding this as part of
> the normal workflow. I have some wc's that might benefit from being
> pristine-less, but I'm not prepared to pay the extra cost (time-wise) of an
> svn:needs-locking-like step for every file I need to modify. I don't think
> this new command (or option) is MVP.

I wasn't proposing we require such a step.  I was merely saying that was
one of several possible solutions to the "How to commit a pristineless
file" question.  Here they are again:

1. Download the pristine and then send a regular delta
2. Send a self-delta
3. rsync the file
4. Avoid getting into this situation in the first place

I guess we'll be happy with (2) for the MVP.

Cheers,

Daniel


Re: A two-part vision for Subversion and large binary objects.

2022-03-08 Thread Daniel Shahaf
Karl Fogel wrote on Mon, Mar 07, 2022 at 13:44:03 -0600:
> On 07 Mar 2022, Mark Phippard wrote:
> > > I do understand the reasons why Evgeny thought pre-fetching
> > > pristines for modified files as part of an 'update' could be a
> > > good idea.
> > 
> > My recollection of the first version of this patch, commit needed the
> > pristine and so had to fetch it before the commit happened. This may
> > have been a reason it seemed like a good idea at the time for update
> > to get the pristine.
> 
> Ah, maybe so; I didn't realize that.
> 
> If that was the motivation, then there's even less reason for 'update' to
> fetch pristines for modified files.  Having the pristine is not only
> unnecessary for the commit, in most cases having the pristine is not even
> particularly *useful* to the commit.  These types of files tend to be
> non-diffable anyway (i.e., not even binary diffable), broadly speaking and
> with occasional exceptions of course.  For example, a common such file is a
> gigantic gzipped blob.  Tiny changes in the uncompressed text will lead to a
> completely different gzipped blob.

And «update» could send a self-compressed delta anyway.

> (I suppose it might be the case that if the first change is made very late
> in the uncompressed text, then the revised gzipped blob can, under some
> real-world circumstances, actually be bit-for-bit the same as the original
> for a long initial prefix before showing any difference.  But this is a rare
> enough case that I don't think Subversion should be trying to detect it and
> support it.  We'd essentially have to incorporate the rsync rolling-checksum
> algorithm, or something like it, into our diff negotiation to even get any
> advantage.)

This use-case may be a rare one, but rsync _was_ in fact designed to
solve precisely the problem that «svn commit» of a pristineless file
needs to solve.  So, suppose we did use the rsync algorithm, would this
benefit any use-cases other than the "first change is at the end
of the file" use-case you describe here?  Is it faster to commit a file
by sending a self-delta of it or by rsync'ing it?

Furthermore, the user may be able to deliberately create the huge file
in a way that makes it rsync-friendly: for instance, `svnadmin dump`
emits hashes in sorted order, which has the side-effect of making dump
files rsync-friendly.  For gzip files there is «gzip --rsyncable».

None of this is needed for the MVP, of course, but I do think the basic
principle of using rsync is in fact sound.
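
For readers who haven't looked at it, the piece of rsync that makes
this cheap is its weak rolling checksum: the checksum of a block can be
updated in O(1) when the block window slides by one byte, so the sender
can test every offset of a large file against the receiver's block
list.  Below is a minimal sketch of that checksum, following the
published description of the algorithm (two 16-bit sums, modulus 2^16).
The function names are mine, and nothing here is Subversion or rsync
code.

```python
MOD = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes):
    """Compute the (a, b) pair for a block from scratch."""
    a = sum(block) % MOD
    # Each byte is weighted by its distance from the end of the block.
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int):
    """Slide the window one byte: drop out_byte, append in_byte."""
    a_new = (a - out_byte + in_byte) % MOD
    b_new = (b - block_len * out_byte + a_new) % MOD
    return a_new, b_new


def rolling_checksums(data: bytes, block_len: int):
    """Yield (offset, (a, b)) for every block_len-sized window of data."""
    if len(data) < block_len:
        return
    a, b = weak_checksum(data[:block_len])
    yield 0, (a, b)
    for start in range(1, len(data) - block_len + 1):
        a, b = roll(a, b, data[start - 1], data[start - 1 + block_len], block_len)
        yield start, (a, b)
```

A real implementation pairs this with a strong hash to confirm
candidate matches; that part is omitted here.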

An alternative is to require the user to let svn know before they're
starting to edit a file, so we can create a pristine off the on-disk
file.  This way we won't have pristineless modified files in the first
place.

> And in the absence of fancy cross-network common-prefix detection code that
> we're not going to write, this would just be cost-shifting anyway.  Whatever
> commit-time improvement one would gain from having the pristine locally
> would be offset by the extra time spent fetching the pristine to make that
> commit-time improvement possible.

What assumptions is this conclusion valid under?  It seems to me that
this conclusion assumes, at least, that the uplink and downlink
bandwidths are equal and that the pristine will be needed exactly once
(i.e., a hydrate-commit-dehydrate sequence).

I'm not objecting to making assumptions; we aren't going to address all
use-cases in 1.15.  I'm just asking that we make our assumptions explicit.
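
To show how much the two assumptions matter, here is a back-of-envelope
model.  It is a sketch only: it ignores latency, server-side work and
compression, takes sizes in bytes and bandwidths in bytes per second,
and the function names are made up for the example.

```python
def seconds_with_hydration(pristine_bytes, delta_bytes, down_bps, up_bps, commits=1):
    """Fetch the pristine once, then send a delta against BASE per commit."""
    return pristine_bytes / down_bps + commits * delta_bytes / up_bps


def seconds_with_self_delta(self_delta_bytes, up_bps, commits=1):
    """Never fetch the pristine; send a self-delta on every commit."""
    return commits * self_delta_bytes / up_bps
```

With hypothetical numbers (a 1 GB incompressible file, a 10 MB delta
against BASE, 10 MB/s in both directions, one commit) the two come out
at 101 s and 100 s, which is the cost-shifting Karl describes.  Raise
the downlink to 100 MB/s and hydration drops to 11 s against 100 s;
keep the symmetric link but make five commits and it is 105 s against
500 s.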

Cheers,

Daniel

> So... yeah.  Let's not do that :-).
> 
> Best regards,
> -Karl

