Re: Getting to first release of pristines-on-demand feature (#525).

2024-01-18 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> Merged in https://svn.apache.org/r1905955
>
> I'm going to respond on the topic of SHA1 a bit later.

For the history: thread [1] proposes the `pristine-checksum-salt` branch that
adds the infrastructure to support new pristine checksum kinds in the working
copy and makes a switch to the dynamically-salted SHA1.

From the technical standpoint, I think that it would be better to release
the first version of the pristines-on-demand feature having this branch
merged, because now we rely on the checksum comparison to determine if a
file has changed — and currently it's a checksum kind with known collisions.

At the same time, having that branch merged probably isn't a formal release
blocker for the pristines-on-demand feature.  Also, considering that the
`pristine-checksum-salt` branch is currently vetoed by danielsh (presumably,
for an indefinite period of time), I'd like to note that personally I have
no objections to proceeding with a release of the pristines-on-demand
feature without this branch.

[1] https://lists.apache.org/thread/xmd7x6bx2mrrbw7k5jr1tdmhhrlr9ljc


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Tue, Dec 20, 2022 at 11:14:00 +0300:
> [Moving discussion to a new thread]
> 
> We currently have a problem that a working copy relies on the checksum type
> with known collisions (SHA1).  A solution to that problem

Why is libsvn_wc's use of SHA-1 a problem?  What's the scenario wherein
Subversion will behave differently than it should?

> is to switch to a different checksum type without known collisions in
> one of the newer working copy formats.

Such as SHA-1 salted by NODES.LOCAL_RELPATH and NODES.WC_ID (or a per-wc UUID)?
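
A minimal sketch of what "SHA-1 salted by path and working-copy id" could
look like.  The salt layout (wc_id and local_relpath joined with NUL bytes)
and both function names are assumptions made for illustration; the thread
does not specify how the salt would be built or mixed in.

```python
import hashlib


def make_salt(wc_id: int, local_relpath: str) -> bytes:
    # Hypothetical salt layout: NUL separators keep (wc_id, path) unambiguous.
    return f"{wc_id}\0{local_relpath}\0".encode("utf-8")


def salted_sha1(salt: bytes, content: bytes) -> str:
    # The salt is fed to SHA-1 before the content, so the same content
    # yields a different digest under a different salt.
    h = hashlib.sha1()
    h.update(salt)
    h.update(content)
    return h.hexdigest()


content = b"same file content\n"
print(salted_sha1(make_salt(1, "trunk/a.txt"), content))
print(salted_sha1(make_salt(1, "trunk/b.txt"), content))
```

With a prefix salt, a pair of files prepared to collide under plain SHA-1
would not be expected to collide under the salted form, because the hash
state differs before the colliding blocks are processed.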

> Since we plan on shipping a new working copy format in 1.15, this seems to
> be an appropriate moment of time to decide whether we'd also want to switch
> to a checksum type without known collisions in that new format.
> 

What's the acceptance test we use for candidate checksum algorithms?

You say we should switch to a checksum algorithm that doesn't have known
collisions, but, why should we require that?  Consider the following
160-bit checksum algorithm:
.
1. If the input consists of 40 ASCII lowercase hex digits and
   nothing else, return the input.
2. Else, return the SHA-1 of the input.

This algorithm has a trivial first preimage attack.  If a wc used this
identity-then-sha1 algorithm instead of SHA-1, then… what?
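
The two rules above can be sketched as follows.  The function names are
mine; only the two rules come from the description.

```python
import hashlib
import re

# Exactly 40 ASCII lowercase hex digits and nothing else.
_HEX40 = re.compile(rb"[0-9a-f]{40}")


def identity_then_sha1(data: bytes) -> str:
    # Rule 1: an input that already looks like a digest is returned as-is.
    if _HEX40.fullmatch(data):
        return data.decode("ascii")
    # Rule 2: everything else is hashed with SHA-1.
    return hashlib.sha1(data).hexdigest()


def first_preimage(digest: str) -> bytes:
    # The trivial attack: the digest text itself is an input that maps to it.
    return digest.encode("ascii")


d = identity_then_sha1(b"some file content")
assert identity_then_sha1(first_preimage(d)) == d
```

So a file whose content is the 40-character digest of another file gets the
same checksum as that other file, without any cryptanalysis.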

> Below are the arguments for including a switch to a different checksum type
> in the working copy format for 1.15:
> 
> 1) Since the "is the file modified?" check now compares checksums, leaving
>    everything as-is may be considered a regression, because it would
>    introduce additional cases where a working copy currently relies on
>    comparing checksums with known collisions.
> 

Well, SHA-1 is still collision-free so long as one is not deliberately
trying to use collisions, so this would only be a regression if we
consider "Deliberately store files that have the same checksum" to be
a use-case.  Do we?

I recall we discussed this when shattered.io was announced, and we
didn't rush to upgrade the checksums we use everywhere, so I guess back
then we came to the conclusion that wasn't a use-case.  (Of course we
can change our opinion; that's just a datapoint, and there may be more,
on both sides, in the old thread.)

I looked for the old thread and didn't find it.  (I looked in the
private@ archives too in case the thread was there.)

> 2) We already need a working copy format bump for the pristines-on-demand
>    feature.  So using that format bump to solve the SHA1 issue might reduce
>    the overall number of required bumps for users (assuming that we'll still
>    need to switch from SHA1 at some point later).
> 

Considering that 1.15 will support reading and writing both f31 and f32,
the "overall number of required bumps" between 1.8 and trunk@HEAD is
zero, meaning the proposed change can't reduce that number.

> 3) While the pristines-on-demand feature is not released, upgrading
>    with a switch to the new checksum type seems to be possible without
>    requiring a network fetch.

I infer the scenario in question here is upgrading a (say) pristineless
wc to a newer format that supports a new checksum algorithm.

>    But if some of the pristines are optional, we lose the possibility
>    to rehash all contents in place.  So we might find ourselves having
>    to choose between two worse alternatives of either requiring
>    a network fetch during upgrade or entirely prohibiting an upgrade
>    of working copies with optional pristines.

Why would we want to rehash everything in place?  The 1.15→1.16 upgrade
could simply leave pristineless files' checksums as SHA-1 until the next
«svn up», just like «svnadmin upgrade» of FSFS doesn't retroactively add
SHA-1 checksums to node-rev headers or "-file" or "-dir" indicators in
the changed-paths section.
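
A rough sketch of that "leave it until the next update" idea.  The Node
record, the function names, and SHA-256 as the new kind are all placeholders
chosen for illustration, not anything from libsvn_wc.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional

NEW_KIND = "sha256"  # placeholder for whatever new checksum kind is chosen


@dataclass
class Node:
    kind: str                  # checksum kind recorded for this file
    digest: str                # hex digest of the pristine content
    pristine: Optional[bytes]  # None when the pristine is not stored locally


def _digest(kind: str, data: bytes) -> str:
    return hashlib.new(kind, data).hexdigest()


def upgrade_without_fetch(nodes: Dict[str, Node]) -> None:
    # Rehash only what is available locally; pristineless nodes keep SHA-1.
    for node in nodes.values():
        if node.pristine is not None:
            node.kind, node.digest = NEW_KIND, _digest(NEW_KIND, node.pristine)


def on_pristine_fetched(node: Node, content: bytes) -> None:
    # Called when a later operation brings the content in anyway.
    node.pristine = content
    node.kind, node.digest = NEW_KIND, _digest(NEW_KIND, content)
```

The cost of this approach is that the working copy holds a mix of checksum
kinds for some time, so every reader has to cope with both.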

There may be yet other alternatives.

> Thoughts?

I'm not voting either -0 or +0 at this time.

Cheers,

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Branko Čibej

On 20.12.2022 09:14, Evgeny Kotkov wrote:

> 2) We already need a working copy format bump for the pristines-on-demand
>    feature.  So using that format bump to solve the SHA1 issue might reduce
>    the overall number of required bumps for users (assuming that we'll still
>    need to switch from SHA1 at some point later).


Using a new hashing algorithm in the working copy is relatively simple. 
Making such a change backwards-compatible is not. It would be really 
nice if this could be done in a way that allows newer clients to still 
support older working copies without upgrading them; after all, we have 
the infrastructure for this in place now.


-- Brane

Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> > While here, I would like to raise a topic of incorporating a switch from
> > SHA1 to a different checksum type (without known collisions) for the new
> > working copy format.  This topic is relevant to the pristines-on-demand
> > branch, because the new "is the file modified?" check relies on the
> > checksum comparison, instead of comparing the contents of working and
> > pristine files.
> >
> > And so while I consider it to be out of the scope of the pristines-on-
> > demand branch, I think that we might want to evaluate if this is something
> > that should be a part of the next release.
>
> Good point.  Maybe worth a new thread?

[Moving discussion to a new thread]

We currently have a problem that a working copy relies on the checksum type
with known collisions (SHA1).  A solution to that problem is to switch to a
different checksum type without known collisions in one of the newer working
copy formats.

Since we plan on shipping a new working copy format in 1.15, this seems to
be an appropriate moment of time to decide whether we'd also want to switch
to a checksum type without known collisions in that new format.

Below are the arguments for including a switch to a different checksum type
in the working copy format for 1.15:

1) Since the "is the file modified?" check now compares checksums, leaving
   everything as-is may be considered a regression, because it would
   introduce additional cases where a working copy currently relies on
   comparing checksums with known collisions.

2) We already need a working copy format bump for the pristines-on-demand
   feature.  So using that format bump to solve the SHA1 issue might reduce
   the overall number of required bumps for users (assuming that we'll still
   need to switch from SHA1 at some point later).

3) While the pristines-on-demand feature is not released, upgrading with a
   switch to the new checksum type seems to be possible without requiring a
   network fetch.  But if some of the pristines are optional, we lose the
   possibility to rehash all contents in place.  So we might find ourselves
   having to choose between two worse alternatives of either requiring a
   network fetch during upgrade or entirely prohibiting an upgrade of
   working copies with optional pristines.

Thoughts?


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-13 Thread Karl Fogel

On 13 Dec 2022, Evgeny Kotkov wrote:

> Evgeny Kotkov  writes:
> Merged in https://svn.apache.org/r1905955


W00t!!  Thank you, and Julian and Daniel and everyone who's 
contributed to this.


So... do we have a release manager?  :-)


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-13 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> I think that the `pristines-on-demand-on-mwf` branch is now ready for a
> merge to trunk.  I could do that, assuming there are no objections.

Merged in https://svn.apache.org/r1905955

I'm going to respond on the topic of SHA1 a bit later.


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-10 Thread Daniel Shahaf
Nathan Hartman wrote on Wed, Dec 07, 2022 at 20:29:11 -0500:
> On Wed, Dec 7, 2022 at 12:11 PM Evgeny Kotkov via dev <
> dev@subversion.apache.org> wrote:
> 
> >
> > I think that the `pristines-on-demand-on-mwf` branch is now ready for a
> > merge to trunk.  I could do that, assuming there are no objections.
> 
> 
> 
> I'd like to echo what others have already said by saying a great big THANK
> YOU, to all who have worked on this cool new feature so far!
> 
> I used an earlier incarnation of this branch some months ago in real usage
> scenarios with good results and looking at the recent commit emails as
> they've happened everything looks sensible to me.
> 
> I will try to run the full test suite in the next couple of days and
> assuming the tests pass for me I'll use it as my daily driver to test the
> real usage. Obviously I'll post here if I find anything...
> 
> Meanwhile I'd like to say that on further thought and after reading Johan's
> and Karl's feedback regarding the feature switch naming, I've come around
> to the point of view that --store-pristine={yes|no} is a perfectly fine UI.
> 

Well, if we're bikeshedding anyway, how about 
--backend-tweaks=without-pristines?
We can support just two values for starters ("without pristines" and
"with pristines"), and have the room to extend this in 1.16, similar to
--trust-server-cert/--trust-server-cert-failures and
--pre-1.4-compatible/--compatible-version.

Similarly, a new config file section with one valid option might make
sense if we anticipate adding more options to that section in the
future.  This way we avoid having the configuration split across two
places.

> Given that this is now the command line switch name, and since users are
> given direct control over the pristinefulness of a WC, and we've been
> calling this feature Pristines On Demand since its inception, I think we
> should finally bless this as the official name of the feature.
> 
> In the next couple of days I plan to update the staged 1.15 release notes,
> which until now tentatively called it Bare Working Copies, to call it
> Pristines On Demand and to complete the description there.
> 
> Regarding the SHA hash question:
> 
> > While here, I would like to raise a topic of incorporating a switch from
> > SHA1 to a different checksum type (without known collisions) for the new
> > working copy format.  This topic is relevant to the pristines-on-demand
> > branch, because the new "is the file modified?" check relies on the checksum
> > comparison, instead of comparing the contents of working and pristine files.
> >
> > And so while I consider it to be out of the scope of the pristines-on-demand
> > branch, I think that we might want to evaluate if this is something that
> > should be a part of the next release.
> 
> 
> Is it feasible and would it be beneficial to somehow decouple the hash code
> type from the wc format version? Asking because IIRC the need for a format
> bump to change hashes was one of the reasons it wasn't done a few years ago.

Maybe if we teach f32 to read /two/ new checksum kinds?  E.g., if we
teach f32 to read both SHA-512 and SHA-3, then even if 1.15 f32 writes
SHA-512 by default, it will nevertheless be able to read f32 wc's with
SHA-3 rows that 1.16 might create.
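
To make the "read two kinds, write one" idea concrete, here is a small
sketch.  The "kind:hexdigest" storage format, the table of readers, and the
function names are assumptions for illustration; they are not the wc.db
serialization.

```python
import hashlib

# Kinds this (hypothetical) format can read.
READERS = {
    "sha1": hashlib.sha1,
    "sha512": hashlib.sha512,
    "sha3_512": hashlib.sha3_512,
}

# Kind this client writes by default.
WRITE_KIND = "sha512"


def serialize(content: bytes) -> str:
    return f"{WRITE_KIND}:{READERS[WRITE_KIND](content).hexdigest()}"


def verify(stored: str, content: bytes) -> bool:
    # Dispatch on the recorded kind, so rows written by a newer client with
    # a different (but known) kind can still be checked.
    kind, _, hexdigest = stored.partition(":")
    ctor = READERS.get(kind)
    if ctor is None:
        raise ValueError(f"unsupported checksum kind: {kind!r}")
    return ctor(content).hexdigest() == hexdigest
```

A client built this way could keep writing SHA-512 while still accepting
SHA-3 rows, which is the forward-compatibility property described above.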

svn_checksum_kind_t's possible values include svn_checksum_fnv1a_32, so
I guess we already support reading wc.db's that use FNV-1a checksums?
(Incidentally, f31 is new in 1.8 whereas svn_checksum_fnv1a_32 is new
in 1.9.)

Cheers,

Daniel


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-07 Thread Nathan Hartman
On Wed, Dec 7, 2022 at 12:11 PM Evgeny Kotkov via dev <
dev@subversion.apache.org> wrote:

>
> I think that the `pristines-on-demand-on-mwf` branch is now ready for a
> merge to trunk.  I could do that, assuming there are no objections.



I'd like to echo what others have already said by saying a great big THANK
YOU, to all who have worked on this cool new feature so far!

I used an earlier incarnation of this branch some months ago in real usage
scenarios with good results and looking at the recent commit emails as
they've happened everything looks sensible to me.

I will try to run the full test suite in the next couple of days and
assuming the tests pass for me I'll use it as my daily driver to test the
real usage. Obviously I'll post here if I find anything...

Meanwhile I'd like to say that on further thought and after reading Johan's
and Karl's feedback regarding the feature switch naming, I've come around
to the point of view that --store-pristine={yes|no} is a perfectly fine UI.

Given that this is now the command line switch name, and since users are
given direct control over the pristinefulness of a WC, and we've been
calling this feature Pristines On Demand since its inception, I think we
should finally bless this as the official name of the feature.

In the next couple of days I plan to update the staged 1.15 release notes,
which until now tentatively called it Bare Working Copies, to call it
Pristines On Demand and to complete the description there.

Regarding the SHA hash question:

> While here, I would like to raise a topic of incorporating a switch from
> SHA1 to a different checksum type (without known collisions) for the new
> working copy format.  This topic is relevant to the pristines-on-demand
> branch, because the new "is the file modified?" check relies on the checksum
> comparison, instead of comparing the contents of working and pristine files.
>
> And so while I consider it to be out of the scope of the pristines-on-demand
> branch, I think that we might want to evaluate if this is something that
> should be a part of the next release.


Is it feasible and would it be beneficial to somehow decouple the hash code
type from the wc format version? Asking because IIRC the need for a format
bump to change hashes was one of the reasons it wasn't done a few years ago.

Cheers,
Nathan


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-07 Thread Karl Fogel

On 07 Dec 2022, Evgeny Kotkov wrote:
> The branch passes all tests in my Windows and Linux environments, in both
> --store-pristine=yes and =no modes.


FYI, it passes all tests here too (on Debian GNU/Linux, up-to-date 
'testing' distro).  Attached file has details; there were some 
XFAILs, but no FAILs.


Best regards,
-Karl

$ svn info | grep -E "^URL: "
URL: https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf
$ svn status
?   subversion/tests/libsvn_subr/task-test
$ time make check
[001/127] auth-test...success
[002/127] authz-test...success
[003/127] bit-array-test...success
[004/127] cache-test...success
[005/127] changes-test...success
[006/127] checksum-test...success
[007/127] client-test...success
[008/127] compat-test...success
[009/127] compress-test...success
[010/127] config-test...success
[011/127] conflict-data-test...success
[012/127] conflicts-test...success
[013/127] crypto-test...success
[014/127] db-test...success
[015/127] diff-diff3-test...success
[016/127] dirent_uri-test...success
[017/127] dump-load-test...success
[018/127] entries-compat-test...success
[019/127] error-code-test...success
[020/127] error-test...success
[021/127] filesize-test...success
[022/127] fs-base-test...success
[023/127] fs-fs-pack-test...success
[024/127] fs-fs-private-test..

Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-07 Thread Karl Fogel

On 07 Dec 2022, Evgeny Kotkov wrote:

> Evgeny Kotkov  writes:
> I think that the `pristines-on-demand-on-mwf` branch is now ready for a
> merge to trunk.  I could do that, assuming there are no objections.


+1, and thank you.  

Now, I haven't had time to do a real code review -- my manager hat 
gets tighter every year -- so my "+1" is mainly a sign of 
enthusiasm for the feature, and of general trust in our test suite 
and in everyone who has worked on this.



> https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf
>
> The branch includes the following:
> – Core implementation of the new mode where required pristines are fetched
>   at the beginning of the operation.
> – A new --store-pristine=yes/no option for `svn checkout` that is persisted
>   as a working copy setting.


+1 to this UI.  We can offer other gateways to this feature later, 
but this is a clean & simple way to start out.


> – An update for `svn info` to display the value of this new setting.


Yay.


> – A standalone test harness that tests main operations in both
>   --store-pristine modes and gets executed on every test run.
> – A new --store-pristine=yes/no option for the test suite that forces all
>   tests to run with a specific pristine mode.


Very nice. 

> The branch passes all tests in my Windows and Linux environments, in both
> --store-pristine=yes and =no modes.


W00t!

> While here, I would like to raise a topic of incorporating a switch from
> SHA1 to a different checksum type (without known collisions) for the new
> working copy format.  This topic is relevant to the pristines-on-demand
> branch, because the new "is the file modified?" check relies on the checksum
> comparison, instead of comparing the contents of working and pristine files.
>
> And so while I consider it to be out of the scope of the pristines-on-demand
> branch, I think that we might want to evaluate if this is something that
> should be a part of the next release.


Good point.  Maybe worth a new thread?

Best regards,
-Karl


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-07 Thread Daniel Sahlberg
Evgeny,

Thanks so much for your hard work in pushing this project forward!

I don't think I can contribute much in getting this merged to trunk (from
lack of C experience and lack of time to dig into the inner workings), but
I hope it can be completed!

Kind regards,
Daniel Sahlberg


On Wed, 7 Dec 2022 at 18:10, Evgeny Kotkov via dev <
dev@subversion.apache.org> wrote:

> Evgeny Kotkov  writes:
>
> > > IMHO, once the tests are ready, we could merge it and release
> > > it to the world.
> >
> > Apart from the required test changes, there are some technical
> > TODOs that remain from the initial patch and should be resolved.
> > I'll try to handle them as well.
>
> I think that the `pristines-on-demand-on-mwf` branch is now ready for a
> merge to trunk.  I could do that, assuming there are no objections.
>
>
> https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf
>
> The branch includes the following:
> – Core implementation of the new mode where required pristines are fetched
>   at the beginning of the operation.
> – A new --store-pristine=yes/no option for `svn checkout` that is persisted
>   as a working copy setting.
> – An update for `svn info` to display the value of this new setting.
> – A standalone test harness that tests main operations in both
>   --store-pristine modes and gets executed on every test run.
> – A new --store-pristine=yes/no option for the test suite that forces all
>   tests to run with a specific pristine mode.
>
> The branch passes all tests in my Windows and Linux environments, in both
> --store-pristine=yes and =no modes.
>
>
> While here, I would like to raise a topic of incorporating a switch from
> SHA1 to a different checksum type (without known collisions) for the new
> working copy format.  This topic is relevant to the pristines-on-demand
> branch, because the new "is the file modified?" check relies on the checksum
> comparison, instead of comparing the contents of working and pristine files.
>
> And so while I consider it to be out of the scope of the pristines-on-demand
> branch, I think that we might want to evaluate if this is something that
> should be a part of the next release.
>
>
> Thanks,
> Evgeny Kotkov
>


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-07 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> > IMHO, once the tests are ready, we could merge it and release
> > it to the world.
>
> Apart from the required test changes, there are some technical
> TODOs that remain from the initial patch and should be resolved.
> I'll try to handle them as well.

I think that the `pristines-on-demand-on-mwf` branch is now ready for a
merge to trunk.  I could do that, assuming there are no objections.

  
https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf

The branch includes the following:
– Core implementation of the new mode where required pristines are fetched
  at the beginning of the operation.
– A new --store-pristine=yes/no option for `svn checkout` that is persisted
  as a working copy setting.
– An update for `svn info` to display the value of this new setting.
– A standalone test harness that tests main operations in both
  --store-pristine modes and gets executed on every test run.
– A new --store-pristine=yes/no option for the test suite that forces all
  tests to run with a specific pristine mode.

The branch passes all tests in my Windows and Linux environments, in both
--store-pristine=yes and =no modes.


While here, I would like to raise a topic of incorporating a switch from
SHA1 to a different checksum type (without known collisions) for the new
working copy format.  This topic is relevant to the pristines-on-demand
branch, because the new "is the file modified?" check relies on the checksum
comparison, instead of comparing the contents of working and pristine files.
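
To illustrate what a checksum-based "is the file modified?" check looks like,
here is a simplified sketch.  It is not the libsvn_wc logic: the function
names and the size shortcut are assumptions, and it ignores everything else
a real working copy would have to consider.

```python
import hashlib
import os


def file_sha1(path: str, chunk_size: int = 65536) -> str:
    # Stream the file so that large files are not read into memory at once.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_modified(path: str, recorded_size: int, recorded_sha1: str) -> bool:
    # No pristine file is opened: only the recorded size and checksum are used.
    if os.path.getsize(path) != recorded_size:
        return True
    return file_sha1(path) != recorded_sha1
```

With such a check, a working file that differs from the pristine but has the
same size and the same SHA-1 would be reported as unmodified, which is why
the choice of checksum kind matters more once contents are no longer
compared.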

And so while I consider it to be out of the scope of the pristines-on-demand
branch, I think that we might want to evaluate if this is something that
should be a part of the next release.


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-12-01 Thread Karl Fogel

On 29 Nov 2022, Johan Corveleyn wrote:
> My thanks also to the courageous people having developed this, and the
> gentle souls keeping the ball rolling :-).

> About the name:

[...]

> FWIW, my vote still goes to --store-pristines={yes|no}


Same here, FWIW.

I understand the argument that this exposes an "implementation 
detail" that the user is supposed to not need to think about.  But 
remember, the reason we developed this feature is because the user 
was *already* exposed to the existence of pristines: disk space 
usage by pristines is quite visible to the user -- that's the 
whole problem :-).


So only users who already "see" pristines -- that is, who are 
already aware of the storage issue -- would go looking for this 
feature in the first place.  So by the time they learn about the 
'--store-pristines' option, they're already being forced to deal 
with pristines as a concept, and the only question is whether the 
tool we give them to solve their problem will take advantage of 
that conceptual familiarity.


So, +1 to "--store-pristines=foo".

> I prefer such an explicit option here, rather than vague ones that
> could cover many different things. Also, --optimize=X can easily be
> interpreted inversely as intended (for instance: when I have an
> optimal network, do I use --optimize=network?)
>
> Apart from {yes|no} the feature might grow other option values in the
> future ('size-based' or 'text-only', or maybe simply 'auto' if we come
> up with a good general strategy that works for 99% of the cases, the
> details of which we don't want to burden our users with). We could
> even, in some distant future, allow user-defined names that are
> specified in ~/.subversion/config by the user (using some syntax where
> the user can set configurable size limits or mime-types or whatever).


I also agree with Johan's point here.


> One other suggestion: not a blocker of course, but a
> runtime-config-area default would be nice :-). Users might want to
> choose the same option all the time, without having to remember to add
> the option to their checkout command.
>
> Something like, in ~/.subversion/config
>
> store-pristines-default={yes|no}


Later on, this might grow into more sophisticated local run-time 
config regarding pristines, but for now, providing this basic 
yes/no default is a good idea.  For example, on machines where one 
is regularly checking out trees with huge files, one might set the 
default to "no".


Best regards,
-Karl


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-29 Thread Johan Corveleyn
My thanks also to the courageous people having developed this, and the
gentle souls keeping the ball rolling :-).

About the name:

On Thu, Nov 24, 2022 at 3:57 PM Nathan Hartman  wrote:
...
> Previously we got stuck trying to choose the user-facing name of this
> feature and its command line switches.
>
> Currently the CLI switch is --store-pristine={yes|no}.
>
> I'm okay with this, but for completeness I'll mention that earlier in
> the year there was a little bit of push back because pristines, up
> until now, have been an internal implementation detail that users
> needn't concern themselves with. (Except that they double the storage
> space...)
>
> I've been trying to think of something better for months now, and
> here's what I've come up with:
>
> --optimize=storage
> --optimize=network

FWIW, my vote still goes to --store-pristines={yes|no}

I prefer such an explicit option here, rather than vague ones that
could cover many different things. Also, --optimize=X can easily be
interpreted inversely as intended (for instance: when I have an
optimal network, do I use --optimize=network?)

Apart from {yes|no} the feature might grow other option values in the
future ('size-based' or 'text-only', or maybe simply 'auto' if we come
up with a good general strategy that works for 99% of the cases, the
details of which we don't want to burden our users with). We could
even, in some distant future, allow user-defined names that are
specified in ~/.subversion/config by the user (using some syntax where
the user can set configurable size limits or mime-types or whatever).


One other suggestion: not a blocker of course, but a
runtime-config-area default would be nice :-). Users might want to
choose the same option all the time, without having to remember to add
the option to their checkout command.

Something like, in ~/.subversion/config

store-pristines-default={yes|no}

Just my 2 cents of course ...
-- 
Johan


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-24 Thread Nathan Hartman
On Wed, Nov 23, 2022 at 9:53 AM Julian Foad  wrote:
> Nathan, I see you replied enthusiastically and mentioned "I have much to
> say on both of these [TODOs] but I won't go into detail yet...". It
> seems to me it could be helpful to get that started sooner rather than
> later, too, if those issues still need hashing out.


Thanks for the nudge.

Previously we got stuck trying to choose the user-facing name of this
feature and its command line switches.

Currently the CLI switch is --store-pristine={yes|no}.

I'm okay with this, but for completeness I'll mention that earlier in
the year there was a little bit of push back because pristines, up
until now, have been an internal implementation detail that users
needn't concern themselves with. (Except that they double the storage
space...)

I've been trying to think of something better for months now, and
here's what I've come up with:

--optimize=storage
--optimize=network

Rationale:

* Self-documenting.

* Easy to explain: --optimize=storage saves storage space;
  --optimize=network reduces network accesses to the repository
  server.

* Users don't need to know about pristines. There aren't several levels
  of abstraction between the option name and why the user cares about
  it.

* Extensible. Maybe we can think of other ways to optimize for network
  bandwidth, for example.
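
As a sketch of how the proposed names could map onto the existing switch,
assuming --optimize=storage means "do not store pristines" and
--optimize=network means "store pristines" (the default).  The parser, the
program name, and the function are hypothetical and exist only for this
illustration; this is not how the svn command line is implemented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="checkout-sketch")
    # The two spellings are alternatives, so they exclude each other.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--store-pristine", choices=["yes", "no"])
    group.add_argument("--optimize", choices=["storage", "network"])
    return parser


def store_pristine(argv) -> bool:
    args = build_parser().parse_args(argv)
    if args.optimize is not None:
        # Optimizing for network keeps pristines; for storage drops them.
        return args.optimize == "network"
    if args.store_pristine is not None:
        return args.store_pristine == "yes"
    return True  # default: keep pristines


print(store_pristine(["--optimize=storage"]))
print(store_pristine(["--store-pristine=yes"]))
```

Either spelling ends up as the same yes/no working copy setting, so the
choice between them is purely about what users see.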

The docs can give more user-facing explanation, including tradeoffs,
which SVN operations are affected, and example scenarios to help users
choose. It should be much easier to write -- and read -- than what we
currently have at the draft release notes [1].

As for example scenarios, while the original premise was to save space
on large files that don't change often, i525pod is also great in other
situations, such as checking out a large source tree on a ramdrive
(limited space), or on the same machine as the repo, or on a storage-
limited embedded device. (I've tried i525pod in all 3 of these
scenarios!)

Downsides:

* Admittedly, --optimize=network isn't the best name in all scenarios.
  Notably, this is a misnomer when the repository server is on the same
  machine as the working copy, but that might not matter because it's
  the default. (And I might suggest trying --optimize=storage in that
  scenario).

* If we ever want to do other cool things with pristines, such as an
  option to keep more locally cached history, these names won't be
  right for that.

* These option names haven't helped me come up with a better name for
  the feature itself.

There is an advantage to using --store-pristine={yes|no}: We don't need
to rename the feature because Pristines On Demand and the CLI options
are named similarly.

The disadvantage of --store-pristine={yes|no} is that the feature is
more burdensome for us to explain and for others to learn about,
especially from a non-technical standpoint. How would you explain this
feature in a press release, or in a short blurb (or dare I say, tweet)
about "What's new in Subversion 1.15?"

Some other possibilities that were discussed:

I'll mention these for completeness but note that if --optimize=x is
shot down, I'd rather use --store-pristine={yes|no} than any of these:

* Hydrate and dehydrate -- perhaps the terms that appear most in dev
  discussions. I don't recommend these in user-facing areas because
  they aren't self-documenting. Users can't deduce what these actually
  do for the user. Users might mistakenly think that their working
  files would be hydrated or dehydrated in some way. Users would have
  to learn about pristines to know what is being hydrated or
  dehydrated, eliminating any useful abstraction.

* "Bare working copies" -- the draft release notes [1] use this term
  tentatively to explain that "bare" working copies save storage by not
  caching "BASE" files. Unfortunately, "bare" and "BASE" differ by only
  one letter (and capitalization) and I feel like the explanation is
  too complicated and doesn't bring us closer to a good result.

* Briefly discussed: "local BASE" or "remote BASE" -- but that's a
  misnomer because there's no such thing as "remote" BASE.

Well, you've been warned that I have much to say. :-)

Cheers,
Nathan


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-23 Thread Julian Foad
I'm glad to see you all picking up this project again. While working on
this at the beginning of the year I turned on the pristines-on-demand
mode in some of my own WCs such as my 'Documents' tree which includes
lots of scanned paper docs. It works nicely for cases like this, and
feels right, the pristine store being mostly unpopulated when the
working files are mostly unchanging.

I meant to check back with you during the year about how we should take
it forward. The recent summary in this thread sounds about right. My own
capacity to contribute is steadily decreasing. So, thank you, dev
community: it's good to see people working together to make it happen.
It would be pleasing to see this being brought to a satisfactory state
and released.

Nathan, I see you replied enthusiastically and mentioned "I have much to
say on both of these [TODOs] but I won't go into detail yet...". It
seems to me it could be helpful to get that started sooner rather than
later, too, if those issues still need hashing out.

- Julian



Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-16 Thread Karl Fogel

On 16 Nov 2022, Evgeny Kotkov wrote:

> Apart from the required test changes, there are some technical
> TODOs that remain from the initial patch and should be resolved.
> I'll try to handle them as well.


Thank you!


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-16 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> Thank you, Evgeny!  Just to make sure I understand correctly --
> the status now on the 'pristines-on-demand-on-mwf' branch is:
>
> 1) One can do 'svn checkout --store-pristines=no' to get an
> entirely pristine-less working copy.  In that working copy,
> individual files will be hydrated/dehydrated automagically on an
> as-needed basis.
>
> 2) There is no command to hydrate or dehydrate a particular file.
> Hydration and dehydration only happen as a side effect of other
> regular Subversion operations.
>
> 3) There is no way to rehydrate the entire working copy.  E.g.,
> something like 'svn update --store-pristines=yes' or 'svn hydrate
> --depth=infinity' does not exist yet.
>
> 4) Likewise, there is no way to dehydrate an existing working copy
> that currently has its pristines (even if that working copy is at
> a high-enough version format to support pristinelessness).  E.g.,
> something like 'svn update --store-pristines=no' or 'svn dehydrate
> --depth=infinity' does not exist yet.
>
> Is that all correct?

Yes, I believe that is correct.
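
To illustrate point (1) for readers of the archive, here is a toy Python
model of "hydrated/dehydrated on an as-needed basis". It is only a sketch
under stated assumptions, not how libsvn_wc implements it: the class
ToyWorkingCopy, its attributes, and the sync_pristines() method are
hypothetical names, the "repository" is a plain dict, and modification is
detected by comparing SHA-1 checksums only.

```python
import hashlib


def sha1_hex(data):
    """Return the hex SHA-1 digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()


class ToyWorkingCopy:
    """Hypothetical model of a working copy checked out without pristines."""

    def __init__(self, repository):
        # Assumption: a dict of path -> bytes stands in for the server.
        self.repository = repository
        # Working files start out identical to the repository contents.
        self.files = dict(repository)
        # Only the checksum of each pristine is recorded at checkout.
        self.checksums = {p: sha1_hex(c) for p, c in repository.items()}
        # The pristine store starts empty: nothing is hydrated yet.
        self.pristines = {}

    def is_modified(self, path):
        # A file counts as changed when its checksum differs from the
        # checksum recorded for its pristine text.
        return sha1_hex(self.files[path]) != self.checksums[path]

    def sync_pristines(self):
        """Hydrate pristines of modified files and dehydrate the rest."""
        for path in self.files:
            if self.is_modified(path):
                # Fetch the pristine only if it is not stored already.
                self.pristines.setdefault(path, self.repository[path])
            else:
                # Unmodified again: the stored pristine can be dropped.
                self.pristines.pop(path, None)
```

In this model the pristine store holds an entry only while the matching
working file differs from its recorded checksum, and there is no separate
"hydrate" or "dehydrate" command, which mirrors points (1) and (2).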

> By the way, I do not think (2), (3), and (4) are blockers.  Just
> (1) by itself is a huge step forward and solves issue #525;

+1 on keeping the scope of the feature to just (1) for now.

> IMHO, once the tests are ready, we could merge it and release
> it to the world.

Apart from the required test changes, there are some technical
TODOs that remain from the initial patch and should be resolved.
I'll try to handle them as well.


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-15 Thread Karl Fogel

On 15 Nov 2022, Evgeny Kotkov wrote:

> Evgeny Kotkov  writes:
>
> > Perhaps we could transition into that state by committing the patch
> > and maybe re-evaluate things from there.  I could do that, assuming
> > no objections, of course.
>
> Committed the patch in https://svn.apache.org/r1905324
>
> I'll try to handle the related tasks in the near future.


Thank you, Evgeny!  Just to make sure I understand correctly --
the status now on the 'pristines-on-demand-on-mwf' branch is:

1) One can do 'svn checkout --store-pristines=no' to get an
entirely pristine-less working copy.  In that working copy,
individual files will be hydrated/dehydrated automagically on an
as-needed basis.

2) There is no command to hydrate or dehydrate a particular file.
Hydration and dehydration only happen as a side effect of other
regular Subversion operations.

3) There is no way to rehydrate the entire working copy.  E.g.,
something like 'svn update --store-pristines=yes' or 'svn hydrate
--depth=infinity' does not exist yet.

4) Likewise, there is no way to dehydrate an existing working copy
that currently has its pristines (even if that working copy is at
a high-enough version format to support pristinelessness).  E.g.,
something like 'svn update --store-pristines=no' or 'svn dehydrate
--depth=infinity' does not exist yet.

Is that all correct?

By the way, I do not think (2), (3), and (4) are blockers.  Just
(1) by itself is a huge step forward and solves issue #525; IMHO,
once the tests are ready, we could merge it and release it to the
world.


Best regards,
-Karl


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-15 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> Perhaps we could transition into that state by committing the patch
> and maybe re-evaluate things from there.  I could do that, assuming
> no objections, of course.

Committed the patch in https://svn.apache.org/r1905324

I'll try to handle the related tasks in the near future.


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-08 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> By the way, in that thread, Evgeny Kotkov -- whose initial work
> much of this is based on -- follows up with a patch that does a
> first-pass implementation of 'svn checkout --store-pristines=no'
> (by implementing a new persistent setting in wc.db).

Perhaps we could transition into that state by committing the patch
and maybe re-evaluate things from there.  I could do that, assuming
no objections, of course.


Thanks,
Evgeny Kotkov


Re: Getting to first release of pristines-on-demand feature (#525).

2022-11-06 Thread Nathan Hartman
On Sat, Nov 5, 2022 at 6:13 PM Karl Fogel  wrote:
>
> Hi, all.  This is a high-level mail in which I try to figure out
> the current status of the issue #525 work and what's left to land
> it in trunk and release it.  Corrections and feedback welcome.

Thanks for the overview and the work already done to make this
possible!

The P-O-D feature itself works.

What's left to do for a first release, IMHO:

(1) Decide on user-facing names for the feature and its command line
switch(es).

(2) Resolve the [TODO] that Karl mentions (decoupling the compatible
version switch from the i525pod switch).

Though there are many other possible enhancements, some of them touched
upon in Karl's message, I think these two items are the only really
crucial ones for a first release.

I have much to say on both of these but I won't go into detail yet
because that would hijack the thread away from the high-level topic:
what remains to be done for an initial viable product? I'd like to give
others a chance to respond before we dive down the rabbit hole. :-)

It's better if each of the above becomes a thread devoted to that
topic.

I'll point out that some initial release note text was drafted at [1].

Cheers,
Nathan

[1] https://subversion-staging.apache.org/docs/release-notes/1.15.html#bare-working-copies