On Mon, Mar 14, 2022 at 6:48 AM Julian Foad <julianf...@apache.org> wrote:
>
> Dear dev community, and especially Karl and Mark:
>
> A plea to test the current design/implementation.
Feedback so far on the 'pristines-on-demand-on-mwf' branch as of r1898990:

tl;dr: Pretty darn good for a first cut!

I merged pristines-on-demand-on-mwf to trunk, built it, and tried it on a real workload. Admittedly I did not run the test suite (though I plan to). Instead, at every step along the way I compared the 1.15+i525pod WC directory contents (excluding .svn) to those of a parallel 1.14 WC checked out and operated upon by a 1.14 client, and things seem to be working correctly, at least insofar as the actual files and their contents are concerned. Repo access was via "svn://" (svnserve).

As of r1898990 the branch merged cleanly to trunk and built successfully (on Debian Linux).

The "real workload" consisted of a working copy containing several svn:externals. I checked that out, but performed most operations within one of the nested WCs, which contains the majority of the data: 19161 versioned files in total.

Checked out without i525pod: 38141 files total (including pristines), occupying 512M.
Checked out with i525pod: 19166 files total, occupying 270M, and SVN correctly set the parent and nested WCs to the f32 format.

The versioned contents are source files; not exactly a huge WC by today's standards, but believe it or not this makes a big difference for me, as I often operate on WCs this size directly on embedded systems whose storage is somewhat constrained, and also on ramdrives, where it's nice to cut usage in half.

I performed the following operations: checkout, info, switch, log, patch, update (including restore), cleanup, merge, diff, status, and commit (several commits). To verify that nothing became hosed, I then performed a fresh checkout with a 1.14 client and compared its contents to the expected known state, i.e. the contents of the 1.15+i525pod WC. I also ran 'svnadmin verify' on the repo (which is on a separate machine) and it verified successfully. (Since the commits succeeded, I didn't expect otherwise.)
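For anyone wanting to repeat the comparison, here is a rough sketch of the method using standard tools. The trees below are made-up stand-ins for the real WCs (the paths and metadata file names are invented for the example), not the actual checkouts I used:

```shell
# Toy stand-in for the two working copies compared above: identical
# versioned contents, different .svn administrative areas.
mkdir -p wc-f32/.svn/pristine wc-f31/.svn/pristine
echo 'int main(void) { return 0; }' > wc-f32/main.c
echo 'int main(void) { return 0; }' > wc-f31/main.c
echo 'format 32' > wc-f32/.svn/format-marker
echo 'format 31' > wc-f31/.svn/format-marker

# Compare everything except the .svn metadata; exit status 0 means the
# versioned contents match.
diff -r --exclude=.svn wc-f32 wc-f31 && echo 'contents match'

# File counts and disk usage, as used for the figures quoted above.
find wc-f32 -type f | wc -l
du -sh wc-f32 wc-f31
```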
Speed of client operations didn't feel noticeably slower; perhaps a little for operations that had to fetch pristines, but as the files in question were not huge, it was not very impactful. I understand this will probably be more noticeable when the WC contains huge files that are modified, but I don't have a convenient way to test that at the moment.

I used the plaintext credential cache and was not prompted for passwords at any time. (I should check with a credential method that does result in prompts and follow up... I understand it may currently prompt twice for one invocation.)

One thing I noticed: with i525pod enabled, svn does not delete empty .svn/pristine/??/ subdirectories when they are no longer needed. (Not sure what the correct terminology is for those 2-digit-hex-code dirs.) It does purge the pristine files within these subdirectories, just not the subdirectories themselves. They even survive an 'svn cleanup --vacuum-pristines'. I haven't yet looked at the code to see whether non-i525pod SVN ever deletes these, or whether there's a simple fix. I don't consider this a showstopper or a terribly big deal, but unless it's expected behavior I'd like to at least file an issue for it.

More below inline...

> We are worried that the current design won't be acceptable because it
> has poor behaviour in a particular use case.
>
> The use case involved running "svn update" at the root of the WC. (It
> didn't explicitly say that. More precisely, it implied the update target
> tree contains the huge locally modified file.)

As I said above, I didn't have a convenient way to test this use case, but it would be informative to conjure one up (or wait for feedback from users with such a use case).

> Using this new feature necessarily requires some adjustments to user
> expectations and work flow.
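To make the leftover-directory observation easy to check, the empty shard directories can be listed with standard tools. The sketch below fabricates a pristine-store-like layout (the paths and file names are made up); a real check would run the same find against an actual WC's .svn/pristine directory:

```shell
# Fabricated pristine-store layout: shard 'ab' still holds a pristine
# file, while shard 'cd' has had its pristine purged but the directory
# itself remains, mirroring the behavior observed above.
mkdir -p wc/.svn/pristine/ab wc/.svn/pristine/cd
echo 'pristine text' > wc/.svn/pristine/ab/ab12cd.svn-base

# List the empty shard directories that a vacuum could also remove.
find wc/.svn/pristine -mindepth 1 -type d -empty
# prints: wc/.svn/pristine/cd
```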
>
> What if we ask the user to limit their "svn update" to target the
> particular files/paths that they need to update, keeping their huge
> locally modified file out of its scope? Examples:
>
> svn update readme.txt
> svn update small-docs/
> # BUT NOT: svn update the-whole-wc/
>
> Then we side-step the issue. It only fetches pristines for modified
> files that are within the tree scope of the specified targets. (This is
> how it works already, not a proposal.)
>
> OK that's not optimal but it might be sufficient.

We could suggest that as a workaround in the release notes, and point out that it could be optimized gradually in future releases.

> (Of course there are further concerns, such as what happens if the user
> starts an update at the WC root, then cancels it as it's taking too
> long: can we gracefully recover? Fine, we can look at those concerns.)
>
> I can go ahead with further work on changing the design if required, but
> I am concerned that might not be the best use of resources. Also I don't
> know how to evaluate the balance of Evgeny's concerns about protocol
> level complexity of the alternative design, against the concerns about
> the present design. In other words pursuing that alternative seems
> riskier, while accepting the known down-sides of the current design is
> sub-optimal but seems less risky.
>
> Should we first test the current design and see if we can work with it,
> before going full steam ahead into changing the design?
>
> The current design/implementation (on branch
> 'pristines-on-demand-on-mwf') is in a working state. There are open
> issues that still need to be resolved, but it's complete enough to be
> ready for this level of testing.

I think we should test the current design/implementation first. I do have the following suggestions, if feasible:

1. We should allow the i525pod feature to be enabled/disabled separately from upgrading the WC to f32, even if such enabling/disabling can occur only at 'checkout' time for the MVP. i525pod should not be mandatory going forward, because it has user-affecting tradeoffs, such as some formerly client-only operations now requiring server access.

2. The BRANCH-README suggests taking advantage of the format bump to switch to a better checksum than SHA-1. I agree this makes sense and could give users an additional reason to consider upgrading, but it increases the importance of item #1 above and may detract from the focus of this feature; on the upside, it may reduce future effort and would save a format bump.

Summary: Overall, I'm impressed so far. Thanks for everyone's contributions, whether code, input, or financial, for making this possible!

Cheers,
Nathan