On Mon, Mar 14, 2022 at 6:48 AM Julian Foad <julianf...@apache.org> wrote:
>
> Dear dev community, and especially Karl and Mark:
>
> A plea to test the current design/implementation.
Feedback so far on the 'pristines-on-demand-on-mwf' branch as of r1898990:

tl;dr: Pretty darn good for a first cut!

I merged pristines-on-demand-on-mwf to trunk, built it, and tried it on a real workload. Admittedly I did not run the test suite (though I plan to). Instead, at every step along the way I compared the 1.15+i525pod WC directory contents (excluding .svn) to those of a parallel 1.14 WC checked out and operated upon by a 1.14 client, and things seem to be working correctly, at least insofar as the actual files and their contents are concerned. Repo access was via "svn://" (svnserve).

As of r1898990 the branch merged cleanly to trunk and built successfully (on Debian Linux).

The "real workload" consisted of a working copy containing several svn:externals. I checked that out, but performed most operations within one of the nested WCs, which contains the majority of the data: 19161 versioned files in total.

Checked out without i525pod: 38141 files total (including pristines), occupying 512M.
Checked out with i525pod: 19166 files total, occupying 270M, and SVN correctly set the parent and nested WCs to the f32 format.

The versioned contents are source files; not exactly a huge WC by today's standards, but believe it or not this makes a big difference for me, as I often operate on WCs this size directly on embedded systems whose storage is somewhat constrained, and also on ramdrives, where it's nice to cut usage in half.

I performed the following operations: checkout, info, switch, log, patch, update (including restore), cleanup, merge, diff, status, and commit (several commits). To verify that nothing became hosed, I then performed a fresh checkout with a 1.14 client and compared its contents to the expected known state, i.e. the contents of the 1.15+i525pod WC. I also ran 'svnadmin verify' on the repo (which is on a separate machine) and it verified successfully. (Since the commits succeeded, I didn't expect otherwise.)
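For anyone wanting to repeat the comparison, here is a rough sketch of the method using standard tools. The trees below are made-up stand-ins for the real WCs (the paths and metadata file names are invented for the example), not the actual checkouts I used:

```shell
# Toy stand-in for the two working copies compared above: identical
# versioned contents, different .svn administrative areas.
mkdir -p wc-f32/.svn/pristine wc-f31/.svn/pristine
echo 'int main(void) { return 0; }' > wc-f32/main.c
echo 'int main(void) { return 0; }' > wc-f31/main.c
echo 'format 32' > wc-f32/.svn/format-marker
echo 'format 31' > wc-f31/.svn/format-marker

# Compare everything except the .svn metadata; exit status 0 means the
# versioned contents match.
diff -r --exclude=.svn wc-f32 wc-f31 && echo 'contents match'

# File counts and disk usage, as used for the figures quoted above.
find wc-f32 -type f | wc -l
du -sh wc-f32 wc-f31
```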
Speed of client operations didn't feel noticeably slower; perhaps a little for operations that had to fetch pristines, but as the files in question were not huge, it was not very impactful. I understand this will probably be more noticeable when the WC contains huge files that are modified, but I don't have a convenient way to test that at the moment.

I used the plaintext credential cache and was not prompted for passwords at any time. (I should check with a credential method that does result in prompts and follow up... I understand it may currently prompt twice for one invocation.)

One thing I noticed: with i525pod enabled, svn does not delete empty .svn/pristine/??/ subdirectories when they are no longer needed. (Not sure what the correct terminology is for those 2-digit-hex-code dirs.) It does purge the pristine files within these subdirectories, just not the subdirectories themselves. They even survive an 'svn cleanup --vacuum-pristines'. I haven't yet looked at the code to see whether non-i525pod SVN ever deletes these, or whether there's a simple fix. I don't consider this a showstopper or a terribly big deal, but unless it's expected behavior I'd like to at least file an issue for it.

More below inline...

> We are worried that the current design won't be acceptable because it
> has poor behaviour in a particular use case.
>
> The use case involved running "svn update" at the root of the WC. (It
> didn't explicitly say that. More precisely, it implied the update target
> tree contains the huge locally modified file.)

As I said above, I didn't have a convenient way to test this use case, but it would be informative to conjure one up (or wait for feedback from users with such a use case).

> Using this new feature necessarily requires some adjustments to user
> expectations and work flow.
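To make the leftover-directory observation easy to check, the empty shard directories can be listed with standard tools. The sketch below fabricates a pristine-store-like layout (the paths and file names are made up); a real check would run the same find against an actual WC's .svn/pristine directory:

```shell
# Fabricated pristine-store layout: shard 'ab' still holds a pristine
# file, while shard 'cd' has had its pristine purged but the directory
# itself remains, mirroring the behavior observed above.
mkdir -p wc/.svn/pristine/ab wc/.svn/pristine/cd
echo 'pristine text' > wc/.svn/pristine/ab/ab12cd.svn-base

# List the empty shard directories that a vacuum could also remove.
find wc/.svn/pristine -mindepth 1 -type d -empty
# prints: wc/.svn/pristine/cd
```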
>
> What if we ask the user to limit their "svn update" to target the
> particular files/paths that they need to update, keeping their huge
> locally modified file out of its scope? Examples:
>
> svn update readme.txt
> svn update small-docs/
> # BUT NOT: svn update the-whole-wc/
>
> Then we side-step the issue. It only fetches pristines for modified
> files that are within the tree scope of the specified targets. (This is
> how it works already, not a proposal.)
>
> OK that's not optimal but it might be sufficient.

We could suggest that as a workaround in the release notes, and point out that it could be optimized gradually in future releases.

> (Of course there are further concerns, such as what happens if the user
> starts an update at the WC root, then cancels it as it's taking too
> long: can we gracefully recover? Fine, we can look at those concerns.)
>
> I can go ahead with further work on changing the design if required, but
> I am concerned that might not be the best use of resources. Also I don't
> know how to evaluate the balance of Evgeny's concerns about protocol
> level complexity of the alternative design, against the concerns about
> the present design. In other words pursuing that alternative seems
> riskier, while accepting the known down-sides of the current design is
> sub-optimal but seems less risky.
>
> Should we first test the current design and see if we can work with it,
> before going full steam ahead into changing the design?
>
> The current design/implementation (on branch
> 'pristines-on-demand-on-mwf') is in a working state. There are open
> issues that still need to be resolved, but it's complete enough to be
> ready for this level of testing.

I think we should test the current design/implementation first. I do have the following suggestions, if feasible:

1. We should allow the i525pod feature to be enabled/disabled separately from upgrading the WC to f32, even if such enabling/disabling can occur only at 'checkout' time for the MVP. i525pod should not be mandatory going forward, because it has user-affecting tradeoffs, such as some formerly client-only operations now requiring server access.

2. The BRANCH-README suggests taking advantage of the format bump to switch to a better checksum than SHA-1. I agree this makes sense and could give users an additional reason to consider upgrading, but it increases the importance of item #1 above and may detract from the focus of this feature; on the upside, it may reduce future effort and would save a format bump.

Summary: Overall, I'm impressed so far. Thanks for everyone's contributions, whether code, input, or financial, for making this possible!

Cheers,
Nathan