Re: svn commit: r1921180 - in /subversion/tags/1.14.4: ./ subversion/include/private/ subversion/libsvn_subr/ subversion/svn/ subversion/svnadmin/ subversion/svnbench/ subversion/svndumpfilter/ subver
Stefan Sperling writes:

> tagging Subversion 1.14.4
>
> Equivalent to 1.14.x@1920901 with the patch for CVE-2024-45720 applied.
>
> Added:
>     subversion/tags/1.14.4/   (props changed)
>       - copied from r1920901, subversion/branches/1.14.x/
> Modified:
>     subversion/tags/1.14.4/CHANGES
>     subversion/tags/1.14.4/build.conf
>     subversion/tags/1.14.4/subversion/include/private/svn_cmdline_private.h
>     subversion/tags/1.14.4/subversion/libsvn_subr/cmdline.c
>     subversion/tags/1.14.4/subversion/svn/svn.c
>     subversion/tags/1.14.4/subversion/svnadmin/svnadmin.c
>     subversion/tags/1.14.4/subversion/svnbench/svnbench.c
>     subversion/tags/1.14.4/subversion/svndumpfilter/svndumpfilter.c
>     subversion/tags/1.14.4/subversion/svnfsfs/svnfsfs.c
>     subversion/tags/1.14.4/subversion/svnlook/svnlook.c
>     subversion/tags/1.14.4/subversion/svnmucc/svnmucc.c
>     subversion/tags/1.14.4/subversion/svnrdump/svnrdump.c
>     subversion/tags/1.14.4/subversion/svnserve/svnserve.c
>     subversion/tags/1.14.4/subversion/svnsync/svnsync.c
>     subversion/tags/1.14.4/subversion/svnversion/svnversion.c
>     subversion/tags/1.14.4/tools/client-side/svn-mergeinfo-normalizer/svn-mergeinfo-normalizer.c
>     subversion/tags/1.14.4/tools/client-side/svnconflict/svnconflict.c
>     subversion/tags/1.14.4/tools/dev/svnraisetreeconflict/svnraisetreeconflict.c
>     subversion/tags/1.14.4/tools/dev/wc-ng/svn-wc-db-tester.c
>     subversion/tags/1.14.4/tools/server-side/svnauthz.c

Hi Stefan,

It looks like the tag was created without the updates to `svn_version.h`,
unless I am missing something.

For example, SVN_VER_TAG seems to say "(under development)":

  https://svn.apache.org/viewvc/subversion/tags/1.14.4/subversion/include/svn_version.h?view=markup#l96

Thanks,
Evgeny Kotkov
Re: svn commit: r1915519 [1/4] - in /subversion/branches/pristine-checksum-salt: ./ build/ build/ac-macros/ build/generator/ build/generator/swig/ contrib/client-side/svn_load_dirs/ contrib/hook-scrip
Daniel Sahlberg writes:

> Log:
> On branch pristine-checksum-salt:
>
> Catchup merge with trunk

`pristine-checksum-salt` is branched off from branches/pristine-checksum-kind
(rather than from trunk), so in this case it would probably be better to
merge the trunk changes into the parent branch first, and then propagate
them into the `pristine-checksum-salt` branch from the parent (see the
sketch below).

Thanks,
Evgeny Kotkov
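For concreteness, the suggested merge order would look roughly like this
(a sketch; the working copy paths are hypothetical):

[[[
# 1) Catch up the parent branch with trunk:
$ svn merge ^/subversion/trunk pristine-checksum-kind-wc
$ svn commit pristine-checksum-kind-wc -m "Catchup merge with trunk."

# 2) Then propagate from the parent into the child branch:
$ svn merge ^/subversion/branches/pristine-checksum-kind pristine-checksum-salt-wc
$ svn commit pristine-checksum-salt-wc -m "Catchup merge with the parent branch."
]]]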
Re: Changing the permission checks in libsvn_subr/io.c
Daniel Sahlberg writes:

>> Index: subversion/libsvn_subr/io.c
>> ===================================================================
>> --- subversion/libsvn_subr/io.c (revision 1915064)
>> +++ subversion/libsvn_subr/io.c (working copy)
>> @@ -2535,7 +2535,14 @@ svn_io__is_finfo_read_only(svn_boolean_t *read_onl
>>    apr_uid_t uid;
>>    apr_gid_t gid;
>>
>> -  *read_only = FALSE;
>> +  *read_only = (access(file_info->fname, W_OK) != 0);

There are a few problems with this approach, because it adds an I/O syscall
(one that checks access to a file by its name) into a fairly low-level
function.

Previously, this function only analyzed the file information from the
passed-in apr_finfo_t structure.  The new version, however, performs an I/O
call, and that leads to a couple of problems:

1) A potential performance regression, not limited to just the revert.  For
   example, all calls to open_pack_or_rev_file() in libsvn_fs_fs would now
   start to make additional access() calls, thus slowing down the repository
   operations.

2) This adds an opportunity for a desynchronization/race between the
   information in apr_finfo_t and the result of access(), because access()
   works by filename.  For example, if the file is recreated between the two
   calls, the result of access() will correspond to a completely different
   file than the one that produced the apr_finfo_t.

Overall, I have a feeling that solving this specific edge case might not be
worth introducing these regressions (which look significant, at least to me).

Thanks,
Evgeny Kotkov
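To illustrate problem 2), here is a minimal sketch of the race window
(the calling pattern and the function name are hypothetical; svn_io_stat()
and access() are the existing SVN/POSIX calls):

[[[
#include <unistd.h>
#include "svn_error.h"
#include "svn_io.h"

static svn_error_t *
racy_read_only_check(svn_boolean_t *read_only,
                     const char *fname,
                     apr_pool_t *pool)
{
  apr_finfo_t finfo;

  /* Describes the file as it existed at stat time. */
  SVN_ERR(svn_io_stat(&finfo, fname, APR_FINFO_OWNER | APR_FINFO_PROT, pool));

  /* ...another process may unlink and recreate FNAME here... */

  /* Resolves the name again, possibly to a different file than the one
     described by FINFO, and performs an extra syscall on every call. */
  *read_only = (access(fname, W_OK) != 0);

  return SVN_NO_ERROR;
}
]]]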
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Daniel Sahlberg writes:

> As far as I understand, the point of multi-hash is to keep the WC format
> between versions (so older clients can continue to use the WC).

Just as a minor note, the working copies created using the implementation on
the `pristine-checksum-salt` branch don't multi-hash the contents, but
rather make the [single] checksum kind that is used configurable, persisting
it at the moment when a working copy is created or upgraded.

> I need some help to understand how that would work in practice. Let's say
> that 1.15 adds SHAABC, 1.16 adds SHAXYZ. Then 1.17 drops SHA1. But...
> - A 1.17 client will only use SHAABC or SHAXYZ hashes.
> - A 1.16 client can use SHA1, SHAABC and SHAXYZ hashes.
> - A 1.15 client can only use SHA1 and SHAABC hashes.
>
> How can these work together? A WC created in 1.17 can't be used by a 1.15
> client and a WC created in 1.15 (with SHA1) can't be used by a 1.17 client.
> How is this different from bumping the format? How do we detect this?

In the current design available on the `pristine-checksum-salt` branch, the
supported checksum kinds are tied to a working copy format, and any
supported checksum kind may additionally use a dynamic salt.  For example,
format 33 supports only SHA-1 (regular or dynamically salted), but a newer
format 34 could add support for another checksum kind such as SHA-2, if
necessary.

When an existing working copy is upgraded to a newer format, its current
checksum kind is retained as is (we can't rehash the content in the
`--store-pristine=no` case, because the pristines are not available).

I don't know if we'll find ourselves having to forcefully phase out SHA-1
*even* for working copies that retain an older checksum kind; it might be
enough to use the new checksum kind only for freshly created working copies.
If we do, however, there would be a few options to consider: the milder ones
could include warning the user to check out a new working copy (that would
use a different checksum kind), and a harsher option could mean adding a new
format that doesn't support SHA-1 under any circumstances and declaring all
previously available working copy formats unsupported.

Regards,
Evgeny Kotkov
Re: Getting to first release of pristines-on-demand feature (#525).
Evgeny Kotkov writes:

> Merged in https://svn.apache.org/r1905955
>
> I'm going to respond on the topic of SHA1 a bit later.

For the history: thread [1] proposes the `pristine-checksum-salt` branch
that adds the infrastructure to support new pristine checksum kinds in the
working copy and makes a switch to the dynamically-salted SHA1.

From the technical standpoint, I think that it would be better to release
the first version of the pristines-on-demand feature with this branch
merged, because we now rely on the checksum comparison to determine whether
a file has changed, and currently it's a checksum kind with known
collisions.

At the same time, having that branch merged probably isn't a formal release
blocker for the pristines-on-demand feature.  Also, considering that the
`pristine-checksum-salt` branch is currently vetoed by danielsh (presumably
for an indefinite period of time), I'd like to note that personally I have
no objections to proceeding with a release of the pristines-on-demand
feature without this branch.

[1] https://lists.apache.org/thread/xmd7x6bx2mrrbw7k5jr1tdmhhrlr9ljc

Regards,
Evgeny Kotkov
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Daniel Shahaf writes:

> Procedurally, the long hiatus is counterproductive.

This reminds me that the substantive discussion of your veto ended with my
email from 8 Feb 2023 that had four direct questions to you and was left
without an answer:

``
> That's not how design discussions work. A design discussion doesn't go
> "state decision; state pros; implement"; it goes "state problem; discuss
> potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

Well, I think it may not be as simple as it seems to you.

Who decided that we should follow the process you're describing?  Is there
a thread with a consensus on this topic?  Or do you insist on using this
specific process because it's the only process that seems obvious to you?
What alternatives to it have been considered?

As far as I can tell, the process you're suggesting is effectively a
waterfall-like process, and there are quite a lot of concerns about its
effectiveness, because the decisions have to be made in the conditions of
a lack of information.
``

It's been more than 11 months since that email, and those questions still
don't have an answer.  So if we are to resume this discussion, let's do it
from the proper point.

> You guys are welcome to try to /convince/ me to change my opinion, or to
> have the veto invalidated. In either case, you will be more likely to
> succeed should your arguments relate not only to the veto's implications
> but also to its /sine qua non/ component: its rationale.

Just in case, my personal opinion here is that the veto is invalid.

Firstly, based on my understanding, the ASF rules prohibit casting a veto
without an appropriate technical justification (see [1], which I personally
agree with).  Secondly, it seems that the process you are imposing hasn't
been accepted in this community.  As far as I know, this topic was
tangentially discussed before (see [2], for example), and it looks like
there hasn't been a consensus to change our current Commit-Then-Review
process into some sort of Review-Then-Commit.

(At the same time, I won't even try to /convince/ you, sorry.)

[1] https://www.apache.org/foundation/voting.html
[2] https://lists.apache.org/thread/ow2x68g2k4lv2ycr81d14p8r8w2jj1xl

Regards,
Evgeny Kotkov
Re: Subversion 1.14.3 up for testing/signing
Nathan Hartman writes:

> The 1.14.3 release artifacts are now available for testing/signing.
> Please get the tarballs from
>     https://dist.apache.org/repos/dist/dev/subversion
> and add your signatures there.

Summary:      +1 to release (Windows)

Tested:       [ fsfs ] x [ http v1 | http v2 ]

Results:      All tests passed

Platform:     Windows 10 x64 (Version 22H2)
              Visual Studio Build Tools 2022 (Version 17.7.7)

Dependencies: apr 1.6.5
              apr-util 1.6.3
              httpd 2.4.58
              openssl 3.0.12
              serf 1.3.10
              sqlite 3.41.2
              zlib 1.2.12

Committed file signature in r66017.

Thanks,
Evgeny Kotkov
Re: E160056 Item index too large in revision
Daniel Sahlberg writes:

> In the TortoiseSVN-dev group it is claimed to be because of APR 1.7.3 and
> that reverting back to 1.7.2 restores normal operation. I haven't yet
> verified nor tried to figure out which of the changes between 1.7.2
> and 1.7.3 may be responsible for this issue. As I know some of the
> APR committers are also here, I'm hoping for some help.

The problem seems to be caused by a change in APR 1.7.3, authored by me in
an attempt to fix the wrong file offsets reported after opening files on
Windows and Unix:

  *) Don't seek to the end when opening files with APR_FOPEN_APPEND
     on Windows.

Unfortunately, I missed a case where flushing an APR_APPEND | APR_BUFFERED
file was implicitly relying on that specific file offset, and that is what
caused the regression you're seeing.

I have now reverted this change in https://svn.apache.org/r1909088 and
https://svn.apache.org/r1909089, so hopefully this regression will be fixed
in the next APR 1.7.x patch release.

Thanks,
Evgeny Kotkov
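For reference, the affected pattern looks roughly like this (a minimal
sketch; the surrounding function and file name are hypothetical):

[[[
#include <apr_file_io.h>

static apr_status_t
append_line(const char *fname, apr_pool_t *pool)
{
  apr_file_t *file;
  apr_status_t rv;

  /* A buffered append-mode file: with the (now reverted) change, opening
     no longer seeked to the end, so flushing the buffer could implicitly
     rely on a stale file offset. */
  rv = apr_file_open(&file, fname,
                     APR_FOPEN_WRITE | APR_FOPEN_APPEND |
                     APR_FOPEN_BUFFERED | APR_FOPEN_CREATE,
                     APR_FPROT_OS_DEFAULT, pool);
  if (rv)
    return rv;

  rv = apr_file_puts("appended line\n", file);
  if (!rv)
    rv = apr_file_flush(file);  /* the buffered flush uses the offset */

  apr_file_close(file);
  return rv;
}
]]]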
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Daniel Shahaf writes:

> What's the question or action item to/for me? Thanks.

I'm afraid I don't fully understand your question.

As you probably remember, the change is blocked by your veto.  To my
knowledge, this veto hasn't been revoked as of now, and I simply mentioned
that in my email.  It is entirely your decision whether or not to take any
action regarding this matter.

Thanks,
Evgeny Kotkov
Re: [PROPOSAL] Allow plaintext passwords again.
Nathan Hartman writes:

> I think a good middle ground is:
>
> * Build with --enable-plaintext-password-storage by default; users who
>   want to harden their system can do so, but will need to build their
>   own client.

+1.

> * Set the default run-time config to store-plaintext-passwords = no
>   (if it isn't already; haven't checked) and instruct users on how to
>   change it. This makes the decision to store in plaintext an explicit
>   one that the user can opt into. (I appreciate that this could be
>   changed without the user's knowledge; perhaps the systemwide config
>   should always take precedence over the user-controlled one for this
>   setting?)

So, apparently, the current default is "ask".

I haven't checked all the details, but I think that defaulting to "ask"
already makes the user's decision explicit and allows it to happen
naturally, without requiring any additional instructions or knowledge.

If we change the default to "no", this part of the experience could become
worse, because to the end users it might look like the credentials aren't
being stored for unknown reasons / due to a bug in the software.

Thanks,
Evgeny Kotkov
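For context, the run-time setting being discussed lives in the user's
`servers` config file; a minimal sketch, with "ask" being the current
default as described above:

[[[
# ~/.subversion/servers
[global]
# Valid values are 'yes', 'no' and 'ask':
store-plaintext-passwords = ask
]]]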
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Evgeny Kotkov writes:

> > Now, how hard would this be to actually implement?
>
> To have a more or less accurate estimate, I went ahead and prepared the
> first-cut implementation of an approach that makes the pristine checksum
> kind configurable in a working copy.
>
> The current implementation passes all tests in my environment and seems to
> work in practice. It is available on the branch:
>
>   https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind
>
> The implementation on the branch allows creating working copies that use a
> checksum kind other than SHA-1.

I extended the current implementation to use a dynamically salted SHA-1
checksum, rather than a SHA-1 with a statically hardcoded salt.  The dynamic
salt is generated during the creation of a wc.db.  The implementation is
available on a separate branch:

  https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt

The change is a bit massive, but I think that it should solve the potential
problem without any practical drawbacks, except for the loss of the
mentioned ra_serf fetch optimization.

So overall, I'd propose to bring this change to trunk, to improve the
current state around checksum collisions in the working copy, and to also
have the infrastructure for supporting different checksum kinds in place,
in case we need it in the future.

This change is still being blocked by a veto, but if danielsh changes his
mind and there are no other objections, I'm ready to complete the few
remaining bits and merge it to trunk.

Thanks,
Evgeny Kotkov
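For clarity, "dynamically salted SHA-1" amounts to roughly the following
(a minimal sketch using the existing svn_checksum API; this is not the
branch's actual code, and the helper name is hypothetical):

[[[
#include "svn_checksum.h"

/* Hash SALT (the per-wc.db value generated at creation time) before the
   actual content, so that equal contents in different working copies do
   not map to a single well-known digest. */
static svn_error_t *
salted_sha1(svn_checksum_t **checksum,
            const void *salt, apr_size_t salt_len,
            const void *data, apr_size_t data_len,
            apr_pool_t *result_pool)
{
  svn_checksum_ctx_t *ctx = svn_checksum_ctx_create(svn_checksum_sha1,
                                                    result_pool);

  SVN_ERR(svn_checksum_update(ctx, salt, salt_len));
  SVN_ERR(svn_checksum_update(ctx, data, data_len));
  return svn_checksum_final(checksum, ctx, result_pool);
}
]]]

Salting this way defeats precomputed collision pairs: an attacker would
have to target the salt of one specific working copy.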
Re: svn commit: r1907965 - in /subversion/trunk/subversion: include/ include/private/ libsvn_client/ libsvn_wc/ svn/ tests/cmdline/
Jun Omae writes:

> After r1907965, build on Windows is failing with the following errors.
>
> [[[
> svn_client-1.lib(upgrade.obj) : error LNK2001: unresolved external symbol
> svn_wc__version_string_from_format
> [D:\a\subversion\subversion\build\win32\vcnet-vcproj\libsvn_client_dll.vcxproj]
> D:\a\subversion\subversion\Release\subversion\libsvn_client\libsvn_client-1.dll
> : fatal error LNK1120: 1 unresolved externals
> [D:\a\subversion\subversion\build\win32\vcnet-vcproj\libsvn_client_dll.vcxproj]
> ]]]

Thanks, should be fixed in r1908042.

Regards,
Evgeny Kotkov
Re: Initial patch for storing the pristines-on-demand setting in the working copy (Issue 4889)
Evgeny Kotkov writes:

> While working on the patch, I have stumbled across a couple of issues:
>
> A) `svn upgrade` without arguments fails for a working copy with the
>    latest format
>
>    $ svn checkout --compatible-version=1.15 wc
>    $ svn upgrade wc
>    $ svn: E155021: Working copy '…' is already at version 1.15 (format 32)
>      and cannot be downgraded to version 1.8 (format 31)

While the work on pristine checksum kinds is blocked by a veto, I took a
look at other things we'd probably want to handle before the release.  I
think that the above case qualifies as such, so I made a related improvement
by introducing a new config option and also fixed the described error:

- r1907964 introduces a new `compatible-version` config setting, to allow
  configuring the desired default wc compatibility level globally.

- r1907965 fixes an error when `svn upgrade` is called without any arguments
  for a working copy of the newer format.

Thanks,
Evgeny Kotkov
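For illustration, the new setting could be used roughly as follows (a
sketch; the exact config section that hosts `compatible-version` is an
assumption here, so please check the r1907964 log message for the
authoritative location):

[[[
# ~/.subversion/config  (section name assumed)
[working-copy]
compatible-version = 1.15
]]]

With such a global default in place, newly created working copies would use
the 1.15 format without having to pass --compatible-version on every
checkout.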
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Daniel Shahaf writes:

> Look, it's pretty simple. You said "We should do Y because it
> addresses X". You didn't explain why X needs to be addressed, didn't
> consider what alternatives there are to Y, didn't consider any cons that
> Y may have… and when people had questions, you just began to
> implement Y, without responding to or even acknowledging those
> questions.
>
> That's not how design discussions work. A design discussion doesn't go
> "state decision; state pros; implement"; it goes "state problem; discuss
> potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

Well, I think it may not be as simple as it seems to you.

Who decided that we should follow the process you're describing?  Is there
a thread with a consensus on this topic?  Or do you insist on using this
specific process because it's the only process that seems obvious to you?
What alternatives to it have been considered?

As far as I can tell, the process you're suggesting is effectively a
waterfall-like process, and there are quite a lot of concerns about its
effectiveness, because the decisions have to be made in the conditions of
a lack of information.

Personally, I prefer an alternative process that starts from finding out
all available bits of information, which are then used to make informed
decisions.  The unfortunate reality, however, is that the only guaranteed
way of collecting all information means implementing all (or almost all)
significant parts in code.  Roughly speaking, this process looks like a
research project that gets completed by trial and error.

Based on what you've been saying so far, I wouldn't be surprised if you
disagree.  But I still think that forcing the others to follow a certain
process by such means as vetoing a code change is maybe a bit over the top.

(In the meantime, I certainly won't object if you're going to use this
waterfall-like process for the changes that you implement yourself.)

Regards,
Evgeny Kotkov
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Daniel Shahaf writes:

> > (I'm not saying that the above rules have to be used in this particular
> > case and that a veto is invalid, but still thought it's worth
> > mentioning.)
>
> I vetoed the change because it hadn't been designed on the dev@ list,
> had not garnered dev@'s consensus, and was being railroaded through.
> (as far as I could tell)

I have *absolutely* no idea where "being railroaded through" comes from.
Really, it's a wrong way of portraying and thinking about the events that
have happened so far.

Reiterating over those events: I wrote an email containing my thoughts and
explaining the motivation for such a change.  I didn't reply to some of the
questions (including some tricky questions, such as the one featuring a
theoretical hash function), because they had been at least partly answered
by others in the thread, and I didn't have anything valuable to add at that
time.

During that time, I was actively coding the core part of the change, to
check if it's possible technically.  Which is important, as far as I
believe, because not all theoretically possible solutions can be implemented
without facing significant practical or implementation-related issues, and
it seems to me that you significantly undervalue such an approach.

I'm not saying my actions were exemplary, but as far as I can tell, they're
pretty much in line with how svn-dev has been operating so far.

But it all resulted in an unclear veto without any _technical_ arguments,
where what's being vetoed is unclear as well, because the change was not
ready at the moment the veto was cast.

And because your veto goes in favor of a specific process (considering that
no other arguments were given), the only thing that's *actually* being
railroaded is an odd form of an RTC (review-then-commit) process that is
against our usual CTR (commit-then-review) [1,2].  That's railroading,
because it hasn't been explicitly discussed anywhere and a consensus on it
has not been reached.

[1] https://www.apache.org/foundation/glossary.html#CommitThenReview
[2] https://www.apache.org/foundation/glossary.html#ReviewThenCommit

Regards,
Evgeny Kotkov
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Daniel Shahaf writes:

> > That could happen after a public disclosure of a pair of executable
> > files/scripts where the forged version allows for remote code execution.
> > Or maybe something similar with a file format that is often stored in
> > repositories and that can be executed or used by a build script, etc.
>
> Err, hang on. Your reference described a chosen-prefix attack, while
> this scenario concerns a single public collision. These are two
> different things.

A chosen-prefix attack allows finding more meaningful collisions, such as
working executables/scripts.  When such collisions are made public, they
would have a greater exploitation potential than just a random collision.

> Disclosure of a pair of executable files/scripts isn't by itself
> a problem unless one of the pair ("file A") is in a repository
> somewhere. Now, was the colliding file ("file B") generated _before_ or
> _after_ file A was committed?
>
> - If _before_, then it would seem Mallory had somehow managed to:
>
>   1. get a file of his choosing committed to Alice's repository; and
>
>   2. get a wc of Alice's repository into one of the codepaths that
>      assume SHA-1 is one-to-one / collision-free (currently that's the
>      ra_serf optimization and the 1.15 wc status).

Not only those.  There are cases when the working copy itself installs the
working file with a hash lookup in the pristine store.  This is more true
for 1.14 than for trunk, because on trunk we have the streamy
checkout/update that avoids such lookups by writing straight to the working
file.  However, some of the code paths still install the contents from the
pristine store by hash.  Examples include reverting a file, copying an
unmodified file, switching a file with keywords, the mentioned ra_serf
optimization, etc.

> Now, step #1 seems plausible enough. As to step #2, it's not clear to
> me how file B would reach the wc in step #2…

If Mallory has write access, she could commit both files, thus arranging
for a possible content change if both files are checked out to a single
working copy.  This isn't the same as just directly modifying the target
file, because file content isn't expected to change due to changes in other
files (which can be of any type), so this attack has much better chances of
going unnoticed.

If Mallory doesn't have write access, there should be other vectors, such
as distributing a pair of files (harmless in the context of their
respective file formats) separately via two upstream channels.  Then, if
both of the upstream distributions are committed into a repository and
their files are checked out together, the content will change, allowing for
a malicious action.

Regards,
Evgeny Kotkov
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Daniel Shahaf writes:

> > I can complete the work on this branch and bring it to a
> > production-ready state, assuming there are no objections.
>
> Your assumption is counterfactual:
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E

I don't see any explicit objections in these two emails (here I assume that
if something is not clear to a PMC member, it doesn't automatically become
an objection).  If the "why?" question is indeed an objection, then I would
say it has already been discussed and responded to in the thread.

Now, returning to the problem:

As described in the advisory [1], we have a supported configuration that
makes data forgery possible:

- A repository with disabled rep-sharing allows storing different files
  with colliding SHA-1 values.

- Having a repository with disabled rep-sharing is a supported
  configuration.  There may be a certain number of such repositories in the
  wild (for example, created with SVN < 1.6 and not upgraded afterwards).

- A working copy uses the assumption that the pristine contents are equal
  if their SHA-1 hashes are equal.

- So committing different files with colliding SHA-1 values makes it
  possible to forge the contents of a file that will be checked out and
  used by the client.

I would say that this state is worrying just by itself.  However, with the
feasibility of chosen-prefix attacks on SHA-1 [2], it's probably only a
matter of time until the situation becomes worse.  That could happen after
a public disclosure of a pair of executable files/scripts where the forged
version allows for remote code execution.  Or maybe something similar with
a file format that is often stored in repositories and that can be executed
or used by a build script, etc.

[1] https://subversion.apache.org/security/sha1-advisory.txt
[2] https://sha-mbles.github.io/

Speaking of the proposed switch to SHA-256 or a different checksum, there's
an argument by contradiction: if we were designing the pristineless working
copy from scratch today, would we choose SHA-1 as the best available hash
that can be used to assert content equality?  If yes, how can one prove
that?

> Objections have been raised, been left unanswered, and now implementation
> work has commenced following the original design. That's not acceptable.
> I'm vetoing the change until a non-rubber-stamp design discussion has
> been completed on the public dev@ list.

I would like to note that vetoing a code modification should be accompanied
by a technical justification, and I have certain doubts that the above
arguments qualify as such:

  https://www.apache.org/foundation/voting.html

  [[[
  To prevent vetoes from being used capriciously, the voter must provide
  with the veto a technical justification showing why the change is bad
  (opens a security exposure, negatively affects performance, etc.).
  A veto without a justification is invalid and has no weight.
  ]]]

(I'm not saying that the above rules have to be used in this particular
case and that a veto is invalid, but still thought it's worth mentioning.)

Anyway, I'll stop working on the branch, because a veto has been cast.

Regards,
Evgeny Kotkov
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Karl Fogel writes:

> Now, how hard would this be to actually implement?

To have a more or less accurate estimate, I went ahead and prepared the
first-cut implementation of an approach that makes the pristine checksum
kind configurable in a working copy.

The current implementation passes all tests in my environment and seems to
work in practice.  It is available on the branch:

  https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind

The implementation on the branch allows creating working copies that use a
checksum kind other than SHA-1.  The checksum kind is persisted in the
settings table.  Upgraded working copies of the older formats will have
SHA-1 recorded as their pristine checksum kind and will continue to use it
for compatibility.  Newly created working copies of the latest format (with
--compatible-version=1.15 or --store-pristine=no), as currently implemented,
will use the new pristine checksum kind.

Currently, as a proof of concept, the branch uses salted SHA-1 as the new
pristine checksum kind.  For the production-ready state, I plan to support
using multiple new checksum types, such as SHA-256.  I think that this
would be useful for future compatibility, because if we encounter any
issues with one checksum kind, we could then switch to a different kind
without having to change the working copy format.

One thing worth noting is that ra_serf contains a specific optimization for
the skelta-style updates that allows skipping a GET request if the pristine
store already contains an entry with the specified SHA-1 checksum.
Switching to a different checksum type for the pristine entries is going to
disable that specific optimization.  Re-enabling it would require an update
on the server side, which I consider to be out of scope for this branch.

I can complete the work on this branch and bring it to a production-ready
state, assuming there are no objections.

Thanks,
Evgeny Kotkov
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Karl Fogel writes:

> Now, how hard would this be to actually implement?

I plan to take a more detailed look at that, but I'm currently on vacation
for the New Year holidays.

Thanks,
Evgeny Kotkov
Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)
Karl Fogel writes:

> > While here, I would like to raise a topic of incorporating a switch
> > from SHA1 to a different checksum type (without known collisions) for
> > the new working copy format. This topic is relevant to the
> > pristines-on-demand branch, because the new "is the file modified?"
> > check relies on the checksum comparison, instead of comparing the
> > contents of working and pristine files.
> >
> > And so while I consider it to be out of the scope of the pristines-on-
> > demand branch, I think that we might want to evaluate if this is
> > something that should be a part of the next release.
>
> Good point. Maybe worth a new thread?

[Moving discussion to a new thread]

We currently have a problem in that a working copy relies on a checksum
type with known collisions (SHA1).  A solution to that problem is to switch
to a different checksum type without known collisions in one of the newer
working copy formats.

Since we plan on shipping a new working copy format in 1.15, this seems to
be an appropriate moment to decide whether we'd also want to switch to a
checksum type without known collisions in that new format.

Below are the arguments for including a switch to a different checksum type
in the working copy format for 1.15:

1) Since the "is the file modified?" check now compares checksums, leaving
   everything as-is may be considered a regression, because it would
   introduce additional cases where a working copy relies on comparing
   checksums with known collisions.

2) We already need a working copy format bump for the pristines-on-demand
   feature.  So using that format bump to solve the SHA1 issue might reduce
   the overall number of required bumps for users (assuming that we'll
   still need to switch from SHA1 at some point later).

3) While the pristines-on-demand feature is not released, upgrading with a
   switch to the new checksum type seems to be possible without requiring a
   network fetch.  But once some of the pristines are optional, we lose the
   possibility to rehash all contents in place.  So we might find ourselves
   having to choose between two worse alternatives: either requiring a
   network fetch during upgrade or entirely prohibiting an upgrade of
   working copies with optional pristines.

Thoughts?

Thanks,
Evgeny Kotkov
Re: Getting to first release of pristines-on-demand feature (#525).
Evgeny Kotkov writes:

> I think that the `pristines-on-demand-on-mwf` branch is now ready for a
> merge to trunk. I could do that, assuming there are no objections.

Merged in https://svn.apache.org/r1905955

I'm going to respond on the topic of SHA1 a bit later.

Thanks,
Evgeny Kotkov
Re: Getting to first release of pristines-on-demand feature (#525).
Evgeny Kotkov writes:

> > IMHO, once the tests are ready, we could merge it and release
> > it to the world.
>
> Apart from the required test changes, there are some technical
> TODOs that remain from the initial patch and should be resolved.
> I'll try to handle them as well.

I think that the `pristines-on-demand-on-mwf` branch is now ready for a
merge to trunk.  I could do that, assuming there are no objections.

  https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand-on-mwf

The branch includes the following:

– Core implementation of the new mode where the required pristines are
  fetched at the beginning of the operation.

– A new --store-pristine=yes/no option for `svn checkout` that is persisted
  as a working copy setting (see the usage sketch below).

– An update for `svn info` to display the value of this new setting.

– A standalone test harness that tests the main operations in both
  --store-pristine modes and gets executed on every test run.

– A new --store-pristine=yes/no option for the test suite that forces all
  tests to run with a specific pristine mode.

The branch passes all tests in my Windows and Linux environments, in both
--store-pristine=yes and =no modes.

While here, I would like to raise the topic of incorporating a switch from
SHA1 to a different checksum type (without known collisions) for the new
working copy format.  This topic is relevant to the pristines-on-demand
branch, because the new "is the file modified?" check relies on the
checksum comparison, instead of comparing the contents of the working and
pristine files.

And so while I consider it to be out of the scope of the pristines-on-demand
branch, I think that we might want to evaluate if this is something that
should be a part of the next release.

Thanks,
Evgeny Kotkov
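For illustration, the user-visible part described above would look roughly
like this (a sketch; the URL is a placeholder, and the exact field name
printed by `svn info` may differ):

[[[
# Create a working copy that doesn't store pristine copies:
$ svn checkout --store-pristine=no https://example.com/repos/project/trunk wc

# The setting is persisted in the working copy and reported by `svn info`:
$ svn info wc
...
Store Pristine: no
...
]]]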
Re: svn commit: r1905663 - in /subversion/branches/pristines-on-demand-on-mwf/subversion: include/ include/private/ libsvn_client/ libsvn_wc/
Bert Huijben writes:

> All the now deprecated functions now fail unconditionally when the setting
> is enabled. Isn't it possible to do this more gracefully whenever a file
> is encountered which misses its pristine version?

The problem with this approach is that the functions are going to work, but
only *sometimes*, and will fail unpredictably, depending on whether a
pristine was fetched or removed by any previous API call.  With that in
mind, failing consistently seems to be a more appropriate choice for a
deprecated function than failing randomly.

> As far as I know it is expected that some of the files do have pristines,
> while others don't… That would allow things like diffs on old clients
> that didn't switch APIs yet.

Thinking about this, the currently documented assumption is that a file has
no pristine if and only if it's being added/replaced without a copyfrom.
So maybe we cannot really extend that to "any file might not have a
pristine" without it being an incompatible change.

Thanks,
Evgeny Kotkov
Re: svn commit: r1905663 - in /subversion/branches/pristines-on-demand-on-mwf/subversion: include/ include/private/ libsvn_client/ libsvn_wc/
Daniel Sahlberg writes:

>> +  /** @since New in 1.15 */
>> +  SVN_ERRDEF(SVN_ERR_WC_DEPRECATED_API_STORE_PRISTINE,
>> +             SVN_ERR_WC_CATEGORY_START + 43,
>> +             "This client was not updated to support working copies "
>> +             "without local pristines")
>> +
>>    /* fs errors */
>
> Is it really "This client"? It looks more to be based on the WC setting.

This new error should only occur when someone (a 3rd-party client) uses the
deprecated API function for a working copy without pristines.

From that point of view, I would say that the problem is in the client that
is still using the deprecated API and thus cannot properly handle such
working copies.

Also, the current message aims to be somewhat consistent with another
existing error for the case when a client is too old to read a newer
working copy format:

  "This client is too old to work with the working copy […] (format 32)"

(After writing this, I realized that the "was not updated" part of the
message may be misinterpreted as a requirement to update to a newer version
of the software.  In r1905682, I slightly rephrased that part so that it
now says: "This client uses a deprecated API that does not support working
copies without local pristines".)

Thanks,
Evgeny Kotkov
Re: Getting to first release of pristines-on-demand feature (#525).
Karl Fogel writes:

> Thank you, Evgeny! Just to make sure I understand correctly --
> the status now on the 'pristines-on-demand-on-mwf' branch is:
>
> 1) One can do 'svn checkout --store-pristines=no' to get an
>    entirely pristine-less working copy. In that working copy,
>    individual files will be hydrated/dehydrated automagically on an
>    as-needed basis.
>
> 2) There is no command to hydrate or dehydrate a particular file.
>    Hydration and dehydration only happen as a side effect of other
>    regular Subversion operations.
>
> 3) There is no way to rehydrate the entire working copy. E.g.,
>    something like 'svn update --store-pristines=yes' or 'svn hydrate
>    --depth=infinity' does not exist yet.
>
> 4) Likewise, there is no way to dehydrate an existing working copy
>    that currently has its pristines (even if that working copy is at
>    a high-enough version format to support pristinelessness). E.g.,
>    something like 'svn update --store-pristines=no' or 'svn dehydrate
>    --depth=infinity' does not exist yet.
>
> Is that all correct?

Yes, I believe that is correct.

> By the way, I do not think (2), (3), and (4) are blockers. Just
> (1) by itself is a huge step forward and solves issue #525;

+1 on keeping the scope of the feature to just (1) for now.

> IMHO, once the tests are ready, we could merge it and release
> it to the world.

Apart from the required test changes, there are some technical TODOs that
remain from the initial patch and should be resolved.  I'll try to handle
them as well.

Thanks,
Evgeny Kotkov
Re: Getting to first release of pristines-on-demand feature (#525).
Evgeny Kotkov writes:

> Perhaps we could transition into that state by committing the patch
> and maybe re-evaluate things from there. I could do that, assuming
> no objections, of course.

Committed the patch in https://svn.apache.org/r1905324

I'll try to handle the related tasks in the near future.

Thanks,
Evgeny Kotkov
Re: Getting to first release of pristines-on-demand feature (#525).
Karl Fogel writes:

> By the way, in that thread, Evgeny Kotkov -- whose initial work
> much of this is based on -- follows up with a patch that does a
> first-pass implementation of 'svn checkout --store-pristines=no'
> (by implementing a new persistent setting in wc.db).

Perhaps we could transition into that state by committing the patch and
maybe re-evaluate things from there.  I could do that, assuming no
objections, of course.

Thanks,
Evgeny Kotkov
Initial patch for storing the pristines-on-demand setting in the working copy (Issue 4889)
Julian Foad writes:

> Issue #4889 "per-WC config" is the subject of Johan's new dev@ post
> "Pristines-on-demand=enabled == format 32?". We already concurred that
> it's wise to decouple "pristines-on-demand mode is enabled in this WC"
> from "the WC format is (at least) 32 so can support that mode".
> <https://subversion.apache.org/issue/4889>. This may be considered
> higher priority than fixing the remaining tests.

I have been thinking about this recently, and here is a patch with the
first-cut implementation that persists the pristines-on-demand setting in a
working copy.  Unfortunately, I am getting a bit swamped with other things
and cannot complete the work on it, but perhaps it could be useful as a
building block for the full implementation.

The patch currently allows doing an `svn checkout --store-pristines=no`,
which is going to create a working copy that doesn't store the pristine
copies of the files and fetches them on demand.  The setting is persisted
in wc.db.

The patch doesn't include the following:

1) An update for the tests and the test suite to run the tests in both
   modes.  Personally, I think that we should update the test runner so
   that it would execute the tests for both pristine modes by default,
   without requiring any specific switches.  Because otherwise, there is a
   chance that one of the equally supported core configurations may receive
   far less attention during the development and test runs.

2) An update for `svn info` to display the value of the new setting.

3) An ability to take the --store-pristines value from a user config,
   perhaps on a per-URL basis.

While working on the patch, I have stumbled across a couple of issues:

A) `svn upgrade` without arguments fails for a working copy with the latest
   format:

     $ svn checkout --compatible-version=1.15 wc
     $ svn upgrade wc
     $ svn: E155021: Working copy '…' is already at version 1.15 (format 32)
       and cannot be downgraded to version 1.8 (format 31)

   I haven't given it a lot of thought, but we might want to handle this
   case without an error, or even think about making `svn upgrade` upgrade
   to the latest available version by default instead of the minimum
   supported one (similar to `svnadmin upgrade`).

B) Shelving and pristines-on-demand

   It seems that both the v2 and v3 shelving implementations are currently
   not updated to support pristines-on-demand working copies.

C) Bumping the related API

   This part originates from B).  For example, v3 shelving uses the
   libsvn_wc APIs, such as svn_wc_revert6().  If the working copy is
   created without the pristine contents, those calls are going to fail
   with an error saying that there is no text-base for the corresponding
   path.

   This is a tricky error to understand, and the failure itself is
   unpredictable, because it depends on whether any of the previous API
   calls have fetched the missing text-bases.  So if we think about v3
   shelving as an example of a libsvn_wc API user, other existing
   third-party users of the API could face the same problem.

   Perhaps we could bump the APIs that currently rely on the text-bases
   always being available.  And we could then make their deprecated
   versions fail (predictably) for working copies that don't store pristine
   contents.
Thanks,
Evgeny Kotkov

Index: subversion/include/private/svn_wc_private.h
===================================================================
--- subversion/include/private/svn_wc_private.h (revision 1899956)
+++ subversion/include/private/svn_wc_private.h (working copy)
@@ -2231,9 +2231,11 @@
 const svn_version_t *
 svn_wc__min_supported_format_version(void);

 /**
- * Set @a format to the format of the nearest parent working copy root of
- * @a local_abspath in @a wc_ctx, or to the oldest format of any root stored
- * there.  If @a wc_ctx is empty, return the library's default format.
+ * Set @a *format_p and @a *store_pristines_p to the settings of the
+ * nearest parent working copy root of @a local_abspath in @a wc_ctx,
+ * or to settings of any root stored there, preferring the one with
+ * the oldest format.  If @a wc_ctx is empty, return the library's
+ * default settings.
  *
  * Use @a scratch_pool for temporary allocations.
  *
@@ -2240,16 +2242,18 @@ svn_wc__min_supported_format_version(void);
  * @since New in 1.15.
  */
 svn_error_t *
-svn_wc__format_from_context(int *format,
-                            svn_wc_context_t *wc_ctx,
-                            const char *local_abspath,
-                            apr_pool_t *scratch_pool);
+svn_wc__settings_from_context(int *format_p,
+                              svn_boolean_t *store_pristines_p,
+                              svn_wc_context_t *wc_ctx,
+                              const char *local_abspath,
+                              apr_pool_t *scratch_pool);

 /**
  * Ensure that an administrative area exist
Re: svn commit: r1899341 - in /subversion/branches/1.14.x: ./ STATUS subversion/libsvn_subr/io.c
Branko Čibej writes:

> > 1) Previously, the check for EINVAL within the
> > svn_io_file_flush_to_disk() function only happened in the #else block,
> > so it did not affect the behavior on Windows. With the change, the
> > check happens unconditionally on all platforms.
>
> You're right, I hadn't considered that. And Windows behaves sufficiently
> differently that the EINVAL check as it stands is not appropriate. Would
> you consider putting that entire check in an #ifdef an appropriate fix?

Yes, I think it would be fine if we restored the pre-r1883355 way of
handling EINVAL under an #ifdef.

> > 2) Unless I am mistaken, this change replaces an fcntl(F_FULLFSYNC)
> > with fdatasync() when the latter is supported.
>
> Depending on the platform, yes, APR may pick fdatasync() over
> fcntl(F_FULLFSYNC).

I think that since fdatasync() has lower integrity guarantees, starting to
unconditionally use it in the existing API might be problematic.  For
example, the API users (ourselves and third parties) may implicitly rely on
the current integrity guarantees and assume that metadata is flushed to
disk.  And in this case it's not limited to just the
svn_io_file_flush_to_disk() function, because there are other parts of the
public API that call svn_io_file_flush_to_disk() internally, such as
svn_io_file_rename2() or svn_io_write_atomic2().

If the original intention of this change was to save I/O in cases where we
would be fine with just flushing the file contents, then perhaps we could
do something like svn_io_file_flush_to_disk2(…, svn_boolean_t sync_metadata)
that uses apr_file_datasync() if sync_metadata is false, and incrementally
start passing sync_metadata=FALSE on the calling sites where it's safe to
do so.  (See the sketch below.)

But if the original intention was to fully switch to the APR functions, I
am not too sure if we can do so, because both apr_file_datasync() and
apr_file_sync() have lower integrity guarantees in some cases, compared to
the previous code (apr_file_datasync(), due to calling fdatasync() in some
cases; and apr_file_sync(), due to always doing just an fsync() instead of
F_FULLFSYNC when the latter is supported).

Thanks,
Evgeny Kotkov
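To make the suggestion concrete, the proposed API could look roughly like
this (a minimal sketch; the function name and exact shape are hypothetical,
while apr_file_sync() and apr_file_datasync() are the existing APR calls):

[[[
/* Flush FILE to disk.  If SYNC_METADATA is true, also ensure that the
   file's metadata reaches the disk (the stronger guarantee of the
   existing svn_io_file_flush_to_disk() API); otherwise, only the file
   contents need to be flushed. */
static svn_error_t *
io_file_flush_to_disk2(apr_file_t *file,
                       svn_boolean_t sync_metadata,
                       apr_pool_t *pool)
{
  apr_status_t apr_err;

  do
    {
      if (sync_metadata)
        apr_err = apr_file_sync(file);      /* fsync()-like guarantee */
      else
        apr_err = apr_file_datasync(file);  /* contents only */
    }
  while (APR_STATUS_IS_EINTR(apr_err));

  if (apr_err)
    return svn_error_wrap_apr(apr_err, _("Can't flush file to disk"));

  return SVN_NO_ERROR;
}
]]]

Calling sites that are known not to depend on the file metadata could then
incrementally start passing sync_metadata=FALSE.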
Re: svn commit: r1899341 - in /subversion/branches/1.14.x: ./ STATUS subversion/libsvn_subr/io.c
svn-role writes:

> Merge r1883355 from trunk:
>
>  * r1883355
>    Use the APR-1.4+ API for flushing file contents to disk.
>    Justification:
>      Reduce code duplication between APR and SVN.
>    Votes:
>      +1: brane, jun66j5, markphip

…

> +  do {
> +    apr_err = apr_file_datasync(file);
> +  } while (APR_STATUS_IS_EINTR(apr_err));
> +
> +  /* If the file is in a memory filesystem, fsync() may return
> +     EINVAL.  Presumably the user knows the risks, and we can just
> +     ignore the error. */
> +  if (APR_STATUS_IS_EINVAL(apr_err))
> +    return SVN_NO_ERROR;
> +
> +  if (apr_err)
> +    return svn_error_wrap_apr(apr_err,
> +                              _("Can't flush file '%s' to disk"),
> +                              try_utf8_from_internal_style(fname, pool));

A few thoughts on this change:

1) Previously, the check for EINVAL within the svn_io_file_flush_to_disk()
   function only happened in the #else block, so it did not affect the
   behavior on Windows.  With the change, the check happens unconditionally
   on all platforms.

   On Windows, APR_STATUS_IS_EINVAL() is defined as follows:

     #define APR_STATUS_IS_EINVAL(s) ((s) == APR_EINVAL \
                 || (s) == APR_OS_START_SYSERR + ERROR_INVALID_ACCESS \
                 || (s) == APR_OS_START_SYSERR + ERROR_INVALID_DATA \
                 || (s) == APR_OS_START_SYSERR + ERROR_INVALID_FUNCTION \
                 || (s) == APR_OS_START_SYSERR + ERROR_INVALID_HANDLE \
                 || (s) == APR_OS_START_SYSERR + ERROR_INVALID_PARAMETER \
                 || (s) == APR_OS_START_SYSERR + ERROR_NEGATIVE_SEEK)

   So with this change, all of these error codes are now going to be
   ignored, and the svn_io_file_flush_to_disk() function will return
   SVN_NO_ERROR, indicating the success of flushing the data to disk.

   Some of these error codes, such as ERROR_INVALID_HANDLE, are quite
   generic.  I would think that this change opens a data corruption
   possibility in the following case:

   - A real error happens during FlushFileBuffers().
   - It gets translated into one of the error codes above by the I/O stack.
   - The error is ignored in the implementation of
     svn_io_file_flush_to_disk().
   - The caller of svn_io_file_flush_to_disk() interprets this situation as
     a successful data flush.
   - The caller continues to work under the assumption that the data was
     successfully flushed to disk, whereas in fact it was not.  For
     example, by committing a repository transaction.
   - A power loss after the transaction commit causes the data to be lost
     or corrupted.

2) Unless I am mistaken, this change replaces an fcntl(F_FULLFSYNC) with
   fdatasync() when the latter is supported.

   So assuming that both operations are supported, the new behavior of the
   svn_io_file_flush_to_disk() function has weaker integrity guarantees
   than before, because fdatasync() does not ensure that metadata such as
   the last modification time is written to disk.

   I haven't done a thorough check of whether we rely on mtime being
   flushed to disk in our code, but even if we don't,
   svn_io_file_flush_to_disk() is part of the public API, so altering its
   integrity guarantees in a patch release might be unexpected for its
   callers.

Thoughts?

(A small disclaimer: these changes had slipped past my attention until now,
when I tried updating to 1.14.2.  But better late than sorry…)

Thanks,
Evgeny Kotkov
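For reference, one possible shape of the fix discussed in the follow-up
above (restoring the pre-r1883355 handling of EINVAL under an #ifdef) would
be the following; a sketch, not the committed change:

[[[
  /* If the file is in a memory filesystem, fsync() may return EINVAL.
     Presumably the user knows the risks, and we can just ignore the
     error.  Note that this check is not appropriate on Windows, where
     APR_STATUS_IS_EINVAL() also matches generic error codes such as
     ERROR_INVALID_HANDLE; see above. */
#ifndef WIN32
  if (APR_STATUS_IS_EINVAL(apr_err))
    return SVN_NO_ERROR;
#endif
]]]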
Re: Pristines-on-demand: authz denied during textbase sync (#4888)
Julian Foad writes:

> I suggest:
>
> - revert the patch I applied, as it's papering over the problem in an
>   incomplete way and so possibly causes more confusion than it fixes.
>
> - leave this issue open and come back to it later; it's an edge case not
>   part of common work flows.

+1.

Regards,
Evgeny Kotkov
Re: Issue #525/#4892: on only fetching the pristines we really need
Julian Foad writes:

> Conclusions:
>
> It is certainly possible that we could modify "update" and the other
> "online" operations, at least, and the previously "offline" operations
> too if we want, to make them fetch pristines at point-of-use in this way.
>
> Such modifications are not trivial. There is the need to run additional
> RA requests in between the existing ones, perhaps needing an additional
> RA session to be established in parallel, or taking care with inserting
> RA requests into an existing session.

I think that this part has a lot of protocol constraints and hidden
complexity.  And things could probably get even more complex for merge and
diff.

Consider a bulk update report over HTTP, which is just a single response
that has to be consumed in a streamy fashion.  There is no request
multiplexing, and fetching data through a separate connection is going to
limit the maximum size of a pristine file that can be downloaded without
receiving a timeout on the original connection.  Assuming the default HTTP
timeout of httpd 2.4.x (60 seconds) and a 100 MB/s data transfer rate, the
limit for a pristine size is going to be around 6 GB.

This kind of problem probably isn't limited to just this specific example
and protocol, considering things like an update editor driver transferring
control at certain points (e.g., during merge) and thus being unable to
keep reading the response.

When I was working on the proof-of-concept, encountering these issues
stopped me from considering the approach with fetching pristines at the
point of access as being practically feasible.  That also resulted in the
alternative approach, initially implemented on the `pristines-on-demand`
branch.

Going slightly off-topic, I tend to think that even despite its drawbacks,
the current approach on the branch should work reasonably well for the MVP.
To elaborate on that, let me first share a few assumptions and thoughts I
had in mind at that point of time:

1) Let's assume a high-throughput connection to the server, 1 Gbps or
   better.  With slower networks, working with large files is going to be
   problematic just by itself, so that might be thought of as being out of
   scope.

2) Let's assume that a working copy contains a large number of blob-like
   files.  In other words, there are thousands of 10-100 MB files, as
   opposed to one or two 50 GB files.

3) Let's assume that in a common case, only a small fraction of these files
   are modified in the working copy.

Then for a working copy with 1,000 files of 100 MB each and 10 modified
files:

A) Every checkout saves 100 GB of disk space; that's pretty significant for
   a typical solid-state drive.

B) Hydrating won't transfer more than 1 GB of data, or about 10 seconds
   under an optimistic assumption.

C) For a more uncommon case with 100 modified files, it's going to result
   in 10 GB of data transferred and about two minutes of time; I think
   that's still pretty reasonable for an uncommon case.

So while the approach used in the proof-of-concept might be non-ideal, I
tend to think it should work reasonably well in a variety of use cases.  I
also think that this approach should even be releasable in the form of an
MVP, accompanied by a UI option to select between the two states during
checkout (all pristines / pristines-on-demand) that is persisted in the
working copy.

A small final note is that I could be missing some details or other cases,
but in the meantime I felt like sharing the thoughts I had while working on
the proof-of-concept.

Thanks,
Evgeny Kotkov
Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode
Julian Foad writes:

> Some people said it should be in wc.db. Evgeny wrote, "this sounds like
> a property associated with a specific wc_id in the database. I would say
> that this pretty much rules out options of storing it outside the wc.db."
> But (Brane wrote) "WC_ID is hardcoded to 1 pretty much everywhere."
>
> What do you all make of this now? Is a new WC config file plausible?

I tend to think that using a per-wc config file could potentially introduce
more problems than it solves.  A few examples:

- Even if WC_ID is currently hardcoded to 1, I think that storing the new
  property without associating it with a WC_ID would probably circumvent
  this part of the working copy design.  In other words, if at some point
  the database starts to use more than one WC_ID, we would probably have to
  do something about a config option that works on a per-WC_ID basis but
  isn't persisted as such.

- A user-editable config option implies that it can be changed externally
  and that the working copy has to cope with such a possibility.  This
  rules out a relatively simple approach where the property is set when a
  working copy is created and is kept immutable afterwards.

  (That approach also allows entirely skipping the textbase sync by seeing
  that the stored pristines-on-demand property is false.  Compared to that,
  the approach with a config option implies having to do a more thorough
  check, because some of the pristines might be missing.)

- To change this option programmatically, one would have to update an
  existing config file, which could be a non-trivial technical task,
  considering things like persisting comments and whitespace trivia.

Thanks,
Evgeny Kotkov

(On vacation for a couple of weeks, so might not be able to respond
promptly.)
Re: A two-part vision for Subversion and large binary objects.
Julian Foad writes:

> SEMANTICS
>
> The user can choose one mode, per WC, from a list of options that may
> include:
>
> - off: as in previous versions, no checking, just assume all pristines
>   are present
> - pristines-on-demand: fetch "wanted" pristines; discard others
> - fetch-only: fetch any pristine that's absent; do not discard

My two cents on this would be that it might be easier to start with an
on/off property that gets stored when a working copy is created and doesn't
change afterwards, at least for now.

> Help please: Where should we properly store this setting in the WC?
>
> - in '.svn/entries' or '.svn/format'?
>   (Both currently contain a single line saying "12". We could add an
>   extra line, or in general N extra lines each with one setting, for
>   example.)
> - in a new file such as '.svn/pristines-on-demand'?
> - in the wc.db somewhere?

Thinking out loud, this sounds like a property associated with a specific
wc_id in the database.  I would say that this pretty much rules out the
options of storing it outside the wc.db.

Thanks,
Evgeny Kotkov
Re: A two-part vision for Subversion and large binary objects.
Julian Foad writes:

> We could swap the scanning logic around to do the (quick) check for
> missing pristines before deciding whether a (slower) file "stat" is
> necessary. Positives: eliminates the redundant "stat" overhead which may
> be significant in working trees containing many files. Negatives: some
> re-work needed in the current implementation.
>
> Of these, the last one currently looks viable and useful.
>
> Does that one look promising to you?

I might be missing something, but I don't yet see how we could save a
stat().  Currently, a pristine is hydrated if and only if the corresponding
working file is modified.

Let's say we check if a pristine is hydrated beforehand.  If we find out
that the pristine is dehydrated, we have to stat(), because if the file is
modified, then we need to hydrate.  If we find out that the pristine is
hydrated, we still have to stat(), because if the file is no longer
modified, then we need to dehydrate.  (See the sketch below.)

Thanks,
Evgeny Kotkov
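To spell the argument out, the per-node decision looks roughly like this
(a sketch with hypothetical helper names, not the branch's actual code):

[[[
/* Both branches need to know whether the working file is locally
   modified, which requires a stat() (and possibly a content compare),
   so checking the hydrated state first doesn't save the stat(). */
if (!pristine_is_hydrated(node))
  {
    if (is_locally_modified(node))   /* stat() + compare */
      hydrate(node);
  }
else
  {
    if (!is_locally_modified(node))  /* stat() + compare */
      dehydrate(node);
  }
]]]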
Re: A two-part vision for Subversion and large binary objects.
Julian Foad writes:

> Scanning with 'stat'
>
> I'm concerned about the implementation scanning the whole subtree,
> calling 'stat' on every file to determine whether the file is "changed"
> (locally modified). This is done in svn_wc__textbase_sync() with its
> textbase_walk_cb().
>
> It does this scan on every sync, which is twice on every syncing
> operation such as diff.
>
> Don't we already have an optimised scan for local modifications
> implemented in the "status" code? Could we re-use this?

In a few of my experiments, the performance of textbase_sync() was more or less comparable to a status walk. So maybe it's not actually worthwhile spending time on improving this part, at least for now.

Also, I tend to think that DRY doesn't really apply here, because a status walk and a textbase sync are essentially different operations that just happen to have something in common internally. For example, a textbase sync doesn't have to follow the tree structure and can be implemented with an arbitrarily ordered walk over NODES.

> Premature Hydrating
>
> The present implementation "hydrates" (fetches missing pristines) every
> file within the whole subtree the operation targets. This is done by
> every major client operation calling svn_client__textbase_sync() before
> and afterwards.
>
> That is pessimistic: the operation may not actually touch all these
> files if limited in any way such as by
>
> - depth filtering
> - other filtering (changelist, properties-only, ...)
> - terminating early (e.g. output piped to 'head')
>
> That introduces all the fetching overhead for the given subtree as a
> latency before the operation shows its results, which for something
> small at the root of the tree such as "svn diff --depth=empty
> --properties-only ./" may make a significant usability impact.
>
> Presumably we could add the depth and some other kinds of filtering to
> the tree walk. But that will always leave terminating early, and
> possibly other cases, sub-optimal.
>
> I would prefer a solution that defers the hydrating until closer to the
> moment of demand.

I think that fetching the pristine contents at the moment of demand is a particularly problematic concept to pursue, because it implies that there is a network request that can now happen at an unpredictable moment in time. So any operation that may access the pristine contents has to be ready for a network fetch. Compared to that, fetching the required pristines before the operation does not impose that kind of requirement on the existing code.

Thanks, Evgeny Kotkov
Re: A two-part vision for Subversion and large binary objects.
Julian Foad writes:

> Hello everyone. Thanks to sponsorship arranged by Karl, I'm able to work
> on completing this.

Hi Julian,

It's great to have you on board!

> Important Question:
>
> * Is the approach with a WC format bump the best approach?
>
> I see two options to consider:
>
> * with format bump like this, plus completing the multi-WC-format
>   branch work
> * re-work the implementation of the concept, to not change the
>   declared format number or DB schema, but instead to query "is the file
>   present on disk" instead of the DB 'hydrated' flag
>
> For each of those options,
>
> * what is the outcome in terms of interoperability?
> * how much work is involved? (or is the 2nd one impossible?)

Between these two options, I don't think that it would actually be possible to have the pristines-on-demand functionality available without a format bump, as the existing clients working with the current format rely on the pristine contents always being available on disk. Personally, I tend to think that making a format bump and accompanying it with the multi-wc-format support should work pretty well from a user's standpoint.

> Right now I'm reviewing this work and the long discussion about it and
> I will come back with a summary and plan for your consideration, and no
> doubt many questions, in the next few days.

Perhaps, one way to start off would be to fix the temporary assumption that any working copy of the newer format always fetches the pristines on-demand. While the final way of configuring that undoubtedly is an important question to answer, there might be a way to make progress while keeping the configuration simple, at least for now:

- Make the "are pristines being fetched on-demand" a property of the working copy

- Introduce a new configuration option that controls this new behavior and is applied at the moment when a working copy is created

I think that carrying out these steps would bring the feature closer to its final shape, and would also mean that a lot of the important subsurface work (such as preserving the old behavior for existing working copies, running the tests for both configurations, etc.) is already in place, so that we wouldn't have to think about it further on.

Thanks, Evgeny Kotkov
Re: [Regression in httpd 2.4.49] mod_dav: REPORT requests no longer return errors
Ruediger Pluem writes:

> Does the attached patch solve your issue?

It does appear to solve the problem with missing errors, thanks!

I haven't checked that in detail, but I think there might be a discrepancy in how `err` is handled in the patch compared to, for example, the call of the method_precondition() hook. With the patch, `err` is checked even if all hooks DECLINE the operation. Not too sure if that's intended, because the variable could potentially contain an arbitrary value or a leftover from some previous call.

Thanks, Evgeny Kotkov
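P.S. To illustrate the kind of initialization I mean, a sketch (not the actual patch) where `err` starts out in a defined state and is never inspected when every hook has DECLINEd:

[[[
#include <httpd.h>
#include <mod_dav.h>

static int
run_report_hooks(request_rec *r,
                 const dav_resource *resource,
                 const apr_xml_doc *doc)
{
  dav_error *err = NULL;   /* no leftover value can survive this */
  int result;

  result = dav_run_deliver_report(r, resource, doc,
                                  r->output_filters, &err);
  if (result == DECLINED)
    return HTTP_NOT_IMPLEMENTED;   /* nobody ran; `err` is meaningless */

  if (err != NULL)
    {
      /* ... handle the error as in dav_method_report() ... */
    }

  return DONE;
}
]]]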
[Regression in httpd 2.4.49] mod_dav: REPORT requests no longer return errors
Hi,

I think that I have stumbled across a significant regression in httpd 2.4.49 where mod_dav has been changed in a way that makes it ignore the errors returned by a versioning provider during REPORT requests.

I haven't checked that in detail, but I think that r1892513 [1] is the change that is causing the new behavior.

Consider the new core part of the dav_method_report() function:

    …
    result = dav_run_deliver_report(r, resource, doc,
                                    r->output_filters, &err);
    switch (result) {
    case OK:
        return DONE;
    case DECLINED:
        /* No one handled the report */
        return HTTP_NOT_IMPLEMENTED;
    default:
        if ((err) != NULL) {
            … handle the error

Assuming there are no deliver_report hooks, this code is going to call dav_core_deliver_report(), whose relevant part is as follows:

    …
    if (vsn_hooks) {
        *err = (*vsn_hooks->deliver_report)(r, resource, doc,
                                            r->output_filters);
        return OK;
    }
    …

Here the "return OK" part skips the error handling on the calling site, even if there is an associated error returned in *err. In turn, this causes the following regression:

- For a failed request where the server hasn't started sending the response, it is now going to erroneously send a 200 OK with an empty body instead of an error response with the appropriate HTTP status.

- For a failed request where the server has started sending the response body, it is now going to handle it as if it was successfully completed instead of aborting the connection. The likely outcome of this is that the server is going to send a truncated payload with a 2xx status indicating success.

This regression can cause unexpected behavior of the Subversion clients, for example in cases where the (ignored) error occurred due to a non-successful authorization check. Other DAV clients may be susceptible to some kinds of unexpected behavior as well.

[1] https://svn.apache.org/r1892513

Thanks, Evgeny Kotkov
Re: A two-part vision for Subversion and large binary objects.
Evgeny Kotkov writes:

> The current state is that the commit is going to fetch the original
> text-base before sending the delta.
>
> It should be technically possible to implement a behavior where the commit
> sends a delta if the local text-base is present, and a fulltext otherwise.

This behavior where a commit does not fetch the text-bases, but only uses those that are available locally, can be enabled with the following patch:

[[[
Index: subversion/libsvn_client/commit.c
===================================================================
--- subversion/libsvn_client/commit.c	(revision 1892661)
+++ subversion/libsvn_client/commit.c	(working copy)
@@ -729,7 +729,7 @@ svn_client_commit6(const apr_array_header_t *targe
     goto cleanup;

   cmt_err = svn_error_trace(
-            svn_client__textbase_sync(lock_root, TRUE, TRUE,
+            svn_client__textbase_sync(lock_root, FALSE, TRUE,
                                       ctx, iterpool));
   if (cmt_err)
     goto cleanup;
]]]

Thanks, Evgeny Kotkov
Re: A two-part vision for Subversion and large binary objects.
Mark Phippard writes: > > (Although as far as I can tell, we would have to extend the delta editor > > to allow us to send a fulltext against an existing file, and the > > optimization would only work for newer servers.) > > This is just from memory of a list conversation that probably happened > 10+ years ago now but ... if you use a generic WebDAV client with a > SVN server and you commit changes to an existing file the WebDAV > client is just sending the server the latest full text (no deltas) and > the server knows how to handle it. So theoretically some of what is > needed to support this sort of commit ought to already exist in our > server code. With a bit more thought, sending a fulltext even to existing servers indeed shouldn't be an issue, as it can happen in the form of a delta against the empty source. Thanks, Evgeny Kotkov
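P.S. A minimal sketch of what "a delta against the empty source" looks like with the existing delta editor API; the surrounding commit driver is omitted, and send_fulltext() itself is illustrative, not an existing function:

[[[
#include "svn_delta.h"
#include "svn_io.h"

static svn_error_t *
send_fulltext(const svn_delta_editor_t *editor,
              void *file_baton,
              svn_stream_t *contents,  /* detranslated working file */
              apr_pool_t *pool)
{
  svn_txdelta_window_handler_t handler;
  void *handler_baton;

  /* NULL base checksum: the delta applies against the empty text. */
  SVN_ERR(editor->apply_textdelta(file_baton, NULL, pool,
                                  &handler, &handler_baton));

  /* Sends the stream contents as self-compressed delta windows,
     as if the source were empty. */
  return svn_txdelta_send_stream(contents, handler, handler_baton,
                                 NULL, pool);
}
]]]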
Re: A two-part vision for Subversion and large binary objects.
Mark Phippard writes:

> Suppose I am using this feature for binaries, which I think is the
> main use case. Using whatever tool I produce a modified version of my
> binary file. At this point in time, there is nothing in the text-base
> because I have not done any SVN operations other than checkout. Since
> it is a binary, other than maybe running status my next likely command
> is going to be commit. Are you saying that during the processing of
> this commit the client will first fetch the original text base from
> the server before doing the commit? That seems sub-optimal. Perhaps
> you are only doing this for diff or revert which would make perfect
> sense?

The current state is that the commit is going to fetch the original text-base before sending the delta.

It should be technically possible to implement a behavior where the commit sends a delta if the local text-base is present, and a fulltext otherwise. (Although as far as I can tell, we would have to extend the delta editor to allow us to send a fulltext against an existing file, and the optimization would only work for newer servers.)

But at this point, I'm not too sure that we should make the commit special in that aspect. This optimization is only going to work reliably if the only things you do are checkout and commit, because an update with modified files is going to fetch those text-bases by itself. And the update itself could happen in response to trying to do an out-of-date commit.

> Aside from all of that ... do the text bases get cleaned up? For
> example at the end of a successful commit does it get rid of that text
> base that no longer are current?

Yes. The text-base file corresponding to the new version of the file is not going to be written to the disk. The text-base file corresponding to the old version gets cleaned up if it is not referenced by any other remaining modified files.

Thanks, Evgeny Kotkov
Re: A two-part vision for Subversion and large binary objects.
Karl Fogel writes:

> 1) Make pristine text-base files optional. See issue #525 for
>    details. In summary: currently, every large file uses twice the
>    storage on the client side, and yet for most of these files
>    there's little benefit. They're usually not plaintext, so 'svn
>    diff' against the pristine base is pointless (unless you have some
>    specialized diff tool for the particular binary format, but that's
>    rare), and 'svn commit' likewise just sends up the whole working
>    file. The only thing a local base gets you is local 'svn revert',
>    which can be nice, but many of us would happily give it up for
>    large files to avoid the 2x local storage cost.

A proof-of-concept implementation of one possible approach for making the text-bases optional is available on the branch:

  https://svn.apache.org/repos/asf/subversion/branches/pristines-on-demand

While still being rough around the edges, it passes the whole test suite in my environment and seems to work quite well in practice.

A few notes on the current state:

- The implementation includes a bump of the working copy format.

- Although this would most certainly have to change in the future, for now any working copy of the new format is considered to always have optional text-bases (for easier testing, etc.).

- There is no UI and no configuration yet, just the first cut of the core part. The core idea is that we maintain the following invariant: only the modified files have their pristine text-base files available on the disk.

- To avoid having to access the text-base, the "is the file modified?" check is performed by calculating the checksum of a file and comparing that to what's recorded in the working copy (see the P.S. below).

- A text-base of the unmodified file is the file itself, appropriately detranslated.

- To get into the appropriate state at the beginning of the operation, we walk through the current text-base info in the db and check if the corresponding working files are modified. The missing text-bases are fetched using the svn_ra layer. The operations also include a final step during which the no longer required text-bases are removed from disk.

- The operations that don't need to access the text-bases (such as "svn ls" or the updated "svn st") do not perform this walk and do not synchronize the text-base state.

The pros and cons of the approach could probably be summarized as follows:

Pros:

1. A working copy without modifications does not store the text-bases, thus avoiding the 2x local storage cost that could be significant in case of large files. A working copy with modifications only stores the text-bases for the modified files.

2. The fetched text-bases for modified files are reused across operations. For example, calling "svn diff" twice is only going to fetch the required text-bases once. In case of a third-party client that makes such calls N times under the assumption that they are "local", the fetch is going to only happen once, rather than N times.

3. With a generic approach like this, we might have fewer issues with introducing any kind of new working copy functionality that depends on the text-bases.

Cons:

1. The worst-case scenario for this approach (where every single file in the working copy is modified) would work by fetching all of the text-bases. And while that is going to take just as much storage as it currently does, there's the added cost of the fetch itself.

2. The missing text-bases will be fetched before the operation. Fetching very large text-bases may take a significant amount of time and may have an impact on the user experience, although that might be alleviated by providing a UI of some kind.

3. Even without the fetch, the "synchronize the text-base state" part has some performance penalty. This penalty will only occur in the working copies with optional text-bases and only if the operation synchronizes the text-base state. The added cost is going to be of the order of two "svn st" calls for the target of the operation.

Overall, I tend to think that this approach should work reasonably well even despite these cons, and that it is going to solve the issue in those cases where the 2x local storage requirement is a problem.

So, how does that sound in general?

If we were to try to get this to a production-ready state, it would probably make sense to also:

1. Complete the work on ^/subversion/branches/multi-wc-format so that the client would work with both the new and old working copy formats, for a seamless user experience and better compatibility.

2. For the new working copy format, incorporate a switch to a different checksum type without known collisions instead of SHA-1.

3. Fix the minor issues written down as TODOs in the code.

Thanks, Evgeny Kotkov
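P.S. As a sketch of the checksum-based modification check from the notes above (detranslation is left out for brevity, and the helper itself is illustrative; the recorded checksum would come from the working copy database):

[[[
#include "svn_checksum.h"
#include "svn_io.h"

static svn_error_t *
is_file_modified(svn_boolean_t *modified,
                 const char *local_abspath,
                 const svn_checksum_t *recorded_checksum,
                 apr_pool_t *scratch_pool)
{
  svn_checksum_t *actual;

  /* Checksum the working file using the same kind as the recorded one. */
  SVN_ERR(svn_io_file_checksum2(&actual, local_abspath,
                                recorded_checksum->kind, scratch_pool));

  *modified = !svn_checksum_match(actual, recorded_checksum);
  return SVN_NO_ERROR;
}
]]]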
Re: svn commit: r1886490 - in /subversion/trunk/subversion: include/private/ libsvn_subr/ libsvn_wc/ tests/cmdline/ tests/libsvn_subr/ tests/libsvn_wc/
Evgeny Kotkov writes: > My attempt to fix this is by making checkout stream data to both pristine > and the (projected) working file, so that the actual install would then > happen as just an atomic rename. For the record, I made a few additional changes, including a significant overhaul of the install stream's timestamp handling in r1886774 [1]. > Noting that this change only fixes "svn checkout", but not "svn export". > Export uses a separate implementation of the delta editor, and it should > be possible to update it in a similar way — but I'm leaving that for future > work for now. For this part with "svn export", r1886843 [2] should do the trick. While here, r1887108 [3] begins using the new file writer utility also when exporting from an existing working copy. [1] https://svn.apache.org/r1886774 [2] https://svn.apache.org/r1886843 [3] https://svn.apache.org/r1887108 Thanks, Evgeny Kotkov
Re: svn commit: r1886490 - in /subversion/trunk/subversion: include/private/ libsvn_subr/ libsvn_wc/ tests/cmdline/ tests/libsvn_subr/ tests/libsvn_wc/
Daniel Shahaf writes: > Thanks for checking. How about adding it somewhere, then? Yes, that will probably be useful. Would you have an opportunity to do so? Thanks, Evgeny Kotkov
Re: svn commit: r1886490 - in /subversion/trunk/subversion: include/private/ libsvn_subr/ libsvn_wc/ tests/cmdline/ tests/libsvn_subr/ tests/libsvn_wc/
Daniel Shahaf writes: > > Noting that this change only fixes "svn checkout", but not "svn export". > > Export uses a separate implementation of the delta editor, and it should > > be possible to update it in a similar way — but I'm leaving that for > > future work for now. > > Are this problem and its proposed solution documented in a jira ticket? No, I don't think so. (Apparently, SVN-3264 [1] describes the checkout case, but I haven't done a thorough search to see if export is mentioned somewhere.) [1] https://issues.apache.org/jira/browse/SVN-3264 Regards, Evgeny Kotkov
Re: svn commit: r1886490 - in /subversion/trunk/subversion: include/private/ libsvn_subr/ libsvn_wc/ tests/cmdline/ tests/libsvn_subr/ tests/libsvn_wc/
Evgeny Kotkov writes:

> URL: http://svn.apache.org/viewvc?rev=1886490&view=rev
> Log:
> In the update editor, stream data to both pristine and working files.

...

> Several quick benchmarks:
>
> - Checking out subversion/trunk over https://
>
>     Total time: 3.861 s → 2.812 s
>        Read IO: 57322 KB → 316 KB
>       Write IO: 455013 KB → 359977 KB
>
> - Checking out 4 large binary files (7.4 GB) over https://
>
>     Total time: 91.594 s → 70.214 s
>        Read IO: 7798883 KB → 19 KB
>       Write IO: 15598167 KB → 15598005 KB

Hey everyone,

Here's an improvement I have been working on recently.

Apparently, the client has an (implicit) limit on the size of directories that can be safely checked out over HTTP without hitting a timeout. The problem is that when the client installs the new working files, it does so in a separate step. This step happens per-directory and involves copying and possibly translating the pristine contents into new working files. While that happens, nothing is read from the connection. So the amount of work that can be done without hitting a timeout is limited. Assuming the default HTTP timeout of 60 seconds in httpd 2.4.x and a relatively fast disk, that puts the limit at around 6 GB for any directory. Not cool.

My attempt to fix this is by making checkout stream data to both the pristine and the (projected) working file, so that the actual install would then happen as just an atomic rename. Since we now never stop reading from the connection, the timeouts should no longer be an issue.

The new approach also has several nice properties, such as not having to re-read the pristine files and not interfering with network-level buffering, TCP slow start, etc. I see that it reduces the amount of both read and write I/O during all checkouts, which should give a mild overall increase in checkout speed.

Noting that this change only fixes "svn checkout", but not "svn export". Export uses a separate implementation of the delta editor, and it should be possible to update it in a similar way — but I'm leaving that for future work for now.

Thanks, Evgeny Kotkov
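P.S. A rough sketch of the "write once into two targets" idea. svn_stream_tee() is an existing utility; the open_double_target() helper and the install-stream details are simplified stand-ins for what libsvn_wc actually does:

[[[
#include "svn_io.h"

static svn_error_t *
open_double_target(svn_stream_t **result,
                   svn_stream_t *pristine_install_stream,
                   const char *working_tmp_abspath,
                   apr_pool_t *pool)
{
  svn_stream_t *working_stream;

  SVN_ERR(svn_stream_open_writable(&working_stream, working_tmp_abspath,
                                   pool, pool));

  /* Everything written to *RESULT lands in both targets, so installing
     the working file afterwards is just an atomic rename. */
  *result = svn_stream_tee(pristine_install_stream, working_stream, pool);

  return SVN_NO_ERROR;
}
]]]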
Re: Bug review of the week
Nathan Hartman writes:

> SVN-4588 "Item index too large in revision" is marked as Unresolved, but
> from my reading of all the comments, it seems the root cause of the problem
> was not restarting httpd after dumping and loading, leaving stale
> information in cache.
>
> The last comment mentions a similar report on Stack Overflow. That one
> magically fixed itself by rebooting. See:
> https://stackoverflow.com/questions/33904618/how-to-increase-l2p-page-size/33938501
>
> So... is there anything else to do for this one?

There is an option of making the filesystem instance IDs part of the cache keys. Doing so would fix the case described in SVN-4588 and allow other operations that happen through the command-line tools, such as "svnadmin hotcopy", to work properly without the server restart.

See https://lists.apache.org/thread.html/6db0186496a46c9823a752b377e369eb4361c8a42e62fc7e601453c6%401440460036%40%3Cdev.subversion.apache.org%3E

Thanks, Evgeny Kotkov
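P.S. The gist of that idea, as an illustrative sketch only (how the instance ID is obtained and the prefix format are assumptions, not existing code):

[[[
#include <apr_pools.h>
#include <apr_strings.h>

/* Build a cache key prefix that includes the filesystem instance ID,
 * so that replacing the on-disk filesystem (dump/load, hotcopy)
 * naturally misses the stale entries instead of returning them. */
static const char *
make_cache_prefix(const char *fs_uuid,
                  const char *instance_id,
                  apr_pool_t *pool)
{
  return apr_psprintf(pool, "%s/%s:", fs_uuid, instance_id);
}
]]]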
Re: svn st shows missing for root folder
Stefan Kueng writes: > Using svn 1.13.0 on Windows: > > SUBST G:\ D:\Development\TestWC > G: > svn st . > > ! . > ? Foo > ? src\nonversioned > > As you can see, the root folder doesn't show the correct status. > This showed the correct status with 1.12.x. On my side, this issue *does not* reproduce with Subversion 1.13.0 built against APR 1.6.x, but reproduces with the TortoiseSVN 1.13.1 command-line binaries. I see that TortoiseSVN 1.13 has recently switched to APR 1.7.0, so that may be causing the issue: https://osdn.net/projects/tortoisesvn/scm/svn/commits/28674 Among the changes in APR 1.7.0, this one could be responsible for the faulty behavior, since it changes how apr_stat() works for reparse points: https://svn.apache.org/r1855950 Regards, Evgeny Kotkov
Re: --normalize-probs doesn't do its thing
Evgeny Kotkov writes: > Apparently, there is a bug in the implementation of --normalize-props > that only makes it automatically fix line endings in the values of the > revision properties, but not node properties (as per the code around > load-fs-vtable.c: set_node_property()). I think that this issue should be fixed by https://svn.apache.org/r1868203 (Will try to nominate this fix to the stable branches, once I get some spare time for that.) Regards, Evgeny Kotkov
Re: svn-crash-log20190911192859
Johan Corveleyn writes:

> I'm not sure what the reason could be. Perhaps the dumpfile itself is
> huge, or has something special ... Or maybe the memory footprint is
> normal, but the machine you're running it on is low on available
> memory.

Apparently, our svn_stream_readline() implementation may cause excessive memory allocations if the input contains nul bytes. While I can't say for sure, this could be the likely cause of the reported problem. For example, the dump file could have been resaved in UTF-16, or something like that. I committed r1866950 that should fix this specific problem:

  https://svn.apache.org/r1866950

The other possible, although less likely reason for this problem is that the dump file itself is a large binary file without any \n characters. I committed r1866951 that should make the parser more resilient to such files:

  https://svn.apache.org/r1866951

A full solution for the latter part, I think, would require introducing and implementing something like svn_stream_readline2() with a `limit` parameter, but this change should at least ensure that the parser doesn't choke in the described case with a binary file and other similar situations.

I will try to nominate both of these fixes to our stable branches, once I have some time for that.

Thanks, Evgeny Kotkov
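P.S. To sketch what such an svn_stream_readline2() might look like — this API does not exist, and the byte-at-a-time loop is only for illustration (a real implementation would read in chunks):

[[[
#include "svn_error_codes.h"
#include "svn_io.h"
#include "svn_string.h"

static svn_error_t *
readline_limited(svn_stringbuf_t **line,
                 svn_stream_t *stream,
                 apr_size_t limit,
                 apr_pool_t *pool)
{
  *line = svn_stringbuf_create_empty(pool);

  while ((*line)->len < limit)
    {
      char c;
      apr_size_t len = 1;

      SVN_ERR(svn_stream_read_full(stream, &c, &len));
      if (len == 0 || c == '\n')
        return SVN_NO_ERROR;    /* EOF or end of line */

      svn_stringbuf_appendbyte(*line, c);
    }

  /* Bail out instead of growing the buffer without bounds. */
  return svn_error_create(SVN_ERR_MALFORMED_FILE, NULL,
                          "Line exceeds the allowed length");
}
]]]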
Re: svn commit: r1865522 - /subversion/trunk/subversion/libsvn_subr/sqlite3wrapper.c
Daniel Shahaf writes: > Would it break anything if some application that uses libsvn had > its own copy of SQLite that was compiled with another > SQLITE_DEFAULT_MEMSTATUS value? I don't see anything about that > in the linked page. I don't think so, because our amalgamated SQLite has all its symbols marked as static and is not exposed to anyone else. Also, this compile option only changes the default value of the corresponding runtime option (SQLITE_CONFIG_MEMSTATUS), meaning that it can still be changed later with sqlite3_config(). Thanks, Evgeny Kotkov
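P.S. A standalone example of flipping the runtime option that SQLITE_DEFAULT_MEMSTATUS merely sets the default for; the only constraint is that it has to happen before the library is initialized:

[[[
#include <sqlite3.h>

int main(void)
{
  /* Re-enable memory statistics even if the library was compiled
     with SQLITE_DEFAULT_MEMSTATUS=0.  Must precede sqlite3_initialize()
     and the first connection. */
  sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 1);
  sqlite3_initialize();

  /* ... use SQLite as usual ... */
  return 0;
}
]]]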
Re: svn commit: r1865523 - /subversion/trunk/subversion/libsvn_subr/sqlite3wrapper.c
Daniel Shahaf writes:

> Quoting https://sqlite.org/compile.html#_options_to_omit_features:
>
>     Important Note: The SQLITE_OMIT_* options may not work with
>     the amalgamation. SQLITE_OMIT_* compile-time options usually
>     work correctly only when SQLite is built from canonical
>     source files.
>
> and a bit later:
>
>     Also, not all SQLITE_OMIT_* options are tested. Some
>     SQLITE_OMIT_* options might cause SQLite to malfunction and/or
>     provide incorrect answers.
>
> I think we should:
>
> - Revert r1865523

I see that this option has been supported since its introduction in 2010, and that it appears to be covered in tests. Also, since it only affects the journaling mode, apparently it does not participate in the parser/keyword generation, which is listed as the reason for not using it with the amalgamation:

  https://github.com/sqlite/sqlite/blob/cc3f3d1f055/src/parse.y
  https://github.com/sqlite/sqlite/blob/cc3f3d1f055/tool/mkkeywordhash.c
  https://github.com/sqlite/sqlite/blob/cc3f3d1f055/test/attach4.test#L82
  https://github.com/sqlite/sqlite/commit/bc6b8d73592ad72e674b10152012170a01c31c62

Thanks, Evgeny Kotkov
Re: svn commit: r1863018 - /subversion/trunk/subversion/libsvn_subr/win32_crypto.c
Branko Čibej writes:

> Uh. I don't think so.
>
> First of all, the reference to Chromium source is a red herring ... that
> code disables CRL/OCSP checks *if some caller required it to*. You'll
> find that browsers do, indeed, check CRL or OCSP info, if it's available.

I went through an additional round of fact-checking to ensure that Chromium browsers have never been doing online revocation checks by default. Back in 2014, this could be controlled by a user preference (disabled by default), but since then the UI option was removed and the current state is that such checks only operate from cache:

  https://codereview.chromium.org/250583004/diff/20001/chrome/browser/net/ssl_config_service_manager_pref.cc
  https://chromium.googlesource.com/chromium/src/+/refs/tags/77.0.3852.2/chrome/browser/ssl/ssl_config_service_manager_pref.cc#243

On Windows, this can be additionally verified by examining the CryptoAPI log, where I can see the certificate chains being constructed with the CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY flag.

> Your change disables all online verification, regardless, and that seems
> quite wrong.

This change affects the Win32 callback used to *override specific* certificate validation failures, namely SVN_AUTH_SSL_UNKNOWNCA. So in the current design, the revocation checks are only supplemental, but not authoritative. As an example, if the Serf/OpenSSL layer thinks that the certificate is OK by itself (without performing a revocation check), this callback is not going to be called at all.

Infrastructure-related failures, stalls and timeouts during online revocation checks appear to be fairly common. Combined, this means that we are putting the users through all these potential problems while, in the end, the whole check is still non-exhaustive. Perhaps the actual result of that is just adding to the perception that Subversion is slow in general.

With that in mind, I tend to think that it would be much better to just stick to the locally available data for this particular callback, which is currently used to only resolve a subset of certificate validation failures.

> So ... -1, please revert

Will certainly do, unless this new data alters your opinion on the topic.

Thanks, Evgeny Kotkov
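P.S. For reference, a minimal sketch of a cache-only chain build on Win32, matching the flag mentioned above; error handling and the subsequent chain policy verification are omitted:

[[[
#include <windows.h>
#include <wincrypt.h>

static BOOL
build_chain_cache_only(PCCERT_CONTEXT cert, PCCERT_CHAIN_CONTEXT *chain)
{
  CERT_CHAIN_PARA para;

  ZeroMemory(&para, sizeof(para));
  para.cbSize = sizeof(para);

  /* Consult only locally cached CRL/OCSP data; no network fetches. */
  return CertGetCertificateChain(NULL, cert, NULL, NULL, &para,
                                 CERT_CHAIN_REVOCATION_CHECK_CHAIN
                                 | CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY,
                                 NULL, chain);
}
]]]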
Re: API to get revision size on disk
Julian Foad writes: > I have reviewed and tested it. Can you commit? Fixed the typo and committed in https://svn.apache.org/r1857435, thanks! > * We could keep the existing private API declarations in > include/private/svn_fs_fs_private.h as well as the new ioctl versions of > them, rather than moving the existing declarations to a local header. It > would reduce the patch size, and some people might like the ability to > link to libsvn_fs_fs directly and and call those functions directly. But > I don't feel strongly about this; either way is fine for me. For now I chose to move the declarations to libsvn_fs_fs/fs_fs.h, to avoid having two parallel interfaces and to only export the version that works through the FS loader. Thanks, Evgeny Kotkov
Re: API to get revision size on disk
Julian Foad writes:

> Similar consideration applies to the API. I think this could be a libsvn_fs
> API rather than only in libsvn_fs_fs. Agreed?

Maybe, as an alternative to abstracting and promoting this to a public FS API, we could add a way of performing backend-specific I/O operations through an API like svn_fs_ioctl()? See the attached patch that implements it for the existing fsfs stats, dump index and load index operations, and, as I suppose, may also be extended for the "get revision size/stats" operation.

This concept and the patch serve two purposes:

- They allow us to properly expose FS-specific details and invoke FS-specific operations everywhere without necessarily promoting them into a proper public API (the ioctl code itself may be made either public or private, depending on the requirements).

- They solve a potential dependency/linking problem where tools like `svnfsfs` work through the libsvn_fs loader, but also have to load and call private APIs from libsvn_fs_fs, thus ignoring the loader. The latter part may potentially cause issues with the global shared state, etc. With the patch, all such operations always go through the FS loader.

Regards, Evgeny Kotkov

Index: build.conf
===================================================================
--- build.conf	(revision 1857216)
+++ build.conf	(working copy)
@@ -442,7 +442,7 @@ description = Subversion FSFS Repository Manipulat
 type = exe
 path = subversion/svnfsfs
 install = bin
-libs = libsvn_repos libsvn_fs libsvn_fs_fs libsvn_delta libsvn_subr apriconv apr
+libs = libsvn_repos libsvn_fs libsvn_delta libsvn_subr apriconv apr

 #
 #
@@ -831,7 +831,7 @@ type = exe
 path = subversion/tests/libsvn_fs_fs
 sources = fs-fs-private-test.c
 install = test
-libs = libsvn_test libsvn_fs libsvn_fs_fs libsvn_delta
+libs = libsvn_test libsvn_fs libsvn_delta
        libsvn_repos libsvn_subr apriconv apr
 msvc-force-static = yes

Index: subversion/include/private/svn_fs_fs_private.h
===================================================================
--- subversion/include/private/svn_fs_fs_private.h	(revision 1857216)
+++ subversion/include/private/svn_fs_fs_private.h	(working copy)
@@ -255,22 +255,6 @@ typedef struct svn_fs_fs__stats_t
   apr_hash_t *by_extension;
 } svn_fs_fs__stats_t;

-
-/* Scan all contents of the repository FS and return statistics in *STATS,
- * allocated in RESULT_POOL.  Report progress through PROGRESS_FUNC with
- * PROGRESS_BATON, if PROGRESS_FUNC is not NULL.
- * Use SCRATCH_POOL for temporary allocations.
- */
-svn_error_t *
-svn_fs_fs__get_stats(svn_fs_fs__stats_t **stats,
-                     svn_fs_t *fs,
-                     svn_fs_progress_notify_func_t progress_func,
-                     void *progress_baton,
-                     svn_cancel_func_t cancel_func,
-                     void *cancel_baton,
-                     apr_pool_t *result_pool,
-                     apr_pool_t *scratch_pool);
-
 /* A node-revision ID in FSFS consists of 3 sub-IDs ("parts") that consist
  * of a creation REVISION number and some revision- / transaction-local
  * counter value (NUMBER).  Old-style ID parts use global counter values.
@@ -325,34 +309,37 @@ typedef svn_error_t *
   void *baton,
   apr_pool_t *scratch_pool);

-/* Read the P2L index for the rev / pack file containing REVISION in FS.
- * For each index entry, invoke CALLBACK_FUNC with CALLBACK_BATON.
- * If not NULL, call CANCEL_FUNC with CANCEL_BATON from time to time.
- * Use SCRATCH_POOL for temporary allocations.
- */
-svn_error_t *
-svn_fs_fs__dump_index(svn_fs_t *fs,
-                      svn_revnum_t revision,
-                      svn_fs_fs__dump_index_func_t callback_func,
-                      void *callback_baton,
-                      svn_cancel_func_t cancel_func,
-                      void *cancel_baton,
-                      apr_pool_t *scratch_pool);
+typedef struct svn_fs_fs__ioctl_get_stats_input_t
+{
+  svn_fs_progress_notify_func_t progress_func;
+  void *progress_baton;
+} svn_fs_fs__ioctl_get_stats_input_t;

+typedef struct svn_fs_fs__ioctl_get_stats_output_t
+{
+  svn_fs_fs__stats_t *stats;
+} svn_fs_fs__ioctl_get_stats_output_t;

-/* Rewrite the respective index information of the rev / pack file in FS
- * containing REVISION and use the svn_fs_fs__p2l_entry_t * array ENTRIES
- * as the new index contents.  Allocate temporaries from SCRATCH_POOL.
- *
- * Note that this becomes a no-op if ENTRIES is empty.  You may use a zero-
- * sized empty entry instead.
- */
-svn_error_t *
-svn_fs_fs__load_index(svn_fs_t *fs,
-                      svn_revnum_t revision,
-                      apr_array_header_t *entries,
-                      apr_pool_t *scratch_pool);
+SVN_FS_DECLARE_IOCTL_CODE(SVN_FS_FS__IOCTL_GET_STATS, SVN_FS_TYPE
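P.S. With the patch applied, a caller such as `svnfsfs stats` could look roughly like the sketch below; the svn_fs_ioctl() entry point and its exact signature are taken from the proposal and may well differ in whatever gets committed:

[[[
static svn_error_t *
get_stats_via_ioctl(svn_fs_fs__stats_t **stats,
                    svn_fs_t *fs,
                    apr_pool_t *result_pool,
                    apr_pool_t *scratch_pool)
{
  svn_fs_fs__ioctl_get_stats_input_t input = { 0 };
  svn_fs_fs__ioctl_get_stats_output_t *output;

  /* Dispatches through the FS loader to the backend that declared
     the ioctl code, instead of linking to libsvn_fs_fs directly. */
  SVN_ERR(svn_fs_ioctl(fs, SVN_FS_FS__IOCTL_GET_STATS,
                       &input, (void **)&output,
                       NULL, NULL,     /* no cancellation */
                       result_pool, scratch_pool));

  *stats = output->stats;
  return SVN_NO_ERROR;
}
]]]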
Re: --normalize-probs doesn't do its thing
Thorsten Goetzke writes: > uls/trunk/uls/TestAutomation/UMS_Testmanagement/02_Implementation/java/TestManagementUI > ...svnadmin: E125005: A property with invalid line ending found in > dumpstream; consider using --normalize-props while loading. > > Are there some catches on how to use --normalize-probs? Apparently, there is a bug in the implementation of --normalize-props that only makes it automatically fix line endings in the values of the revision properties, but not node properties (as per the code around load-fs-vtable.c: set_node_property()). I'll put fixing that on my TODO list. Regards, Evgeny Kotkov
Re: svn commit: r1854072 - in /subversion/trunk/subversion: libsvn_subr/io.c tests/libsvn_subr/io-test.c
Branko Čibej writes:

> @@ -4493,6 +4571,12 @@ svn_io_dir_remove_nonrecursive(const cha
>
>    SVN_ERR(cstring_from_utf8(&dirname_apr, dirname, pool));
>
> +  /* On Windows, a read-only directory cannot be removed. */
> +#if defined(WIN32) || defined(__OS2__)
> +  SVN_ERR(io_set_readonly_flag(dirname_apr, dirname,
> +                               FALSE, FALSE, FALSE, pool));
> +#endif
> +
>    status = apr_dir_remove(dirname_apr, pool);

Would it be feasible for us to only attempt to remove the read-only attribute after receiving an error? (along the lines of the attached patch)

The reason is that getting and setting file attributes usually results in CreateFile() syscalls, whereas opening files and the I/O-related syscalls are not cheap on Win32 in general. So changing the directory attributes per every remove could increase the amount of I/O operations and maybe slow down the cases where we have to remove multiple directories, with the post-commit transaction directory cleanup being an example of such a case.

Thanks, Evgeny Kotkov

Index: subversion/libsvn_subr/io.c
===================================================================
--- subversion/libsvn_subr/io.c	(revision 1854090)
+++ subversion/libsvn_subr/io.c	(working copy)
@@ -4571,12 +4571,6 @@ svn_io_dir_remove_nonrecursive(const char *dirname

   SVN_ERR(cstring_from_utf8(&dirname_apr, dirname, pool));

-  /* On Windows, a read-only directory cannot be removed. */
-#if defined(WIN32) || defined(__OS2__)
-  SVN_ERR(io_set_readonly_flag(dirname_apr, dirname,
-                               FALSE, FALSE, FALSE, pool));
-#endif
-
   status = apr_dir_remove(dirname_apr, pool);

 #ifdef WIN32
@@ -4583,6 +4577,16 @@ svn_io_dir_remove_nonrecursive(const char *dirname
     {
       svn_boolean_t retry = TRUE;

+      if (APR_STATUS_IS_EACCES(status) || APR_STATUS_IS_EEXIST(status))
+        {
+          /* Set the destination directory writable because Windows will not
+             allow us to delete when path is read-only. */
+          SVN_ERR(io_set_readonly_flag(dirname_apr, dirname,
+                                       FALSE, FALSE, TRUE, pool));
+
+          status = apr_dir_remove(dirname_apr, pool);
+        }
+
       if (status == APR_FROM_OS_ERROR(ERROR_DIR_NOT_EMPTY))
         {
           apr_status_t empty_status = dir_is_empty(dirname_apr, pool);
Re: [PATCH] A test for "Can't get entries" error
Daniel Shahaf writes: > > I'm glad you included a FS-level test for it. In case anyone is > > interested, I attach a patch that backports the initial RA-level test to > > 1.8. 1.9 and 1.10. > > Any reason not to commit that one, too? Personally, I think that having another test on the RA layer for this case is unnecessary and would just increase the maintenance costs — since this is an FS layer bug and given that we already have the (more or less) minimal regression test in terms of the amount of the FS API calls. Regards, Evgeny Kotkov
Re: svn commit: r1847572 - in /subversion/trunk/subversion: libsvn_fs_fs/tree.c tests/libsvn_fs/fs-test.c
Branko Čibej writes:

> > +      else
> > +        return err_not_directory(root, directory, pool);
> > +    }
> >     }
> > }
>
> Would be nice to wrap this return value in svn_error_trace(), as that
> makes it easier to find where the error came from.

Makes sense, committed in https://svn.apache.org/r1847596

Thanks, Evgeny Kotkov
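P.S. The pattern in question, as a small illustrative sketch (the surrounding check is hypothetical): svn_error_trace() records this frame in the error's trace in maintainer-mode builds and is a no-op otherwise.

[[[
static svn_error_t *
check_is_directory(svn_fs_root_t *root,
                   const char *directory,
                   apr_pool_t *pool)
{
  if (!is_directory(root, directory))   /* hypothetical helper */
    return svn_error_trace(err_not_directory(root, directory, pool));

  return SVN_NO_ERROR;
}
]]]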
Re: [PATCH] A test for "Can't get entries" error
Dmitry Pavlenko writes: > I've run into an error: when performing 2 specially constructed updates one > after another within the same session, SVN fails with > > $ ./ra-test 15 > svn_tests: E160016: Can't get entries of non-directory > XFAIL: ra-test 15: check that there's no "Can't get entries" error > > error. I believe these updates constructed that way are valid, so the problem > is somewhere in FSFS code. It's also interesting that if these updates are > run separately (e.g. by adding "if (FALSE)" to one or another), they succeed. Apparently, this behavior is caused by a problem in the specific optimization within the FSFS open_path() routine. I constructed an FS regression test and committed it and the fix in: https://svn.apache.org/r1847572 (I'll try to nominate it for backport once I get some time.) Thanks, Evgeny Kotkov
[PATCH] Proof of concept of the better-pristines (LZ4 + storing small pristines as BLOBs) (Was: Re: svn commit: r1843076)
Branko Čibej writes:

> Still missing is a mechanism for the libsvn_wc (and possibly
> libsvn_client) to determine the capabilities of the working copy at
> runtime (this will be needed for deciding whether to use compressed
> pristines).

FWIW, I tried the idea of using LZ4 to compress the pristines and storing small pristines as blobs in the `PRISTINE` table. I was particularly interested in how such a change would affect the performance and what kind of obstacles would have to be dealt with.

In the attachment you will find a more or less functional implementation of this idea that might be useful to some extent. The patch is a proof of concept: it doesn't include the WC compatibility bits and most certainly doesn't have everything necessary in place. But in the meanwhile, I think that it might give a good approximation of what can be expected from the approach. The patch applies to the `better-pristines` branch.

A couple of observations:

- As expected, the combined size of the pristines is halved when the data itself is compressible, thus making the working copy 25% smaller.

- A variety of the callers currently access the pristine contents by reading the corresponding files. That doesn't work in case of compressed pristines or pristines stored as BLOBs. I think that ideally we would want to use streams as much as possible, and only spill the uncompressed pristine contents to temporary files when we need to pass them to external tools, etc.; and that temporary files need to be backed by a work queue to avoid leaving them in place in case of an application crash. The patch does that kind of plumbing to some extent, but that part of the work is not complete. The starting point is around wc_db_pristine.c: svn_wc__db_pristine_get_path().

- Using BLOBs to store the pristine contents didn't have a measurable impact on the speed of the WC operations such as checkout in my experiments on Windows. These experiments were not comprehensive, and also I didn't run the tests on *nix.

- There's also the deprecated svn_wc_get_pristine_copy_path() public API that would require plumbing to maintain compatibility; the patch performs it by spilling the pristine contents result into a temporary file whose lifetime is attached to the `result_pool`.

(I probably won't be able to continue the work on this patch in the near future; posting this in case it might be useful.)

Thanks, Evgeny Kotkov

Index: notes/wc-ng/pristine-store
===================================================================
--- notes/wc-ng/pristine-store	(revision 1844566)
+++ notes/wc-ng/pristine-store	(working copy)
@@ -1,6 +1,7 @@
 This spec defines how the Pristine Store works (part A) and how the WC
 uses it (part B).

+TODO: Update

 A. THE PRISTINE STORE
 =====================

Index: subversion/include/private/svn_io_private.h
===================================================================
--- subversion/include/private/svn_io_private.h	(revision 1844566)
+++ subversion/include/private/svn_io_private.h	(working copy)
@@ -119,14 +119,6 @@ svn_error_t *
 svn_stream__install_delete(svn_stream_t *install_stream,
                            apr_pool_t *scratch_pool);

-/* Optimized apr_file_stat / apr_file_info_get operating on a closed
-   install stream */
-svn_error_t *
-svn_stream__install_get_info(apr_finfo_t *finfo,
-                             svn_stream_t *install_stream,
-                             apr_int32_t wanted,
-                             apr_pool_t *scratch_pool);
-
 /* Internal version of svn_stream_from_aprfile2() supporting the additional
    TRUNCATE_ON_SEEK argument. */
 svn_stream_t *

Index: subversion/include/private/svn_subr_private.h
===================================================================
--- subversion/include/private/svn_subr_private.h	(revision 1844566)
+++ subversion/include/private/svn_subr_private.h	(working copy)
@@ -591,6 +591,12 @@ svn__decompress_lz4(const void *data, apr_size_t l
                     svn_stringbuf_t *out,
                     apr_size_t limit);

+svn_stream_t *
+svn_stream__lz4_compress(svn_stream_t *stream, apr_pool_t *pool);
+
+svn_stream_t *
+svn_stream__lz4_decompress(svn_stream_t *stream, apr_pool_t *pool);
+
 /** @} */

 /**

Index: subversion/libsvn_subr/compress_lz4.c
===================================================================
--- subversion/libsvn_subr/compress_lz4.c	(revision 1844566)
+++ subversion/libsvn_subr/compress_lz4.c	(working copy)
@@ -23,9 +23,14 @@

 #include

+#define APR_WANT_BYTEFUNC
+#include <apr_want.h>
+
+#include "private/svn_string_private.h"
 #include "private/svn_subr_private.h"
 #include "svn_private_config.h"

+#include "svn_pools.h"
 #ifdef SVN_INTERNAL_LZ4
 #include "lz4/lz4internal.h"
@@ -142,3 +147,322 @@ svn_lz4__runtime_version(void)
 {
Re: Possible bug in the tree conflict resolver: "Accept incoming deletion" option doing nothing for a locally deleted file
Stefan Sperling writes: >> Should I file an issue or add a regression test for this problem? > > Yes please. > > Thanks! Issue: https://issues.apache.org/jira/browse/SVN-4739 Regression test committed in https://svn.apache.org/r1830083 Thanks, Evgeny Kotkov
Possible bug in the tree conflict resolver: "Accept incoming deletion" option doing nothing for a locally deleted file
Hi everyone,

I think that I might have stumbled across an issue with the new tree conflict resolver in case of a delete-vs-delete conflict. Below is a simple PowerShell-based reproduction script that adds a file (r1), removes it (r2), updates to the previous revision, removes the file in the working copy (svn rm), and attempts an update to r2:

  rm test -Recurse -Force -ErrorAction Ignore
  mkdir test
  svnadmin create test/repo
  $url = 'file:///{0}/test/repo' -f (pwd) -replace '\\', '/'
  svn co $url test/wc
  echo content > test/wc/file.txt
  svn add test/wc/file.txt
  svn ci test/wc -m"r1"
  svn rm test/wc/file.txt
  svn ci test/wc -m"r2"
  svn up test/wc -r1
  svn rm test/wc/file.txt
  svn up test/wc
  svn st test/wc

In this case, the tree conflict resolver offers the "Accept incoming deletion" option for file.txt, but choosing it does nothing, and leaves the tree conflict unresolved, which I think is unexpected:

  > svn up
  Updating 'test\wc':
  C    test\wc\file.txt
  At revision 2.
  Summary of conflicts:
    Tree conflicts: 1
  Searching tree conflict details for 'test\wc\file.txt' in repository:
  Checking r2... done
  Tree conflict on 'test\wc\file.txt':
  File updated from r1 to r2 was deleted by Evgeny.Kotkov in r2.
  A deleted file was found in the working copy.
  Select: (p) Postpone, (r) Mark as resolved, (a) Accept incoming deletion,
          (h) Help, (q) Quit resolution: a
  Summary of conflicts:
    Tree conflicts: 1

  > svn st
  !     C test\wc\file.txt
        >   local file delete, incoming file delete or move upon update
  Summary of conflicts:
    Tree conflicts: 1

Should I file an issue or add a regression test for this problem?

Thanks, Evgeny Kotkov
Re: svnadmin load error with 1.10 that seems to be fsfs cache related
Philip Martin writes: > That works as expected, but vary the cache size of the load process and > it fails. The load succeeds with -M64 and smaller but fails with -M65 > and larger: [...] Maybe this behavior could be related to the cache size threshold in svnadmin that enables the block read feature in fsfs (currently set to 64 MB, as per svnadmin.c:BLOCK_READ_CACHE_THRESHOLD). Regards, Evgeny Kotkov
Re: Rolling 1.10.0-rc2
Julian Foad writes: > Please review the 'STATUS' file and report here if there's anything else I > should take into account. I have found a bug (my fault) in the new svn_txdelta_to_svndiff_stream() API that can result in failing commits over http://. Should be fixed by https://svn.apache.org/r1826747 and proposed for backport to 1.10.x. Thanks, Evgeny Kotkov
Re: 1.10 release notes: protocol changes, LZ4, etc.
Julian Foad writes: >> I incorporated this tweak and staged the changes in r1825865. > > Lovely. I merged it all (includes some other people's changes) to 'publish' > in r1825869. Thanks! > To merge from staging to publish I used "svn merge > ^/subversion/site/staging/docs publish/docs" to keep the "root" of the merge > as high as possible while omitting the longer-term staged changes in > 'staging/quick-start.html. > > I encourage you to commit follow-ups straight to 'publish' rather than > 'staging' unless you want (yourself or others) to review it. Up to you, just > trying to remove friction. Personally, I prefer always committing to staging first, as I think that it helps to avoid unwanted mistakes which can otherwise slip through to the live site. Assuming no objections or tweak suggestions, I'll merge the most recent relnotes follow-ups to 'publish' somewhere later today or tomorrow. Thanks, Evgeny Kotkov
Re: 1.10 release notes: protocol changes, LZ4, etc.
Julian Foad writes: > +1. > > (Nit: English idiom is "[is still] subject to".) I incorporated this tweak and staged the changes in r1825865. Thanks, Evgeny Kotkov
Re: Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)
Stefan Sperling writes:

> I'd rather ship 1.10.0 at the prospected release date followed closely
> by 1.10.1 to fix bugs such as these, than delay general access to 1.10.0
> even further.

While I do not have significant objections against such a plan, I find the idea of shipping a performance feature that causes a massive slowdown instead of an improvement somewhat controversial. (In other words, I am -0 for that.)

> You may not have realized this, but I have been waiting for 1.10.0 to
> happen for *over a year*
> https://svn.haxx.se/dev/archive-2017-01/0043.shtml
> For all this time, I have wanted the conflict resolver to get real world
> exposure because I need feedback from users out there to improve it.
> I kept mostly quiet because I didn't want to push too hard for this
> release all by myself because of the relatively high share of burden
> this would imply. So I waited for activity from the community to make
> it happen as a true collective effort.

Not too sure about how this is connected to the soak period and to the release process — speaking of which, I would say that your e-mail may discourage people from reporting issues during the soak period.

> If this one bug really bothers you enough to hold the planned release back
> it makes me wonder why you didn't push for a fix much earlier. We have had
> plenty of time.

I haven't been and am not pushing for a fix. Rather than that, I have just included the additional information about the problem with a comment that it might be viable to look into before the GA release. Moreover, I reported the issue at the very moment I found it with an edge-case reproduction. Once I was asked to bisect for a specific revision, I should have probably stated that I won't have time to do that. But I have been thinking that I would be able to find some. When I stumbled across it again, I found the revision and the simple reproduction — but as it seems, this hasn't been the most appropriate time for including these new details.

Putting all that aside, I wouldn't say that it is productive to discuss issues in such a way. In my opinion, we should be doing it the other way around by actually encouraging reports of various problems during the soak period.

Regards, Evgeny Kotkov
Re: Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)
Stefan Fuhrmann writes:

> Would it be possible for you to bisect this to find the offending revision?
> My random guess would be that in the context of mod_dav_svn, we might use
> an unsuitable pool for authz caching.

While looking through the various 1.10-related topics, I remembered about this issue. I have been able to narrow it down in my environment to https://svn.apache.org/r1778923 (Ensure that even long-lived DAV connections use up-to-date authz.)

Perhaps, a simpler reproduction script would be to issue an 'svn log' for a medium-sized repository. In my environment, doing so for the trunk of TortoiseSVN's repository with 25,000 revisions causes the httpd process to consume up to 1 GB of RAM while processing the request. Overall, the log takes around 11 seconds instead of 2, compared to 1.9.7.

Unless I am missing something, this might be worth considering before the 1.10 GA release.

Thanks, Evgeny Kotkov
Re: 1.10 release notes: protocol changes, LZ4, etc.
Julian Foad writes:

>> - On the server-side, SVNCompressionLevel 0 can be used to disable
>>   compression altogether. The special value of SVNCompressionLevel 1 forces
>>   the use of LZ4 compression for clients that support it. All other values
>>   result in negotiating the use of zlib compression with the respective
>>   compression level, unless the compression is disabled on the client.
>
> There's a small disagreement with the table there: all other values result
> in negotiating either zlib or LZ4.

Indeed. Perhaps, we could rephrase it like this?

  [...] All other values result in preferring zlib compression with the
  respective compression level. Note that the negotiated algorithm is
  still a subject to the client's configuration. For example, even if
  the server is configured to prefer zlib compression over LZ4, a client
  may still negotiate the use of LZ4 compression when its
  http-compression option is set to auto.

>> - On the client-side, setting http-compression to either yes or no will
>>   disable or enable compression that is then negotiated based on the
>>   server's configuration. The default value of auto will result in
>>   preferring LZ4 compression over low latency networks and zlib
>>   compression otherwise.
>
> Can we link to the reference documentation for the client and server
> options, so the reader (me) can cross-check the definition and look for
> related settings?

For the time being, the documentation for these options is quite scarce and doesn't dive into the details of which compression algorithm is negotiated behind the scenes. The reasoning behind this was to avoid overwhelming the users with technical and implementation details and to leave some ground for future extension by not promising too much. This makes me think that linking to these docs from the release notes probably wouldn't be too useful:

  https://svn.apache.org/viewvc/subversion/trunk/subversion/mod_dav_svn/mod_dav_svn.c?view=markup&pathrev=1820778#l1392
  https://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/config_file.c?view=markup&pathrev=1820778#l1151

For the purpose of cross-checking the description, the following locations in mod_dav_svn and ra_serf may prove useful:

  mod_dav_svn/repos.c:negotiate_encoding_prefs()
  libsvn_ra_serf/util.c:svn_ra_serf__setup_svndiff_accept_encoding()
  libsvn_ra_serf/commit.c:negotiate_put_encoding()

Thanks, Evgeny Kotkov
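P.S. For anyone wanting to try the combinations above, the two knobs look like this in practice; the repository path and the location name are just examples:

[[[
# httpd.conf (server side): force LZ4 for clients that support it
<Location /svn>
  DAV svn
  SVNParentPath /srv/svn
  SVNCompressionLevel 1
</Location>
]]]

and in the client-side `servers` configuration file:

[[[
[global]
# 'auto' (the default) prefers LZ4 on low-latency networks, zlib otherwise;
# 'yes' and 'no' enable or disable compression during the negotiation.
http-compression = auto
]]]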
Re: 1.10 release notes: protocol changes, LZ4, etc.
Julian Foad writes:

> I added a skeleton "protocol changes" section to the release notes:
>
> http://subversion.a.o.local/docs/release-notes/1.10.html#protocol-changes
>
> Here is a starting point. Can anyone help fill in the details? [...]
> Thoughts? Help?

Maybe we could start with adding a subsection that describes using LZ4 for http:// and svn:// to the users. If needed, we could then include the technical details about the negotiation into the "Client-Server Protocol Changes" and cross-link these two sections.

For the http:// part, we could probably do something along the lines of the attached patch. Here is also the same content as plain text for convenience:

[[[
LZ4 compression over the wire in http:// and svn:// connections

Deltas transferred between Subversion 1.10 clients and servers may be
compressed with LZ4. The actual choice of the compression algorithm
depends on the used protocol, environment and its configuration — see
below.

For http:// protocol, use of LZ4 compression depends on the values of
the server-side SVNCompressionLevel directive, client-side
http-compression configuration option and on the network capabilities.
LZ4 compression generally offers much faster compression and
decompression speeds, but slightly worse compression ratio than zlib.
By default, it is only preferred over low latency networks where the
overhead associated with transferring the additional amount of data is
assumed to be negligible.

- On the server-side, SVNCompressionLevel 0 can be used to disable
  compression altogether. The special value of SVNCompressionLevel 1
  forces the use of LZ4 compression for clients that support it. All
  other values result in negotiating the use of zlib compression with
  the respective compression level, unless the compression is disabled
  on the client.

- On the client-side, setting http-compression to either yes or no will
  disable or enable compression that is then negotiated based on the
  server's configuration. The default value of auto will result in
  preferring LZ4 compression over low latency networks and zlib
  compression otherwise.

Below is the table explaining the used compression algorithm in each
combination of the client- and server-side configuration options:

[table]
]]]

Thanks, Evgeny Kotkov

Index: docs/release-notes/1.10.html
===================================================================
--- docs/release-notes/1.10.html	(revision 1825639)
+++ docs/release-notes/1.10.html	(working copy)
@@ -136,7 +136,7 @@ and what impact these changes may have.
 Use SVN 1.8 and above clients only for best results.

- LZ4 compression over the wire in http:// connections
+ LZ4 compression over the wire in http:// and svn:// connections
 1.10
 1.10
@@ -526,7 +526,9 @@ containing large files.

 LZ4 compression is now used by default for the on-disk data in repositories
-with filesystem format 8 (see below).
+with filesystem format 8. Also, Subversion 1.10 adds support for automatic
+negotiation and use of LZ4 compression over the wire for http:// and svn://
+connections when it is supported by both endpoints.

 Note: this does not apply to the pre-release
 https://lists.apache.org/thread.html/dd78432b301f3f95567930c242c46b308c8a017fd754c0f388245915@%3Cannounce.subversion.apache.org%3E";>Subversion 1.10.0-alpha3
@@ -559,7 +561,7 @@ format number of a repository.)

 Configuring the repository to use LZ4 compression
 - ¶
@@ -583,6 +585,116 @@ cycle into a new format 8 repository created with

+
+LZ4 compression over the wire in http:// and svn:// connections
+ ¶
+
+
+Deltas transferred between Subversion 1.10 clients and servers may be
+compressed with LZ4. The actual choice of the compression algorithm depends
+on the used protocol, environment and its configuration — see below.
+
+For http:// protocol, use of LZ4 compression depends on the values
+of the server-side SVNCompressionLevel directive, client-side
+http-compression configuration option and on the network
+capabilities. LZ4 compression generally offers much faster compression
+and decompression speeds, but slightly worse compression ratio than zlib.
+By default, it is only preferred for low latency networks where the
+overhead associated with transferring the additional amount of data is
+assumed to be negligible.
+
+
+
+On the server-side, SVNCompressionLevel 0
+can be used to disable compression altogether. The special value of
+SVNCompressionLevel 1 forces the use of LZ4 compression for
+clients that support it. All other values result in negotiating the
+use of zlib compression with the respective compression level, unless
+the compression is disabled on the client.
+
+
+On the client-side, setting http-compression to
+either yes or no will disable or enable compression
+that is then negotiated based on the server's conf
Re: Problems with commit feature negotiation and write-through proxies
Evgeny Kotkov writes: >> Index: subversion/mod_dav_svn/version.c >> === >> --- subversion/mod_dav_svn/version.c(revision 1820704) >> +++ subversion/mod_dav_svn/version.c(working copy) >> @@ -152,9 +152,6 @@ get_vsn_options(apr_pool_t *p, apr_text_header *ph >>apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_INHERITED_PROPS); >>apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_INLINE_PROPS); >>apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_REVERSE_FILE_REVS); >> - apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_SVNDIFF1); >> - apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_SVNDIFF2); >> - apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_PUT_RESULT_CHECKSUM); >>apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_LIST); >>/* Mergeinfo is a special case: here we merely say that the server >> * knows how to handle mergeinfo -- whether the repository does too >> @@ -297,6 +294,19 @@ get_option(const dav_resource *resource, >> { "create-txn-with-props", { 1, 8, 0, "" } }, >>}; >> >> + /* These capabilities are used during commit and when acting as >> + a WebDAV slave (SVNMasterURI) their availablity depends on >> + the master version (SVNMasterVersion) rather than our own >> + (slave) version. */ >> + struct capability_versions_t { >> +const char *capability_name; >> +svn_version_t min_version; >> + } capabilities[] = { >> +{ SVN_DAV_NS_DAV_SVN_SVNDIFF1, { 1, 7, 0, ""} }, >> +{ SVN_DAV_NS_DAV_SVN_SVNDIFF2, { 1, 10, 0, ""} }, >> +{ SVN_DAV_NS_DAV_SVN_PUT_RESULT_CHECKSUM, { 1, 10, 0, ""} }, >> + }; > > I would be fine with this approach as well, but perhaps with a few tweaks: I also notice that this patch only advertises the new capabilities if the server is configured to enable HTTPv2. This is probably unintended, as I think that there is no reason to disable these PUT-related capabilities (such as being able to parse svndiff1/svndiff2) if the server is configured to prefer HTTPv1 protocol with SVNAdvertiseV2Protocol off. Thanks, Evgeny Kotkov
Re: Problems with commit feature negotiation and write-through proxies
Philip Martin writes: > We already have system for handling create-txn and create-txn-with-props > so I was thinking of extending that code: > > Index: subversion/mod_dav_svn/version.c > === > --- subversion/mod_dav_svn/version.c(revision 1820704) > +++ subversion/mod_dav_svn/version.c(working copy) > @@ -152,9 +152,6 @@ get_vsn_options(apr_pool_t *p, apr_text_header *ph >apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_INHERITED_PROPS); >apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_INLINE_PROPS); >apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_REVERSE_FILE_REVS); > - apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_SVNDIFF1); > - apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_SVNDIFF2); > - apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_PUT_RESULT_CHECKSUM); >apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_LIST); >/* Mergeinfo is a special case: here we merely say that the server > * knows how to handle mergeinfo -- whether the repository does too > @@ -297,6 +294,19 @@ get_option(const dav_resource *resource, > { "create-txn-with-props", { 1, 8, 0, "" } }, >}; > > + /* These capabilities are used during commit and when acting as > + a WebDAV slave (SVNMasterURI) their availablity depends on > + the master version (SVNMasterVersion) rather than our own > + (slave) version. */ > + struct capability_versions_t { > +const char *capability_name; > +svn_version_t min_version; > + } capabilities[] = { > +{ SVN_DAV_NS_DAV_SVN_SVNDIFF1, { 1, 7, 0, ""} }, > +{ SVN_DAV_NS_DAV_SVN_SVNDIFF2, { 1, 10, 0, ""} }, > +{ SVN_DAV_NS_DAV_SVN_PUT_RESULT_CHECKSUM, { 1, 10, 0, ""} }, > + }; I would be fine with this approach as well, but perhaps with a few tweaks: - I think that it would be better to handle all four capabilities that relate to proxied writes in the same place: SVN_DAV_NS_DAV_SVN_EPHEMERAL_TXNPROPS SVN_DAV_NS_DAV_SVN_SVNDIFF1 SVN_DAV_NS_DAV_SVN_SVNDIFF2 SVN_DAV_NS_DAV_SVN_PUT_RESULT_CHECKSUM Otherwise, the handling becomes asymmetric, as the first of them is still checked using the separate dav_svn__check_ephemeral_txnprops_support() function, but the three new capabilities are checked using the new array- based approach. - The SVN_DAV_NS_DAV_SVN_SVNDIFF1 capability should probably be bound to version 1.10 instead of 1.7. The reason for that is that mod_dav_svn didn't properly advertise svndiff1 support with this specific header until 1.10, and I think that we should have the same behavior for the setups with a write- through proxy. (Even though mod_dav_svn has been able to parse it for a long time, currently ra_serf avoids sending svndiff1 to servers that don't advertise it with this specific header, as there might be third-party implementations of the Subversion's HTTP protocol that can only parse svndiff0). Thanks, Evgeny Kotkov
Re: Problems with commit feature negotiation and write-through proxies
Philip Martin writes: > In the past we supported different versions on the master and slave and > we introduced the SVNMasterVersion directive to configure it. > Unfortunately that information is not immediately available in > get_vsn_options() where the server advertises some commit features. If > I simply remove the SVN_DAV_NS_DAV_SVN_SVNDIFF2 header from > get_vsn_options() then I can commit. > > I'm not sure how we fix this. Do we delay the svndiff2 negotiation > until later in the commit? Do we abandon mixed master/slave versions? I think that we might be able to selectively stop advertising the new- in-1.10 capabilities based on the SVNMasterVersion value, similarly to how we handle SVN_DAV_NS_DAV_SVN_EPHEMERAL_TXNPROPS. Perhaps, we could do something along the lines of the attached patch? Thanks, Evgeny Kotkov Index: subversion/mod_dav_svn/dav_svn.h === --- subversion/mod_dav_svn/dav_svn.h(revision 1820731) +++ subversion/mod_dav_svn/dav_svn.h(working copy) @@ -359,12 +359,24 @@ svn_boolean_t dav_svn__get_list_parentpath_flag(re master server version (if provided via SVNMasterVersion). */ svn_boolean_t dav_svn__check_httpv2_support(request_rec *r); -/* For the repository referred to by this request, should ephemeral - txnprop support be advertised? */ -svn_boolean_t dav_svn__check_ephemeral_txnprops_support(request_rec *r); +typedef enum dav_svn__commit_capability_t +{ + dav_svn__capability_ephemeral_txnprops, + dav_svn__capability_svndiff1, + dav_svn__capability_svndiff2, + dav_svn__capability_put_result_checksum +} dav_svn__commit_capability_t; +/* For the repository referred to by this request, is the specified + commit-related capability supported? Note that this also takes into + account the support level expected of based on the specified + master server version (if provided via SVNMasterVersion). */ +svn_boolean_t +dav_svn__check_commit_capability(request_rec *r, + dav_svn__commit_capability_t capability); + /* SPECIAL URI SVN needs to create many types of "pseudo resources" -- resources Index: subversion/mod_dav_svn/mod_dav_svn.c === --- subversion/mod_dav_svn/mod_dav_svn.c(revision 1820731) +++ subversion/mod_dav_svn/mod_dav_svn.c(working copy) @@ -924,17 +924,30 @@ dav_svn__check_httpv2_support(request_rec *r) svn_boolean_t -dav_svn__check_ephemeral_txnprops_support(request_rec *r) +dav_svn__check_commit_capability(request_rec *r, + dav_svn__commit_capability_t capability) { svn_version_t *version = dav_svn__get_master_version(r); - /* We know this server supports ephemeral txnprops. But if we're - proxying requests to a master server, we need to see if it - supports them, too. */ - if (version && (! svn_version__at_least(version, 1, 8, 0))) -return FALSE; + /* Unless we are proxying requests to a master server with a declared + version number, there is no need to dumb ourselves down. */ + if (!version) +return TRUE; - return TRUE; + /* We _are_ proxying requests to another master server with a declared + version number. See if we need to selectively disable some of the + capabilities that it doesn't support. 
*/ + switch (capability) +{ + case dav_svn__capability_ephemeral_txnprops: +return svn_version__at_least(version, 1, 8, 0); + case dav_svn__capability_svndiff1: + case dav_svn__capability_svndiff2: + case dav_svn__capability_put_result_checksum: +return svn_version__at_least(version, 1, 10, 0); + default: +SVN_ERR_MALFUNCTION_NO_RETURN(); +} } Index: subversion/mod_dav_svn/version.c === --- subversion/mod_dav_svn/version.c(revision 1820731) +++ subversion/mod_dav_svn/version.c(working copy) @@ -152,9 +152,6 @@ get_vsn_options(apr_pool_t *p, apr_text_header *ph apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_INHERITED_PROPS); apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_INLINE_PROPS); apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_REVERSE_FILE_REVS); - apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_SVNDIFF1); - apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_SVNDIFF2); - apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_PUT_RESULT_CHECKSUM); apr_text_append(p, phdr, SVN_DAV_NS_DAV_SVN_LIST); /* Mergeinfo is a special case: here we merely say that the server * knows how to handle mergeinfo -- whether the repository does too @@ -210,12 +207,29 @@ get_option(const dav_resource *resource, ""); /* If we're allowed (by configuration) to do so, advertise support - for ephemeral transaction properties. */ - if (dav_svn__check_ephemeral_t
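(The get_option() part of the patch is cut off above.  Roughly, the intent
is that each commit-related capability is advertised only after consulting
the new function; a sketch of the shape, with the exact header-appending
call assumed rather than copied from the patch:)

[[[
  /* Advertise a capability only when the (possibly proxied) master
     server is known to support it. */
  if (dav_svn__check_commit_capability(r, dav_svn__capability_svndiff2))
    apr_table_addn(r->headers_out, "DAV", SVN_DAV_NS_DAV_SVN_SVNDIFF2);
]]]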
Re: svn commit: r1818496 - /subversion/site/staging/docs/release-notes/1.10.html
Daniel Shahaf writes: > This language implies that we made a backwards incompatible change in 1.6, > which is not precisely the case; 1.6 simply started enforcing an API > requirement that 1.5 did not. > > Could we perhaps rephrase this accordingly? I.e., point out that we weren't > inventing a backwards-incompatible requirement, but starting to enforce one > that had always been there? I would think that from the standpoint of a user reading the release notes, both versions look more or less the same (maybe, with the second one being slightly harder to parse due to its verbosity): Such property line endings were accepted by older servers and can be found in repository dumps, but are considered invalid starting with Subversion 1.6. Such invalid line endings were accepted by older servers and can be found in repository dumps of older repositories, but are rejected by Subversion 1.6 and later. But, presumably, both variants are fine, as both of them explain the reason behind the "Cannot accept non-LF line endings" error — so please feel free to commit this tweak if you think it would be more appropriate. Thanks, Evgeny Kotkov
Re: svn commit: r1807836 - in /subversion/trunk: subversion/include/ subversion/include/private/ subversion/libsvn_repos/ subversion/svnadmin/ subversion/svnrdump/ subversion/tests/cmdline/ subversion
Johan Corveleyn writes: > Hi Evgeny, > > I just mentioned this upcoming --normalize-props feature in a users > thread (and included a link to your commit). I noticed that it's not > mentioned in the release notes of 1.10 yet. Do you have time to write > a small release notes entry for it? Thanks for the reminder, committed in r1818496: https://svn.apache.org/r1818496 https://subversion-staging.apache.org/docs/release-notes/1.10.html#normalize-props Regards, Evgeny Kotkov
Re: [PATCH] Use the `WITHOUT ROWID` SQLite optimization for rep-cache.db
Bert Huijben writes: >> The recent SQLite versions (starting from 3.8.2, released in December 2013) >> feature a `WITHOUT ROWID` optimization [1] that can be enabled when >> creating a table. In short, it works well for tables that have non-integer >> primary keys, such as >> >> name TEXT PRIMARY KEY >> >> by not maintaining the hidden rowid values and an another B-Tree to match >> between a primary key value and its rowid. This reduces the on-disk size >> and makes the lookups faster (a key → rowid → data lookup is replaced with >> a key → data lookup). > > It doesn't add another B-tree for the primary key and its rowids. For the > primary key the main table is used as the index. > > The case where things differ is when there are multiple indexes. In this > case normal table will always refer to the primary key using the rowed, > while for 'WITHOUT ROWID' there will be referred to the primary key, > which in general is larger. For the sake of finding out the truth :), I think that this contradicts the explanation in https://sqlite.org/withoutrowid.html : CREATE TABLE IF NOT EXISTS wordcount( word TEXT PRIMARY KEY, cnt INTEGER ); As an ordinary SQLite table, "wordcount" is implemented as two separate B-Trees. The main table uses the hidden rowid value as the key and stores the "word" and "cnt" columns as data. The "TEXT PRIMARY KEY" phrase of the CREATE TABLE statement causes the creation of an unique index on the "word" column. This index is a separate B-Tree that uses "word" and the "rowid" as the key and stores no data at all. Note that the complete text of every "word" is stored twice: once in the main table and again in the index. Although I didn't check if the most recent SQLite version still behaves in the described way internally, I have witnessed the described close-to-2x reduction in the size of rep-cache.db — which, unless I am missing something, follows the described idea of this optimization. Thanks, Evgeny Kotkov
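To make the difference concrete, here is a minimal standalone example using
the plain SQLite C API (an illustration only, not Subversion code, with the
columns trimmed for brevity):

[[[
#include <sqlite3.h>

int main(void)
{
  sqlite3 *db;

  if (sqlite3_open(":memory:", &db) != SQLITE_OK)
    return 1;

  /* Ordinary table: a rowid-keyed B-Tree for the data, plus a separate
     index B-Tree for the TEXT primary key (the hash text is stored in
     both). */
  sqlite3_exec(db,
               "CREATE TABLE rep_cache_v1 ("
               "  hash TEXT NOT NULL PRIMARY KEY,"
               "  revision INTEGER NOT NULL);",
               NULL, NULL, NULL);

  /* WITHOUT ROWID (requires SQLite 3.8.2+): a single B-Tree keyed
     directly on "hash", so the key text is stored only once. */
  sqlite3_exec(db,
               "CREATE TABLE rep_cache_v2 ("
               "  hash TEXT NOT NULL PRIMARY KEY,"
               "  revision INTEGER NOT NULL) WITHOUT ROWID;",
               NULL, NULL, NULL);

  sqlite3_close(db);
  return 0;
}
]]]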
Re: Subversion 1.10 RC1?
Stefan Fuhrmann writes: > There seems to be little that could be done here (suggestions welcome). > The problem is that the asterisk is being expanded by the shell itself. > I made SVN print its command line parameters and this is the result: > > $ ./subversion/svn/svn ls svn://localhost/kde --search M* > 0: ./subversion/svn/svn > 1: ls > 2: svn://localhost/kde > 3: --search > 4: Makefile > 5: Makefile.in > > That can be prevented by adding quotation marks: > > $ ./subversion/svn/svn ls svn://localhost/kde --search "M*" > 0: ./subversion/svn/svn > 1: ls > 2: svn://localhost/kde > 3: --search > 4: M* Unfortunately, on Windows both `--search M*` and (quoted) `--search "M*"` would expand the asterisk. While this is not the default shell behavior, currently it's enabled for svn and a couple of other binaries by linking to setargv.obj. In turn, this probably means the command-line client users on Windows could get quite unexpected results when using the `--search ARG` syntax. A potential cheap solution for this issue, I think, would be to make the --search argument work as a substring to search for in filenames, instead of using it as a pattern that the (whole) filename should match. While there are some cases where the latter approach gives more flexibility, my guess would be that a substring search would work well in the majority of scenarios. (Also, as far as I recall, `log --search` currently searches for a substring, so that would be consistent with it, and would probably avoid surprising the users by having a switch with the same name, but behaving differently.) Thanks, Evgeny Kotkov
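For illustration, the difference between the two matching modes could be
expressed like this (a hypothetical helper, not actual svn code):

[[[
#include <string.h>
#include <apr_fnmatch.h>

/* Match FNAME against SEARCH_ARG either as a substring (the behavior
   proposed above, consistent with `log --search`) or as a glob pattern
   that the whole filename must match. */
static int
name_matches(const char *fname, const char *search_arg, int as_substring)
{
  if (as_substring)
    return strstr(fname, search_arg) != NULL;
  else
    return apr_fnmatch(search_arg, fname, 0) == APR_SUCCESS;
}
]]]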
Potential regression: high server-side memory consumption during import (was: Subversion 1.10 RC1?)
Julian Foad writes: > After any issues raised in this discussion are resolved, we feel we should > go ahead and produce RC1 as soon as possible. I think that I am seeing a 1.10 regression in terms of httpd's memory usage during large imports. In my environment, when I `svn import` an unpacked version of Boost (https://www.boost.org/) that consists of around 60,000 files, I see that the memory consumption by the server process rapidly grows up to 1.5 GB. The environment is a Windows 8.1 x64 machine with httpd 2.4.29 configured to use short-circuit authz and HTTPS. Apparently, this behavior is specific to 1.10, as I cannot reproduce it with 1.9 binaries. (I haven't investigated the issue any further at this time, and it might as well be reproducible in other configurations.) Thanks, Evgeny Kotkov
Re: [PATCH] Use the `WITHOUT ROWID` SQLite optimization for rep-cache.db
Evgeny Kotkov writes: > I think that it would be nice to have this optimization in rep-cache.db, > and that we can start using it in a compatible way: > > - All existing rep-cache.db statements are compatible with it. > > - Since SQLite versions prior to 3.8.2 don't support it, we would > only create the new tables with this optimization in fsfs format 8, > and simultaneously bump the minimal required SQLite version from > 3.7.12 (May 2012) to 3.8.2 (December 2013). This would ensure that > all binaries supporting format 8 can work with the tables with this > optimization. > > Would there be any objections to a change like this (see the attached patch)? I committed the patch (with a couple of additional tweaks for the recommended SQLite version and the build scripts) in https://svn.apache.org/r1817042 Thanks, Evgeny Kotkov
[PATCH] Use the `WITHOUT ROWID` SQLite optimization for rep-cache.db
Hi all,

The recent SQLite versions (starting from 3.8.2, released in December 2013)
feature a `WITHOUT ROWID` optimization [1] that can be enabled when creating
a table.  In short, it works well for tables that have non-integer primary
keys, such as

  name TEXT PRIMARY KEY

by not maintaining the hidden rowid values and another B-Tree that maps a
primary key value to its rowid.  This reduces the on-disk size and makes the
lookups faster (a key → rowid → data lookup is replaced with a key → data
lookup).

Currently, the rep-cache.db schema uses a non-integer primary key:

  hash TEXT NOT NULL PRIMARY KEY

and can benefit from this optimization.  A quick experiment showed a
reduction of the on-disk size of the database by ~1.75x.  The lookups should
also be faster, both due to the reduced database size and due to the smaller
number of internal bsearches.  This should improve the times of new commits
and `svnadmin load`, especially for large repositories that also have large
rep-cache.db files.

I think that it would be nice to have this optimization in rep-cache.db,
and that we can start using it in a compatible way:

- All existing rep-cache.db statements are compatible with it.

- Since SQLite versions prior to 3.8.2 don't support it, we would only
  create the new tables with this optimization in fsfs format 8, and
  simultaneously bump the minimum required SQLite version from 3.7.12
  (May 2012) to 3.8.2 (December 2013).  This would ensure that all
  binaries supporting format 8 can work with the tables with this
  optimization.

Would there be any objections to a change like this (see the attached patch)?

[1] https://sqlite.org/withoutrowid.html

Thanks,
Evgeny Kotkov

fsfs: Use the `WITHOUT ROWID` optimization for rep-cache.db in format 8.

This optimization, introduced in SQLite 3.8.2, works well for tables that
have non-integer primary keys, such as

  hash TEXT NOT NULL PRIMARY KEY

in the rep-cache.db.  (See the https://sqlite.org/withoutrowid.html article
for additional details.)

A quick experiment showed a reduction of the on-disk size of the database
by ~1.75x.  The lookups should also be faster, both due to the reduced
database size and due to the smaller number of internal bsearches.  This
should improve the times of new commits and `svnadmin load`, especially for
large repositories that also have large rep-cache.db files.

In order to maintain compatibility, since SQLite versions prior to 3.8.2
do not support this statement, we only start using it for fsfs format 8
repositories and simultaneously bump the minimum required SQLite version
from 3.7.12 (May 2012) to 3.8.2 (December 2013).  The last step ensures
that all binaries compiled to support format 8 can work with the tables
with this optimization.

* subversion/libsvn_fs_fs/rep-cache-db.sql
  (STMT_CREATE_SCHEMA): Rename this ...
  (STMT_CREATE_SCHEMA_V1): ...to this.
  (STMT_CREATE_SCHEMA_V2): New, enables `WITHOUT ROWID` optimization.
  (STMT_GET_REP, STMT_SET_REP, STMT_GET_REPS_FOR_RANGE, STMT_GET_MAX_REV,
   STMT_DEL_REPS_YOUNGER_THAN_REV, STMT_LOCK_REP, STMT_UNLOCK_REP): Note
   that these statements work for both V1 and V2 schemas.

* subversion/libsvn_fs_fs/fs.h
  (SVN_FS_FS__MIN_REP_CACHE_SCHEMA_V2_FORMAT): New.

* subversion/libsvn_fs_fs/rep-cache.c
  (REP_CACHE_SCHEMA_FORMAT): Remove.
  (open_rep_cache): Select between creating the V1 or V2 schema based on
   the format of the filesystem.

* subversion/libsvn_subr/sqlite.c
  (): Bump minimum required SQLite version to 3.8.2.
* subversion/tests/cmdline/svnadmin_tests.py (check_hotcopy_fsfs_fsx): Check if the Python's built-in SQLite version is enough to interpret the schema of rep-cache.db, and skip the check if it's not. Index: subversion/libsvn_fs_fs/fs.h === --- subversion/libsvn_fs_fs/fs.h(revision 1816730) +++ subversion/libsvn_fs_fs/fs.h(working copy) @@ -196,6 +196,10 @@ extern "C" { */ #define SVN_FS_FS__MIN_REP_STRING_OPTIONAL_VALUES_FORMAT 8 + /* The minimum format number that supports V2 schema of the rep-cache.db +database. */ +#define SVN_FS_FS__MIN_REP_CACHE_SCHEMA_V2_FORMAT 8 + /* On most operating systems apr implements file locks per process, not per file. On Windows apr implements the locking as per file handle locks, so we don't have to add our own mutex for just in-process Index: subversion/libsvn_fs_fs/rep-cache-db.sql === --- subversion/libsvn_fs_fs/rep-cache-db.sql(revision 1816730) +++ subversion/libsvn_fs_fs/rep-cache-db.sql(working copy) @@ -21,7 +21,7 @@ * */ --- STMT_CREATE_SCHEMA +-- STMT_CREATE_SCHEMA_V1 /* A table mapping representation hashes to locations in a rev file. */ CREATE TABLE rep_cache ( hash TEXT NOT NULL PRIMARY K
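(The patch text above is truncated.  For completeness, the gist of the
open_rep_cache() change described in the log message is a format-based
schema selection, roughly as below; this is a sketch, not the exact
committed code:)

[[[
  /* Create the V2 (WITHOUT ROWID) schema only for filesystems new
     enough to guarantee SQLite >= 3.8.2. */
  if (ffd->format >= SVN_FS_FS__MIN_REP_CACHE_SCHEMA_V2_FORMAT)
    SVN_ERR(svn_sqlite__exec_statements(sdb, STMT_CREATE_SCHEMA_V2));
  else
    SVN_ERR(svn_sqlite__exec_statements(sdb, STMT_CREATE_SCHEMA_V1));
]]]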
Re: Let's switch to time-based releases
Branko Čibej writes: >> I know this has been discussed before, but it was never made reality. >> On last week's hackathon we also didn't come to a full consensus on >> various details. But I think we should simply work out those details, >> hopefully come to some compromise, and if so, JFD it. > > It has been discussed before and for a while we even did it this way. > The problem is that we're again at the point where we have to find a > reluctant volunteer to RM any release we want to make. So ... > > "There's no 'we' in Team" :p Some individual has to step up and do > this. The sort of release train you describe needs someone to drive it, > to whit, a "permanent" release manager. No amount of documentation and > schedules and good intentions will make this happen all by itself. I concur that the time-based releases most likely won't happen just based on the agreement, and that there has to be a person driving them. Perhaps, we could try this approach by finding a volunteer responsible for RM-ing the 1.11 release — for example, right after 1.10 is released. Regards, Evgeny Kotkov
Supporting precooked fsfs v7 and v1-v3 in the test suite (was: Re: svn commit: r1813898 [...])
[Changing subject, as apparently the test failures are not connected to the
discussed fix for the rep sharing and issues #4623, #4700]

Stefan Fuhrmann writes:

> Test expectations may need to be adapted.
> With v7 repos (see r1816402), I get two
> test failures while v4 and v6 pass:

[...]

> FAIL: svnadmin_tests.py 12: svnadmin verify detects corruption dump can't

[...]

> FAIL: svnadmin_tests.py 24: svnadmin verify with non-UTF-8 paths

I committed two follow-ups to the addition of precooked v7 repositories in
r1816402:

  https://svn.apache.org/r1816424
  https://svn.apache.org/r1816425

The first of these follow-ups limits the mentioned svnadmin tests to only
run with precooked repositories of formats 4 and 6 — as these tests perform
a set of hardcoded manipulations, assuming that the repository does not use
logical addressing.

Speaking of the added precooked repositories of formats 1-3, I am not too
sure if they should be created with the 1.9 binaries, as stated in the log
message for r1816402.  I think that the actual repositories created with
older SVN versions may be different from the ones created with the
`--compatible=version=` option, and that using the latter approach may just
create a false sense of security.  Perhaps, an alternative way would be to
find the 1.1 binaries, prepare the precooked format 1 repositories using
them, and limit the set of precooked repositories to v1, v4, v6 and v7.

(Also, I think that the main.py:parse_options() function may need an update
to properly handle the added v1-v3 precooked repositories by downgrading
the server_minor_version — as otherwise, the various test suite checks such
as server_has_mergeinfo() won't kick in.)

Thanks,
Evgeny Kotkov
Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py
Stefan Fuhrmann writes: >> An alternative approach that might be worth considering here would be: >> >> (1) Extend the on-disk format and allow representation strings without >> SHA1, but with the uniquifier, something like this (where "-" stands >> for "no SHA1"): >> >> 15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5 >> >> (The new format would be allowed starting from FSFS 8.) > > Yes, that would be one option. If you are willing to provide a patch, > I would probably +1 it. Not bothering with the space savings would > be a valid choice as well, IMO. >> >> (2) Use the new format to allow rep sharing for properties that writes >> the uniquifier so that svn_fs_props_changed() would work correctly, >> and doesn't introduce the overhead of writing SHA1 in the >> representation string for every property. [...] >> Barring objections and alternative suggestions, I could give a shot at >> implementing this. > > Go for it. Maybe notify me once you are done b/c currently, I don't > monitor the commit activity closely. I committed the implementations of (1) and (2) in https://svn.apache.org/r1816347 and https://svn.apache.org/r1816348 respectively. Thanks, Evgeny Kotkov
Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py
Stefan Sperling writes:

> However, if rep-sharing is enabled, svn_fs_props_changed() does not work
> as advertised because properties do not carry a SHA1 checksum with a
> "uniquifier" which identifies the transaction they were created in.
> The uniquifier is used by svn_fs_props_changed() to tell apart property
> representations which share content but were created in different revisions.
>
> To fix that problem, make FSFS write SHA1 checksums along with uniquifiers
> for file properties, just as it is already done for file content.
>
> A source code comment indicates that SHA1 checksums for property reps
> were not written due to concerns over disk space. In hindsight, this was
> a bad trade-off because it affected correctness of svn_fs_props_changed().

Thinking about this change, it could be that writing the additional 40-byte
SHA1 for the property representations is going to eliminate the benefit of
sharing them in the first place.

If I recall correctly, rep sharing for properties is mostly there to gain
from deduplicating small properties, such as svn:eol-style or svn:keywords.
But with the additional SHA1 written in every representation string, this
overhead is likely to take more space than the property itself.

An alternative approach that might be worth considering here would be:

(1) Extend the on-disk format and allow representation strings without
    SHA1, but with the uniquifier, something like this (where "-" stands
    for "no SHA1"):

      15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5

    (The new format would be allowed starting from FSFS 8.)

(2) Use the new format to allow rep sharing for properties, writing the
    uniquifier so that svn_fs_props_changed() works correctly without
    introducing the overhead of writing SHA1 in the representation string
    for every property.

(3) Disable rep sharing for properties in FSFS formats < 8 that cannot
    read and write such representation strings without SHA1, but with a
    uniquifier.

Barring objections and alternative suggestions, I could give a shot at
implementing this.

Regards,
Evgeny Kotkov
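For (1), the reading side would roughly look as follows (a sketch only;
the field and struct names are illustrative, not the actual fs_fs code):

[[[
  /* Parse the SHA1 field of a representation string, where "-" now
     means "no SHA1 recorded".  The uniquifier that follows is still
     parsed as before. */
  if (strcmp(sha1_field, "-") == 0)
    {
      rep->has_sha1 = FALSE;
    }
  else
    {
      svn_checksum_t *checksum;

      SVN_ERR(svn_checksum_parse_hex(&checksum, svn_checksum_sha1,
                                     sha1_field, pool));
      rep->has_sha1 = TRUE;
    }
]]]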
Re: svn commit: r1816069 - /subversion/branches/1.8.x-issue4707/subversion/svnrdump/dump_editor.c
Julian Foad writes:

> -": %lu\n",
> -(unsigned long)info->size));
> +": %" APR_SIZE_T_FMT "\n",
> +info->size));

I think that using APR_SIZE_T_FMT would still lead to the same issue with
large file sizes in 32-bit environments (where size_t is also 32 bits).

Perhaps, the code should be using APR_OFF_T_FMT as the format specifier?
(The apr_file_info_t.size value is an apr_off_t)

Regards,
Evgeny Kotkov
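For reference, the corrected call would look roughly like this (sketched
against the hunk above; the stream variable is assumed from the
surrounding code):

[[[
  /* info->size is an apr_off_t, so use its own format macro; this
     prints 64-bit sizes correctly even where size_t is 32 bits. */
  SVN_ERR(svn_stream_printf(stream, pool,
                            ": %" APR_OFF_T_FMT "\n",
                            info->size));
]]]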
Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/
Evgeny Kotkov writes: > Daniel Shahaf writes: > >>> + * >>> + * Any temporary allocations may be performed in @a scratch_pool. >> >> Need to add an @since tag here. > > [...] > >>> + */ >>> + svn_error_t *(*apply_textdelta_stream)( >> >> Could you update the docstring of svn_delta_editor_t itself to mention this >> new callback? All other callbacks are discussed there. > > [...] > >>> +const svn_delta_editor_t *editor, >> >> This parameter is undocumented. > > Thank you for the review. I agree with these three points, and will try > to make the suggested tweaks to the documentation. Committed in r1816063. Daniel Shahaf writes: > +1, but note that the "shouldn't" language in current HEAD, which this > patch [once rebased to HEAD] will remove, was copied from some other > docstring. I made no note of which one specifically, because I think > that "shouldn't" language is our standard formula for docstrings of > non-opaque struct types. Committed in r1816066. Regards, Evgeny Kotkov
Re: svn commit: r1815799 - in /subversion/trunk/subversion: libsvn_ra_serf/commit.c tests/libsvn_ra/ra-test.c
Evgeny Kotkov writes: > (2) I am going to tweak the new test so that it would properly open the > parent directory and commit to a locked file, to have this case > covered with a native test. Committed in r1816060. Julian Foad writes: >> (1) Keep the new simpler check in maybe_set_lock_token_header() — as, >> unless I missing something, there should be no reason to explicitly >> filter empty relpaths for the lock tokens since they are invalid. > > I agree it is good to remove the '*relpath &&' condition if there is no > reason why it needs to be there. > > Don't replace it with 'relpath &&' instead, however. If relpath is null then > I think the next line (svn_hash_gets(..., relpath)) would immediately crash > anyway, so allowing it here is useless and therefore confusing. Remove that > condition entirely. That's my suggestion. The condition is inverted in the sense that it checks for a null relpath and returns if so — in other words, the svn_hash_gets() won't get called if the relpath is null. However, as it turns out, this condition can indeed be simplified by not checking for null, as the only calling site where the relpath might be null, setup_proppatch_headers(), already checks for it. (Among the other two calling sites, the relpath cannot be null as it would have segfaulted earlier). I committed this simplification in r1816061, thanks! Regards, Evgeny Kotkov
Re: Subversion 1.10 RC1?
Julian Foad writes: > At the hackathon today we (me, Stefan Hett, Bert, Johan) have been talking > about how to progress 1.10. > > We think all the features and changes are safe to release and are not going > to get more testing until we produce a "release candidate". (For example, at > that point Stefan will be able to justify taking time at his work to test > the client in production scenarios.) I did a couple of quick tests for these new features; some of my findings that we might want to address before 1.10 GA and additional comments are below: > * conflict resolution: we understand it is already much better than 1.9; > low risk if parts of it don't work quite right; designed so that > improvements can be made in patch releases. I noticed that if the resolution of a tree conflict leads to a text conflict, the temporary files (base, mine, theirs) are not cleaned up upon revert and remain as unversioned files in the working copy. C +NewFile.txt > moved from OldFile.txt ? NewFile.txt.2.tmp ? NewFile.txt.3.tmp ? NewFile.txt.tmp > * shelving v1: is isolated -- doesn't affect anything else; is limited but > already useful; will be changed in the next release so APIs are marked > "SVN_EXPERIMENTAL"; changes shelved by this release could be detected and > 'upgraded' by a future release; should the CLI commands be marked > "experimental" in the help, too (Johan thinks yes)? One comment that I have is that marking the new commands experimental might not prevent the users from using them on a regular basis (e.g., for those who don't read help or use a GUI client like TSVN that might not have the "experimental" labels). Which, in turn, could mean that if the future format changes, we would have to convert the data stored in the patch files to avoid a data loss for such users. Also, out of curiosity, are there any current plans to support binary changes for the shelves? As far as I recall, there has been a mention of using diff --git for these purposes, and I also saw a recent commit where you added a test for diff --git, which might be related to this topic :) The other two features that I remember, are: * improved authz with support for wildcards * server-side search with `svn ls --search` Speaking of the `ls --search`, I think that there is an issue with the command-line parsing. Based on what I see, the --search argument may be interpreted as the target for listing, but not as the search pattern: svn ls --search * svn: warning: apr_err=SVN_ERR_WC_PATH_NOT_FOUND svn: warning: W155010: The node 'C:\Project\unversioned' was not found. ..\..\..\subversion\svn\list-cmd.c:453: (apr_err=SVN_ERR_ILLEGAL_TARGET) svn: E29: Could not list all targets because some targets don't exist svn ls http://spbvo-ws09.ostyserver.net:8080/svn/master2 --search * svn: E155007: 'C:\AnotherProject' is not a working copy Thanks, Evgeny Kotkov
Re: svn commit: r1815799 - in /subversion/trunk/subversion: libsvn_ra_serf/commit.c tests/libsvn_ra/ra-test.c
Bert Huijben writes:

> I'm not sure if we should really allow this.
>
> The delta editor explicitly describes that you are opening a directory and
> then edit the nodes inside. Only changing properties on the root is allowed
> and other operations are all on nodes within. Allowing to open the node
> itself again may cause all kinds of problems as there are now multiple
> handles pointing to the same thing. How will this be expressed in the
> filesystem/transaction?
>
> I'm surprised that all the other filesystems allow this, so perhaps this
> is a safe change... but the documentation in svn_delta.h doesn't describe
> this as a safe extension. (Which would theoretically allow this as a safe
> extension in later versions... but we must make sure that we are not
> opening new issues this way)
>
> Currently I would guess that making the ra layers provide a proper error
> for this case would not be a bad thing... All our drivers explicitly open
> an existing directory when they want to edit a file...

I think that I have missed that open_root() must open a directory, and so
it cannot be used when working with a file URL.  This, in turn, means that
the test example is indeed an undocumented/invalid usage of the delta
editor API and that it works by coincidence.

I can propose doing the following:

(1) Keep the new simpler check in maybe_set_lock_token_header() — as,
    unless I am missing something, there should be no reason to explicitly
    filter empty relpaths for the lock tokens, since they are invalid.

(2) I am going to tweak the new test so that it would properly open the
    parent directory and commit to a locked file, to have this case
    covered with a native test (the driving pattern is sketched below).

Thanks,
Evgeny Kotkov
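The pattern from (2), sketched (the path name is illustrative, and this is
not the actual test code):

[[[
  /* Open the parent directory via open_root(), then open the file
     inside it, instead of trying to open_root() the file URL itself. */
  void *root_baton, *file_baton;

  SVN_ERR(editor->open_root(edit_baton, base_revision, pool, &root_baton));
  SVN_ERR(editor->open_file("iota", root_baton, base_revision, pool,
                            &file_baton));
  /* ... apply_textdelta() / change_file_prop(), then close_file() ... */
]]]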
Upcoming Subversion hackathon 2017
Dear Members of the Subversion community,

The Apache Subversion project organizes a yearly 1-week-long hackathon for
its project members, where we gather at one location and push forward the
development of the project.  This gives us a time frame in which we can
fully focus on development, rather than investing only fragments of our
spare time.  This event also serves as a chance to start and progress more
complex ideas/features that require more coordination between multiple
developers.

The event is planned to be held in Aachen, Germany in late November.  The
approximate dates for the main part of this event are either November
13-17th or 20-24th.  One of the Subversion PMC members, Stefan Hett, who
currently resides in Aachen, is going to take on most of the organizational
work for this hackathon.

We will post additional information on the event in the near future.  If
you would be interested in participating in it, please don't hesitate to
contact us at priv...@subversion.apache.org

Sponsorship:

This is a great opportunity for companies involved in the Apache Subversion
project, or that benefit from its code - your employers - to get their name
in front of the community.  If you are interested in becoming a sponsor of
this event and supporting the further development of the project by such
means, please contact us at priv...@subversion.apache.org for details.

Thanks,
Evgeny Kotkov
(VP Apache Subversion)
Re: FAQ entries #dumpload and (new) #fix-nonLF-log
Johan Corveleyn writes: > Also: this reminds me of those other whiteboard items from 2015: > "svnadmin load --normalize-revprops", "- normalize svn:* props", "- > normalize mergeinfo", to get 'svnadmin load' on par with svnsync > regarding normalization. Having those options (at least the first two) > would make those faq answers a lot shorter ... If anyone has some > cycles to spare, 1.10 might be a good target to try and get these in > ... I committed the first implementation of the --normalize-props option for svnadmin load in https://svn.apache.org/r1807836 This option currently translates non-LF line endings found in the svn: property values. I think that this should improve the situation for users who attempt to load dump files with such property values. Regards, Evgeny Kotkov
Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/
Daniel Shahaf writes: >> Also, I am not against the idea of tweaking the note by saying something >> like "...because we may add new fields in the future", but I don't think >> that it is required either. (In other words, I am +0 to that.) > > Done in r1807028. I think that although r1807028 provides the additional information to the users, it simultaneously makes the API requirements less strict: > - * @note Don't try to allocate one of these yourself. Instead, always > - * use svn_delta_default_editor() or some other constructor, to ensure > - * that unused slots are filled in with no-op functions. > + * @note Fields may be added to the end of this structure in future > + * versions. Therefore, users shouldn't allocate structures of this > + * type, to preserve binary compatibility. > + * > + * @note It is recommended to use svn_delta_default_editor() or some other > + * constructor, to ensure that unused slots are filled in with no-op > functions. While it does clarify that new fields can be added to the struct, this changeset also replaces words like "don't try" and "always" with "shouldn't" and "is recommended" thus making the requirements a recommendation: - "Don't try to allocate one of these yourself" became "users shouldn't allocate structures" - "Instead, always use svn_delta_default_editor()" is now "It is recommended to use svn_delta_default_editor()" Perhaps, a better way would be to keep the original wording, but mention that the structure may be extended, as in this alternative patch: [[[ --- subversion/include/svn_delta.h (revision 1806549) +++ subversion/include/svn_delta.h (working copy) @@ -694,8 +694,10 @@ svn_txdelta_skip_svndiff_window(apr_file_t *file, * as it produces the delta. * * @note Don't try to allocate one of these yourself. Instead, always - * use svn_delta_default_editor() or some other constructor, to ensure - * that unused slots are filled in with no-op functions. + * use svn_delta_default_editor() or some other constructor, to avoid + * backwards compatibility problems if the structure is extended in + * future releases and to ensure that unused slots are filled in with + * no-op functions. * * Function Usage * ]]] Regards, Evgeny Kotkov
Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/
Daniel Shahaf writes: > I'm not sure I'm getting through here. > > The note does say "Don't allocate one of these yourself" but in the > second sentence implies that the reason for this is that if you allocate > it yourself and don't initialize all pointer-to-function members, > Trouble will ensue. Therefore, the note could be construed as meaning > that it is okay to allocate one of these yourself if you take care to > initialize all pointer members. > > In other words: the comment could have been interpreted as a piece > of advice --- "it is more robust to use _default_editor() than to allocate > with malloc" --- as opposed to a blanket prohibition on allocating this > struct type with malloc. > > (If one uses malloc and doesn't initialize all pointers, the result > would be that an uninitialized pointer-to-function struct member is > dereferenced.) > > In contrast, most of our other structs explicitly say that "Don't > allocate yourself because we may add new fields in the future". This > struct does not give that reason. > > Makes sense? > > I suppose can just add an API erratum and/or release notes entry about this. I think that the important thing about this documentation is that it states that: (1) The API user shouldn't try to allocate the struct himself. (2) A constructor such as svn_delta_default_editor() should *always* be used. To my mind, the current statement is clear and it is not possible to interpret it as if it's allowed to malloc() the struct and initialize it manually. My opinion here is that neither the API erratum nor including this in the release notes are required, and doing so would just unnecessarily restate the information that's already available. Also, I am not against the idea of tweaking the note by saying something like "...because we may add new fields in the future", but I don't think that it is required either. (In other words, I am +0 to that.) Regards, Evgeny Kotkov
Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/
Daniel Shahaf writes: >> + * >> + * Any temporary allocations may be performed in @a scratch_pool. > > Need to add an @since tag here. [...] >> + */ >> + svn_error_t *(*apply_textdelta_stream)( > > Could you update the docstring of svn_delta_editor_t itself to mention this > new callback? All other callbacks are discussed there. [...] >> +const svn_delta_editor_t *editor, > > This parameter is undocumented. Thank you for the review. I agree with these three points, and will try to make the suggested tweaks to the documentation. >> + /** Apply a text delta stream, yielding the new revision of a file. >> + * >> + * @a file_baton indicates the file we're creating or updating, and the >> + * ancestor file on which it is based; it is the baton set by some >> + * prior @c add_file or @c open_file callback. >> + * >> + * @a open_func is a function that opens a #svn_txdelta_stream_t object. >> + * @a open_baton is provided by the caller. >> + * >> + * @a base_checksum is the hex MD5 digest for the base text against >> + * which the delta is being applied; it is ignored if NULL, and may >> + * be ignored even if not NULL. If it is not ignored, it must match > > What's the rationale for allowing drivees to ignore the checksum? > > This leeway enables failure modes that wouldn't be possible without it. > (Drivers that are aware of this leeway will validate checksums even if the > drivee doesn't, leading to duplicate work; drivers that are unaware of this > requirement might not get checksum errors they should have.) > > I get that you just copied this part of the docstring from apply_textdelta(), > but I'd like to understand what's the rationale here. (And to see if this > leeway should be deprecated) My interpretation of the documentation is that "the result checksum must be validated by the implementation, whereas validating the checksum of the base text may be omitted". Perhaps, there are cases where the base checksum won't be validated by our existing implementations (what about BDB, for instance?), but in the meanwhile, I'm not too sure about the gain from _always_ requiring such checks. I think that, from the editor driver's point of view, the important thing is the result checksum. If the implementation also validates the base checksum, that may allow it to skip actually applying the delta in case of the early mismatch, give away better error messages ("I have a base checksum mismatch" instead of "I can't apply instruction 7 at offset 12345") and maybe even detect the coding mistakes which cause the delta to be applied to an unexpected base, but still yielding the expected resulting fulltext. Having all these properties is nice, but probably not mandatory. And I think that lifting the optionality of this check could shoot us in the foot in the future, if we find ourselves writing an implementation where it is particularly tricky to always implement the base text checks — for instance, due to the used protocol or any other technical reasons. Hope that I am not missing something subtle here :) >> + * the checksum of the base text against which svndiff data is being >> + * applied; if it does not, @c apply_textdelta_stream call which detects >> + * the mismatch will return the error SVN_ERR_CHECKSUM_MISMATCH >> + * (if there is no base text, there may still be an error if >> + * @a base_checksum is neither NULL nor the hex MD5 checksum of the >> + * empty string). > > To the best of my knowledge, we don't special case the empty string's > checksum, d41d8cd98f00b204e9800998ecf8427e, anywhere. 
We do special-case > , though. I assume the parenthetical should > be fixed accordingly (both here and in apply_textdelta())? I agree that this part of the docstring (same in svn_fs_apply_textdelta()) looks odd, and I don't think that the digest of an empty string is specially handled somewhere in the implementations. Perhaps, it would make sense to check on whether this has been true at some point in the past and also tweak the docstring. > Is adding this function an ABI-compatible change? The docstring of > svn_delta_editor_t does say """ > > * @note Don't try to allocate one of these yourself. Instead, always > * use svn_delta_default_editor() or some other constructor, to ensure > * that unused slots are filled in with no-op functions. > > """, but an API consumer might have interpreted this note as meaning "You may > use malloc(..., sizeof(svn_delta_editor_t)) if you take care to initialize > all struct members", in which case, his code will not be ABI compatible > with 1.10. I think that adding this callback does not affect the ABI compatibility. The note says "Don't try to allocate one of these yourself", whereas the malloc(..., sizeof(svn_delta_editor_t)) example does exactly the opposite. Thanks, Evgeny Kotkov
Re: svn commit: r1806017 - /subversion/trunk/subversion/libsvn_ra_serf/merge.c
Bert Huijben writes: >> ra_serf: Prevent the server from generating and sending the full MERGE >> response in cases when we don't require it. >> >> The full response is not required when working over HTTPv2 protocol. >> When working over HTTPv1, it's only required when the new working copy >> properties need to be stored as part of a commit (indicated by a non-null >> svn_ra_push_wc_prop_func_t callback). > > Nice catch! > > Does this affect performance enough that we should backport this fix? Thanks! I guess that it would be nice to backport this fix, as it prevents the server from reading the list of the committed changes after the commit and also reduces the size of the sent response. I think that the full MERGE response can be quite large for commits with thousands of changed paths, although I don't have any real numbers at this time. With that in mind, I have put nominating this change on my todo list, unless someone else beats me to it. Regards, Evgeny Kotkov
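For context, the mechanism is roughly the following (a sketch, not the
actual diff; the session and headers variables are assumed from the
surrounding ra_serf code):

[[[
  /* Ask mod_dav_svn for a minimal MERGE response when the new working
     copy properties don't need to be stored after the commit, i.e.,
     when no push_wc_prop callback was provided.  The two constants
     come from svn_dav.h. */
  if (session->wc_callbacks->push_wc_prop == NULL)
    serf_bucket_headers_setn(headers, SVN_DAV_OPTIONS_HEADER,
                             SVN_DAV_OPTION_NO_MERGE_RESPONSE);
]]]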
Re: svn commit: r1806548 - in /subversion/trunk/subversion: svn/svn.c svnbench/svnbench.c tests/cmdline/basic_tests.py
Bert Huijben writes:

>> As Julian discovered, '--search' as used with 'svn log' is may not suitable
>> for 'svn ls'. File name matching should be case-sensitive and requires
>> full patterns just like e.g. the ordinary unix command line 'ls' command.
>>
>> Therefore, introduce a separate '--pattern' option for 'svn log' that works
>> similar to patterns with Unix command line 'ls'. Since the actual matching
>> already confirms to that, we only need a different option pre-processing.
>
> Perhaps we could use --glob, to allow other syntax patterns later?
>
> Not sure... perhaps --glob is too technical.

My 2 cents on this would be that having "svn ls --pattern" and
"svn log --search", which would differ only in terms of case sensitivity,
has a chance of being confusing for the users.

Perhaps, a slightly better way would be to keep the case-insensitive
behavior by default, but add a "--case-sensitive" switch to handle the
cases where it's required?  That is,

  svn ls --search
  svn ls --search --case-sensitive

Regards,
Evgeny Kotkov
Re: [PATCH] Reduce the amount of WriteFile() syscalls when writing to buffered files
Evgeny Kotkov writes:

> In the meanwhile, apparently, there is an oversight in the core V1 patch
> (3-reduce-syscalls-for-buffered-writes-v1.patch.txt):
>
> If the buffer is not empty, and the caller issues a write with a chunk
> that slightly exceeds the buffer size, for example, 4100 bytes, it would
> result in flushing the buffer _and_ writing the remaining couple of bytes
> with WriteFile(). An appropriate thing to do here would be to flush the
> buffer, but put the few remaining bytes into the buffer, instead of writing
> them to disk and thus making an unnecessary syscall.
>
> With all this in mind, I will prepare the V2 version of the core patch.

I have attached the V2 version of the core patch.  To solve the
aforementioned oversight in the V1 patch, I implemented the optimization
in a slightly different manner, by keeping the existing loop but also
handling a condition where we can write the remaining data with a single
WriteFile() call.  Apart from this, the V2 patch also includes an
additional test, test_small_and_large_writes_buffered().

Speaking of the discussed idea of adjusting the strategy to avoid a
memcpy() with a write pattern like

  WriteFile(13)
  WriteFile(86267)
  ...

instead of

  WriteFile(4096)
  WriteFile(82185)
  ...

I preferred to keep the latter approach, which keeps the minimum size of
the WriteFile() chunk equal to 4096, for the following reasons:

- My benchmarks do not show that the alternative pattern that also avoids
  a memcpy() results in a quantifiable performance improvement.

- The existing code had a property that all WriteFile() calls, except for
  maybe the last one, were performed in chunks with sizes that are never
  less than 4096.  Although I can't reproduce this in my environment, it
  could be that introducing a possibility of writing in smaller chunks
  would result in decreased performance with specific hardware, non-file
  handles or in situations where the OS write caching is disabled.
  Therefore, currently, I think that it would be better to keep this
  existing property.

- Probably, implementing the first approach would result in a bit more
  complex code, as I think that it would require introducing an additional
  if-else code path.

The log message is included in the beginning of the patch file.

Thanks,
Evgeny Kotkov

Win32: Improve apr_file_write() performance on buffered files by reducing
the amount of WriteFile() calls for large writes.

Previously, writing has been implemented with a loop that keeps copying
the data to the internal 4KB buffer and writing this buffer to disk by
calling WriteFile(4096).  This patch reduces the amount of syscalls for
large writes by performing them with a single syscall, if possible.  If
the buffer is not empty at the moment when the large write occurs, it is
first filled up to its 4KB capacity, flushed, and the remaining part of
the data is written with a single syscall.

* file_io/win32/readwrite.c
  (write_buffered): Within the write loop, check if we have a situation
   with an empty buffer and a large chunk pending to be written.  In this
   case, bypass the buffering and write the remaining chunk with a single
   syscall.  Return an appropriate number of written bytes to satisfy the
   apr_file_write() function contract.
  (apr_file_write): Adjust call to write_buffered().

* test/testfile.c
  (test_large_write_buffered, test_two_large_writes_buffered,
   test_small_and_large_writes_buffered,
   test_write_buffered_spanning_over_bufsize): New tests.
  (testfile): Run the new tests.
Patch by: Evgeny Kotkov Index: file_io/win32/readwrite.c === --- file_io/win32/readwrite.c (revision 1805607) +++ file_io/win32/readwrite.c (working copy) @@ -290,9 +290,8 @@ static apr_status_t write_helper(HANDLE filehand, } static apr_status_t write_buffered(apr_file_t *thefile, const char *buf, - apr_size_t len) + apr_size_t len, apr_size_t *pwritten) { -apr_size_t blocksize; apr_status_t rv; if (thefile->direction == 0) { @@ -306,20 +305,41 @@ static apr_status_t write_buffered(apr_file_t *the thefile->direction = 1; } -rv = 0; -while (rv == 0 && len > 0) { -if (thefile->bufpos == thefile->bufsize) /* write buffer is full */ +*pwritten = 0; + +while (len > 0) { +if (thefile->bufpos == thefile->bufsize) { /* write buffer is full */ rv = apr_file_flush(thefile); +if (rv) { +return rv; +} +} +/* If our buffer is empty, and we cannot fit the remaining chunk + * into it, write the chunk with a single syscall and return. + */ +if (thefile->bufpos == 0 && len > thefile->bufsize) { +
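As a usage-level illustration of the write pattern being optimized (the
sizes match the discussion above; the file name is arbitrary):

[[[
#include <apr_file_io.h>

static apr_status_t write_pattern(apr_pool_t *pool)
{
  apr_file_t *f;
  apr_size_t len;
  apr_status_t rv;
  static char small[13], large[86267];

  rv = apr_file_open(&f, "out.bin",
                     APR_FOPEN_WRITE | APR_FOPEN_CREATE | APR_FOPEN_BUFFERED,
                     APR_OS_DEFAULT, pool);
  if (rv)
    return rv;

  len = sizeof(small);
  rv = apr_file_write(f, small, &len);   /* lands in the 4KB buffer */
  if (rv)
    return rv;

  len = sizeof(large);
  rv = apr_file_write(f, large, &len);   /* fills the buffer to 4096 and
                                            flushes it; with the patch, the
                                            remainder goes out in a single
                                            WriteFile() call */
  if (rv)
    return rv;

  return apr_file_close(f);
}
]]]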
Re: svn commit: r1805897 - /subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
Stefan Sperling writes: > \o/ > > Can you please update the 1.10 release notes accordingly? Will do, thanks for the reminder. Regards, Evgeny Kotkov