Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
On 19 Jan 2023, Evgeny Kotkov wrote: To have a more or less accurate estimate, I went ahead and prepared the first-cut implementation of an approach that makes the pristine checksum kind configurable in a working copy. The current implementation passes all tests in my environment and seems to work in practice. It is available on the branch: https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind The implementation on the branch allows creating working copies that use a checksum kind other than SHA-1. The checksum kind is persisted in the settings table. Upgraded working copies of the older formats will have SHA-1 recorded as their pristine checksum kind and will continue to use it for compatibility. Newly created working copies of the latest format (with --compatible-version=1.15 or --store-pristine=no), as currently implemented, will use the new pristine checksum kind. Currently, as a proof-of-concept, the branch uses salted SHA-1 as the new pristine checksum kind. For the production-ready state, I plan to support using multiple new checksum types such as SHA-256. I think that it would be useful for future compatibility, because if we encounter any issues with one checksum kind, we could then switch to a different kind without having to change the working copy format. One thing worth noting is that ra_serf contains a specific optimization for the skelta-style updates that allows skipping a GET request if the pristine store already contains an entry with the specified SHA-1 checksum. Switching to a different checksum type for the pristine entries is going to disable that specific optimization. Re-enabling it would require an update of the server-side. I consider this to be out of scope for this branch. I can complete the work on this branch and bring it to a production-ready state, assuming there are no objections. This sounds great to me; thank you, Evgeny. I agree that the server-side companion change is (or anyway can be) out-of-scope here -- the perfect should not be the enemy of the good, etc. Best regards, -Karl
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
On 19 Jan 2023, Daniel Shahaf wrote: https://subversion.apache.org/security/sha1-advisory.txt That's a well-written advisory. I was surprised to see that there is no date on it, though -- from looking at the page, one would have no quick way of knowing the date it was published (although one would know that it must have been published 2017 or after, since it references events in 2017). Best regards, -Karl
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Karl Fogel writes: > Now, how hard would this be to actually implement? To have a more or less accurate estimate, I went ahead and prepared the first-cut implementation of an approach that makes the pristine checksum kind configurable in a working copy. The current implementation passes all tests in my environment and seems to work in practice. It is available on the branch: https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind The implementation on the branch allows creating working copies that use a checksum kind other than SHA-1. The checksum kind is persisted in the settings table. Upgraded working copies of the older formats will have SHA-1 recorded as their pristine checksum kind and will continue to use it for compatibility. Newly created working copies of the latest format (with --compatible-version=1.15 or --store-pristine=no), as currently implemented, will use the new pristine checksum kind. Currently, as a proof-of-concept, the branch uses salted SHA-1 as the new pristine checksum kind. For the production-ready state, I plan to support using multiple new checksum types such as SHA-256. I think that it would be useful for future compatibility, because if we encounter any issues with one checksum kind, we could then switch to a different kind without having to change the working copy format. One thing worth noting is that ra_serf contains a specific optimization for the skelta-style updates that allows skipping a GET request if the pristine store already contains an entry with the specified SHA-1 checksum. Switching to a different checksum type for the pristine entries is going to disable that specific optimization. Re-enabling it would require an update of the server-side. I consider this to be out of scope for this branch. I can complete the work on this branch and bring it to a production-ready state, assuming there are no objections. Thanks, Evgeny Kotkov
Re: Escape sequences in log messages [etc]
Nathan Hartman wrote on Wed, Jan 18, 2023 at 01:10:47 -0500: > On Tue, Jan 17, 2023 at 3:02 PM Doug Robinson > wrote: > > > Daniel, et. al.: > > > > On Mon, Jan 2, 2023 at 5:14 PM Daniel Sahlberg < > > daniel.l.sahlb...@gmail.com> wrote: > > > >> In a thread started by Vincent Lefevre in October [1] it was noted that > >> Subversion prints several pieces of information from the repository to the > >> terminal (including log messages and author names) without considering if > >> they may affect terminal behaviour. > >> > >> As demonstrated by DanielSh [2] a user may inject escape sequences into a > >> log message and when running svn log, these affect terminal color. Git > >> behaves the same way, as demonstrated by me [3]. > >> > > > > Any idea what Git is going to do with this? > > > > > Unless someone reports (reported?) it to the Git devs, it's possible they > aren't aware of it. > > If we want to do something about it on our end, it might make sense to > coordinate with the Git devs so that both systems could have similar > behavior. > > But... I'm not sure whether we want to do anything yet, partly because... > > > Can we reach consensus if this behaviour is intended, unintended but > >> desirable or unintended and undesirable? I would value the opinions of the > >> oldtimers who might have background information if this was ever discussed > >> or considered in the early days. > >> > >> In the original thread there were several arguments both pro and con > >> regarding filtering/quoting escape sequences. > >> > > > > From my perspective trying to do anything about this is opening up a huge > > investigation that may result in incompatible-with-history choices. > > > > 1. What about "svn diff" ? (any modifications here could break "patch", > > et. al.) > > 2. What about "svn cat" ? > > 3. What about properties? (I just verified you can place escape sequences > > in them). > > ... > > > > (I doubt my list above is complete.) > > > > > ...of concerns that doing so will break stuff. > We have precedents for making breaking changes: we make them in an A.B.0 release if possible, and document them in the release notes and/or in notes/api-errata/. > > A "complete" implementation of a "feature" to mask/protect-against escape > > sequences is also going to need an option to enable the raw output > > (including the escape sequences) for every command/context where they could > > be coming out today. Not necessarily: If the lack of escaping may be considered a bugfix, then we don't have to offer a backwards-compatible upgrade path. The values of log messages are required to be UTF-8 strings. When ViewVC renders them, it escapes characters that are special to HTML (angle brackets and ampersand). If someone out there has a C source code generator that takes svn log messages as input, that generator should escape double quotes, backslashes, and so on when it emits C string literals derived from log messages. And when the cmdline client emits svn:log property values to a terminal, it should escape the sequences as appropriate for that terminal. And we needn't support an option to emit escape sequences in raw form for the same reason that ViewVC doesn't have an option to emit log messages into the HTTP response stream without HTML escaping. Cheers, Daniel
Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format
Karl Fogel wrote on Thu, Dec 29, 2022 at 17:35:44 -0600: > On 29 Dec 2022, Evgeny Kotkov wrote: > > Karl Fogel writes: > > > > > Now, how hard would this be to actually implement? > > > > I plan to take a more detailed look at that, but I'm currently on > > vacation for the New Year holidays. > > That's great to hear, Evgeny. In the meantime, enjoy your vacation! Any news on this? Over here it's still not clear to me why what problem would be solved by switching away from SHA-1, what alternative solutions to that problem have been considered, and whether anyone has actually stopped to consider /both/ the pros and cons of switching away from SHA-1. Karl Fogel wrote on Wed, Dec 28, 2022 at 09:10:31 -0400: > On 28 Dec 2022, Daniel Sahlberg wrote: > > Since we need to be backwards compatible with older v1 clients, can > > this check ever be removed (before Subversion 2)? > > > > So, while I believe f32 is a good opportunity to switch to a new > > hash, what is the problem we would like to solve with a new hash? > > As I said before, even if we couldn't think of a concrete problem right now, > the mere fact that a former guarantee [1] has become a non-guarantee is > enough motivation. We can't anticipate all the problems that might arise > from people being able to craft local content that looks unmodified to > Subversion. (As you implied, r1794611 has no effect for content that is > never committed to the repository.) > > Of course, my saying "This matters just through reasoning from first > principles, therefore we should fix it" would count for a lot more if I were > volunteering to fix it, which I'm not alas. But I do think we don't need to > search further for justifications. What we already know is enough: our hash > algorithm is known to be collidable, yet what we're using it for depends on > non-collidability; therefore, switching to a better algorithm is a good > idea. > Agreed that we shouldn't limit ourselves to problems/attacks we can imagine. However, it does not follow from "the mere fact that a former guarantee has become a non-guarantee" that we should switch the checksum algorithm. What does folow from that is that we should review our design, identify the places that depend on the no-longer-valid guarantee, assess the implications for each of them, and then determine what sort of changes may be needed. In other words, we should do what we do whenever we write an advisory. Which reminds me: https://subversion.apache.org/security/sha1-advisory.txt Daniel > However, it needn't be a blocker for the next release, for the reason Brane > gave. > > Best regards, > -Karl > > [1] "Former guarantee" meaning "former guarantee for all practical > purposes", of course, since in the past there weren't ways to make > collisions happen.