Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-19 Thread Karl Fogel

On 19 Jan 2023, Evgeny Kotkov wrote:
To have a more or less accurate estimate, I went ahead and 
prepared the
first-cut implementation of an approach that makes the pristine 
checksum

kind configurable in a working copy.

The current implementation passes all tests in my environment and 
seems to

work in practice.  It is available on the branch:

 https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind

The implementation on the branch allows creating working copies 
that use a

checksum kind other than SHA-1.

The checksum kind is persisted in the settings table.  Upgraded 
working copies
of the older formats will have SHA-1 recorded as their pristine 
checksum kind
and will continue to use it for compatibility.  Newly created 
working copies
of the latest format (with --compatible-version=1.15 or 
--store-pristine=no),
as currently implemented, will use the new pristine checksum 
kind.


Currently, as a proof-of-concept, the branch uses salted SHA-1 as 
the new
pristine checksum kind.  For the production-ready state, I plan 
to support
using multiple new checksum types such as SHA-256.  I think that 
it would
be useful for future compatibility, because if we encounter any 
issues with
one checksum kind, we could then switch to a different kind 
without having

to change the working copy format.

One thing worth noting is that ra_serf contains a specific 
optimization for
the skelta-style updates that allows skipping a GET request if 
the pristine
store already contains an entry with the specified SHA-1 
checksum.  Switching
to a different checksum type for the pristine entries is going to 
disable
that specific optimization.  Re-enabling it would require an 
update of the

server-side.  I consider this to be out of scope for this branch.

I can complete the work on this branch and bring it to a 
production-ready

state, assuming there are no objections.


This sounds great to me; thank you, Evgeny.  I agree that the 
server-side companion change is (or anyway can be) out-of-scope 
here -- the perfect should not be the enemy of the good, etc.


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-19 Thread Karl Fogel

On 19 Jan 2023, Daniel Shahaf wrote:

https://subversion.apache.org/security/sha1-advisory.txt


That's a well-written advisory.  I was surprised to see that there 
is no date on it, though -- from looking at the page, one would 
have no quick way of knowing the date it was published (although 
one would know that it must have been published 2017 or after, 
since it references events in 2017).


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-19 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> Now, how hard would this be to actually implement?

To have a more or less accurate estimate, I went ahead and prepared the
first-cut implementation of an approach that makes the pristine checksum
kind configurable in a working copy.

The current implementation passes all tests in my environment and seems to
work in practice.  It is available on the branch:

  https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind

The implementation on the branch allows creating working copies that use a
checksum kind other than SHA-1.

The checksum kind is persisted in the settings table.  Upgraded working copies
of the older formats will have SHA-1 recorded as their pristine checksum kind
and will continue to use it for compatibility.  Newly created working copies
of the latest format (with --compatible-version=1.15 or --store-pristine=no),
as currently implemented, will use the new pristine checksum kind.

Currently, as a proof-of-concept, the branch uses salted SHA-1 as the new
pristine checksum kind.  For the production-ready state, I plan to support
using multiple new checksum types such as SHA-256.  I think that it would
be useful for future compatibility, because if we encounter any issues with
one checksum kind, we could then switch to a different kind without having
to change the working copy format.

One thing worth noting is that ra_serf contains a specific optimization for
the skelta-style updates that allows skipping a GET request if the pristine
store already contains an entry with the specified SHA-1 checksum.  Switching
to a different checksum type for the pristine entries is going to disable
that specific optimization.  Re-enabling it would require an update of the
server-side.  I consider this to be out of scope for this branch.

I can complete the work on this branch and bring it to a production-ready
state, assuming there are no objections.


Thanks,
Evgeny Kotkov


Re: Escape sequences in log messages [etc]

2023-01-19 Thread Daniel Shahaf
Nathan Hartman wrote on Wed, Jan 18, 2023 at 01:10:47 -0500:
> On Tue, Jan 17, 2023 at 3:02 PM Doug Robinson 
> wrote:
> 
> > Daniel, et. al.:
> >
> > On Mon, Jan 2, 2023 at 5:14 PM Daniel Sahlberg <
> > daniel.l.sahlb...@gmail.com> wrote:
> >
> >> In a thread started by Vincent Lefevre in October [1] it was noted that
> >> Subversion prints several pieces of information from the repository to the
> >> terminal (including log messages and author names) without considering if
> >> they may affect terminal behaviour.
> >>
> >> As demonstrated by DanielSh [2] a user may inject escape sequences into a
> >> log message and when running svn log, these affect terminal color. Git
> >> behaves the same way, as demonstrated by me [3].
> >>
> >
> > Any idea what Git is going to do with this?
> >
> 
> 
> Unless someone reports (reported?) it to the Git devs, it's possible they
> aren't aware of it.
> 
> If we want to do something about it on our end, it might make sense to
> coordinate with the Git devs so that both systems could have similar
> behavior.
> 
> But... I'm not sure whether we want to do anything yet, partly because...
> 
> 
> Can we reach consensus if this behaviour is intended, unintended but
> >> desirable or unintended and undesirable? I would value the opinions of the
> >> oldtimers who might have background information if this was ever discussed
> >> or considered in the early days.
> >>
> >> In the original thread there were several arguments both pro and con
> >> regarding filtering/quoting escape sequences.
> >>
> >
> > From my perspective trying to do anything about this is opening up a huge
> > investigation that may result in incompatible-with-history choices.
> >
> > 1. What about "svn diff" ?  (any modifications here could break "patch",
> > et. al.)
> > 2. What about "svn cat" ?
> > 3. What about properties?  (I just verified you can place escape sequences
> > in them).
> > ...
> >
> > (I doubt my list above is complete.)
> >
> 
> 
> ...of concerns that doing so will break stuff.
> 

We have precedents for making breaking changes: we make them in an A.B.0
release if possible, and document them in the release notes and/or in
notes/api-errata/.

> > A "complete" implementation of a "feature" to mask/protect-against escape
> > sequences is also going to need an option to enable the raw output
> > (including the escape sequences) for every command/context where they could
> > be coming out today.

Not necessarily: If the lack of escaping may be considered a bugfix,
then we don't have to offer a backwards-compatible upgrade path.

The values of log messages are required to be UTF-8 strings.  When
ViewVC renders them, it escapes characters that are special to HTML
(angle brackets and ampersand).  If someone out there has a C source
code generator that takes svn log messages as input, that generator
should escape double quotes, backslashes, and so on when it emits C
string literals derived from log messages.  And when the cmdline client
emits svn:log property values to a terminal, it should escape the
sequences as appropriate for that terminal.

And we needn't support an option to emit escape sequences in raw form
for the same reason that ViewVC doesn't have an option to emit log
messages into the HTTP response stream without HTML escaping.

Cheers,

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-19 Thread Daniel Shahaf
Karl Fogel wrote on Thu, Dec 29, 2022 at 17:35:44 -0600:
> On 29 Dec 2022, Evgeny Kotkov wrote:
> > Karl Fogel  writes:
> > 
> > > Now, how hard would this be to actually implement?
> > 
> > I plan to take a more detailed look at that, but I'm currently on
> > vacation for the New Year holidays.
> 
> That's great to hear, Evgeny.  In the meantime, enjoy your vacation!

Any news on this?  Over here it's still not clear to me why what problem
would be solved by switching away from SHA-1, what alternative solutions
to that problem have been considered, and whether anyone has actually
stopped to consider /both/ the pros and cons of switching away from SHA-1.

Karl Fogel wrote on Wed, Dec 28, 2022 at 09:10:31 -0400:
> On 28 Dec 2022, Daniel Sahlberg wrote:
> > Since we need to be backwards compatible with older v1 clients, can
> > this check ever be removed (before Subversion 2)?
> > 
> > So, while I believe f32 is a good opportunity to switch to a new
> > hash, what is the problem we would like to solve with a new hash?
> 
> As I said before, even if we couldn't think of a concrete problem right now,
> the mere fact that a former guarantee [1] has become a non-guarantee is
> enough motivation.  We can't anticipate all the problems that might arise
> from people being able to craft local content that looks unmodified to
> Subversion.  (As you implied, r1794611 has no effect for content that is
> never committed to the repository.)
> 
> Of course, my saying "This matters just through reasoning from first
> principles, therefore we should fix it" would count for a lot more if I were
> volunteering to fix it, which I'm not alas. But I do think we don't need to
> search further for justifications. What we already know is enough: our hash
> algorithm is known to be collidable, yet what we're using it for depends on
> non-collidability; therefore, switching to a better algorithm is a good
> idea.
> 

Agreed that we shouldn't limit ourselves to problems/attacks we can
imagine.

However, it does not follow from "the mere fact that a former guarantee
has become a non-guarantee" that we should switch the checksum
algorithm.  What does folow from that is that we should review our
design, identify the places that depend on the no-longer-valid
guarantee, assess the implications for each of them, and then determine
what sort of changes may be needed.

In other words, we should do what we do whenever we write an advisory.

Which reminds me:

https://subversion.apache.org/security/sha1-advisory.txt

Daniel

> However, it needn't be a blocker for the next release, for the reason Brane
> gave.
> 
> Best regards,
> -Karl
> 
> [1] "Former guarantee" meaning "former guarantee for all practical
> purposes", of course, since in the past there weren't ways to make
> collisions happen.