Re: [PATCHv2 0/7] Fix and generalize version sort reordering

2016-12-20 Thread Jeff King
On Tue, Dec 20, 2016 at 09:50:42AM +0100, SZEDER Gábor wrote:

> > It just seems like the whole thing would conceptually easier if we
> > pre-parsed the versions into a sequence of elements, then the comparison
> > between any two elements would just walk that sequence. The benefit
> > there is that you can implement whatever rules you like for the parsing
> > (like "prefer longer suffixes to shorter"), but you know the comparison
> > will always be consistent.
> 
> I considered parsing tagnames into prefix, version number and suffix,
> and then work from that, but decided against it.
> 
> versioncmp() is taken from glibc, so I assume that it's thoroughly
> tested, even in corner cases (e.g. multiple leading zeros).
> Furthermore, I think it's a good thing that by default (i.e. without
> suffix reordering) our version sort orders the same way as glibc's
> version sort does.  Introducing a different algorithm would risk bugs
> in the more subtle cases.

Fair enough. If it's working, I agree there is risk in changing things.
And if you're willing to deal with the bugs, then I'm happy to stand
back. :)

> Then there are all the weird release suffixes out there, and I didn't
> want to decide on a policy for splitting them sanely; don't know
> whether there exist any universal rules for this splitting at
> all.  E.g. one of the packages here has the following version (let's
> ignore the fact that because of the '~' this is an invalid refname in
> git):

I have a hunch that any policy you'd have to set for splitting is going
to end up becoming a policy you'll have to use when comparing (or you
risk violating the transitivity of your comparison function).

But that's just a hunch, not a proof. Again, I'm happy to defer to you
if you're the one working on it.

-Peff


Re: [PATCHv2 0/7] Fix and generalize version sort reordering

2016-12-20 Thread SZEDER Gábor
On Wed, Dec 14, 2016 at 6:08 PM, Jeff King  wrote:
> On Thu, Dec 08, 2016 at 03:23:54PM +0100, SZEDER Gábor wrote:
>
>> > With my patches it looks like this:
>> >
>> > $ git -c versionsort.prereleasesuffix=-pre \
>> >   -c versionsort.prereleasesuffix=-prerelease \
>> >   tag -l --sort=version:refname
>> > v1.0.0-prerelease1
>> > v1.0.0-pre1
>> > v1.0.0-pre2
>> > v1.0.0
>>
>> And when there happen to be more than one matching suffixes starting
>> at the same earliest position, then we should pick the longest of
>> them.  The new patch 6/7 implements this behavior.
>
> The whole approach taken by the suffix code (before your patches) leaves
> me with the nagging feeling that the comparison is not always going to
> be transitive (i.e., that "a < b && b < c" does not always imply "a <
> c"), which is going to cause nonsensical sorting results.
>
> And that may be part of the issue your 6/7 fixes.
>
> I spent some time playing with this the other day, though, and couldn't
> come up with a specific example that fails the condition above.
>
> It just seems like the whole thing would conceptually easier if we
> pre-parsed the versions into a sequence of elements, then the comparison
> between any two elements would just walk that sequence. The benefit
> there is that you can implement whatever rules you like for the parsing
> (like "prefer longer suffixes to shorter"), but you know the comparison
> will always be consistent.

I considered parsing tagnames into prefix, version number and suffix,
and then work from that, but decided against it.

versioncmp() is taken from glibc, so I assume that it's thoroughly
tested, even in corner cases (e.g. multiple leading zeros).
Furthermore, I think it's a good thing that by default (i.e. without
suffix reordering) our version sort orders the same way as glibc's
version sort does.  Introducing a different algorithm would risk bugs
in the more subtle cases.

Then there are all the weird release suffixes out there, and I didn't
want to decide on a policy for splitting them sanely; don't know
whether there exist any universal rules for this splitting at
all.  E.g. one of the packages here has the following version (let's
ignore the fact that because of the '~' this is an invalid refname in
git):

  1.1.0~rc1-2ubuntu7-1linuxmint1

Now, it's clear that the version number is "1.1.0", and the user
should configure the suffix "~rc" for prerelease reordering.  But what
about the rest?  How should we split it "into a sequence of elements",
is it { "1.1.0", "~rc1", "-2ubuntu7", "-1linuxmint1" } or { "1.1.0",
"~rc1-2", "ubuntu7-1", "linuxmint1" }?
What if there is a hard-working developer who is involved in a lot of
Debian derivatives (and derivatives of derivatives...), and, for
whatever reason, wants to put derivative-specific versions in a
particular order?  With my series, or conceptually even with master if
it weren't buggy, it's possible to specify the order of suffixes of
suffixes, and that dev could do this:

  $ git -c versionsort.suffix=-rc
-c versionsort.suffix=linuxmint
-c versionsort.suffix=YADoaDD
tag -l --sort=version:refname
  '1.1.0*'
  1.1.0-rc1-2ubuntu7-1linuxmint1
  1.1.0-rc1-2ubuntu7-1YADoaDD2
  1.1.0
  1.1.0-2ubuntu7-1linuxmint1
  1.1.0-2ubuntu7-1YADoaDD2

and would get Linux Mint-specific tags before "Yet Another Derivative
of a Debian Derivative"-specific ones.  Not sure whether this is
relevant in practice, but I think it's a nice property nonetheless.

(Btw, just for fun, I also found a package version
2.0.0~beta2+isreally1.8.6-0ubuntu1.  "isreally".  Oh yeah :)

> It would also be more efficient, I think (it seems like the sort is
> O(nr_tags * lg(nr_tags) * nr_suffixes) due to parsing suffixes in the
> comparator). Though that probably doesn't matter much in practice.

I don't think there will be more than only a few configured suffixes
in any repository.  However, if you consider O as "number of
starts_with() invocations", then there is an additional suffix_length
factor.  But then again, these suffixes tend to be short.

> I dunno. I think maybe your 6/7 has converged on an equivalent behavior.
> And I am certainly not volunteering to re-write it, so if what you have
> works, I'm not opposed to it.
>
> -Peff


Re: [PATCHv2 0/7] Fix and generalize version sort reordering

2016-12-14 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Dec 08, 2016 at 03:23:54PM +0100, SZEDER Gábor wrote:
>
>> > With my patches it looks like this:
>> > 
>> > $ git -c versionsort.prereleasesuffix=-pre \
>> >   -c versionsort.prereleasesuffix=-prerelease \
>> >   tag -l --sort=version:refname
>> > v1.0.0-prerelease1
>> > v1.0.0-pre1
>> > v1.0.0-pre2
>> > v1.0.0
>> 
>> And when there happen to be more than one matching suffixes starting
>> at the same earliest position, then we should pick the longest of
>> them.  The new patch 6/7 implements this behavior.
>
> The whole approach taken by the suffix code (before your patches) leaves
> me with the nagging feeling that the comparison is not always going to
> be transitive (i.e., that "a < b && b < c" does not always imply "a <
> c"), which is going to cause nonsensical sorting results.
>
> And that may be part of the issue your 6/7 fixes.
>
> I spent some time playing with this the other day, though, and couldn't
> come up with a specific example that fails the condition above.
>
> It just seems like the whole thing would conceptually easier if we
> pre-parsed the versions into a sequence of elements, then the comparison
> between any two elements would just walk that sequence. The benefit
> there is that you can implement whatever rules you like for the parsing
> (like "prefer longer suffixes to shorter"), but you know the comparison
> will always be consistent.
>
> It would also be more efficient, I think (it seems like the sort is
> O(nr_tags * lg(nr_tags) * nr_suffixes) due to parsing suffixes in the
> comparator). Though that probably doesn't matter much in practice.
>
> I dunno. I think maybe your 6/7 has converged on an equivalent behavior.
> And I am certainly not volunteering to re-write it, so if what you have
> works, I'm not opposed to it.

I also had worries about transitiveness but couldn't break it after
trying for some time.  I find your pre-parsing suggestion a great
one, not from the point of view of performance, but because I would
imagine that the resulting logic would become a lot easier to
understand.




Re: [PATCHv2 0/7] Fix and generalize version sort reordering

2016-12-14 Thread Jeff King
On Thu, Dec 08, 2016 at 03:23:54PM +0100, SZEDER Gábor wrote:

> > With my patches it looks like this:
> > 
> > $ git -c versionsort.prereleasesuffix=-pre \
> >   -c versionsort.prereleasesuffix=-prerelease \
> >   tag -l --sort=version:refname
> > v1.0.0-prerelease1
> > v1.0.0-pre1
> > v1.0.0-pre2
> > v1.0.0
> 
> And when there happen to be more than one matching suffixes starting
> at the same earliest position, then we should pick the longest of
> them.  The new patch 6/7 implements this behavior.

The whole approach taken by the suffix code (before your patches) leaves
me with the nagging feeling that the comparison is not always going to
be transitive (i.e., that "a < b && b < c" does not always imply "a <
c"), which is going to cause nonsensical sorting results.

And that may be part of the issue your 6/7 fixes.

I spent some time playing with this the other day, though, and couldn't
come up with a specific example that fails the condition above.

It just seems like the whole thing would conceptually easier if we
pre-parsed the versions into a sequence of elements, then the comparison
between any two elements would just walk that sequence. The benefit
there is that you can implement whatever rules you like for the parsing
(like "prefer longer suffixes to shorter"), but you know the comparison
will always be consistent.

It would also be more efficient, I think (it seems like the sort is
O(nr_tags * lg(nr_tags) * nr_suffixes) due to parsing suffixes in the
comparator). Though that probably doesn't matter much in practice.

I dunno. I think maybe your 6/7 has converged on an equivalent behavior.
And I am certainly not volunteering to re-write it, so if what you have
works, I'm not opposed to it.

-Peff


[PATCHv2 0/7] Fix and generalize version sort reordering

2016-12-08 Thread SZEDER Gábor
> After having slept on this a couple of times

After having slept on it a few (erm...) more times...

> I think the longest
> matching prerelease suffix should determine the sorting order.
> 
> A release tag usually consists of an optional prefix, e.g. 'v' or
> 'snapshot-', followed by the actual version number, followed by an
> optional suffix.  In the contrived example quoted above this suffix
> is '-foo-bar', thus it feels wrong to match '-bar' against it, when
> the user explicitly configured '-foo-bar' as prerelease suffix as
> well.

At the risk of overthinking it: I think it would be more correct to
say that the earliest starting matching prerelease suffix should
determine the sorting order in the first place.

> Then here is a more realistic case for sorting based on the longest
> matching suffix, where
> 
>- a longer suffix starts with the shorter one,
>- and the longer suffix comes after the shorter one in the
>  configuration.
> 
> With my patches it looks like this:
> 
> $ git -c versionsort.prereleasesuffix=-pre \
>   -c versionsort.prereleasesuffix=-prerelease \
>   tag -l --sort=version:refname
> v1.0.0-prerelease1
> v1.0.0-pre1
> v1.0.0-pre2
> v1.0.0

And when there happen to be more than one matching suffixes starting
at the same earliest position, then we should pick the longest of
them.  The new patch 6/7 implements this behavior.

Changes since v1:
 - Documentation, in-code comment and commit message updates.
 - Use different tagnames and suffixes in the tests, keeping the new
   tests simpler and changes to existing tests smaller.
 - Added a test of questionable value to try to check that we don't
   read memory before the tagname strings in case of an unusually long
   suffix.
 - Removed unnecessary {}.
 - Added two patches on top to address the corner cases and to
   introduce 'versionsort.suffix' for the resulting more general
   suffix reordering behavior.

The interdiff at the bottom is between v1 and only the first five
patches of v2, i.e. not including the two new patches.  I think it's
easier to judge the changes this way.

SZEDER Gábor (7):
  t7004-tag: delete unnecessary tags with test_when_finished
  t7004-tag: use test_config helper
  t7004-tag: add version sort tests to show prerelease reordering issues
  versioncmp: pass full tagnames to swap_prereleases()
  versioncmp: cope with common part overlapping with prerelease suffix
  versioncmp: use earliest-longest contained suffix to determine sorting
order
  versioncmp: generalize version sort suffix reordering

 Documentation/config.txt  |  44 
 Documentation/git-tag.txt |   4 +-
 t/t7004-tag.sh| 132 +++---
 versioncmp.c  |  68 +---
 4 files changed, 196 insertions(+), 52 deletions(-)

-- 
2.11.0.78.g5a2d011


diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0bcb6790d..c1a9616e9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3008,8 +3008,12 @@ versionsort.prereleaseSuffix::
 This variable can be specified multiple times, once per suffix. The
 order of suffixes in the config file determines the sorting order
 (e.g. if "-pre" appears before "-rc" in the config file then 1.0-preXX
-is sorted before 1.0-rcXX). The sorting order between different
-suffixes is undefined if they are in multiple config files.
+is sorted before 1.0-rcXX).
+If more than one suffixes match the same tagname, then that tagname will
+be sorted according to the matching suffix which comes first in the
+configuration.
+The sorting order between different suffixes is undefined if they are
+in multiple config files.
 
 web.browser::
Specify a web browser that may be used by some commands.
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index f0cfe1fa3..c7aaace8c 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -1511,14 +1511,14 @@ test_expect_success 'invalid sort parameter in 
configuratoin' '
 '
 
 test_expect_success 'version sort with prerelease reordering' '
-   test_config versionsort.prereleaseSuffix -beta &&
-   git tag foo1.6-beta1 &&
-   git tag foo1.6-beta2 &&
+   test_config versionsort.prereleaseSuffix -rc &&
+   git tag foo1.6-rc1 &&
+   git tag foo1.6-rc2 &&
git tag -l --sort=version:refname "foo*" >actual &&
cat >expect <<-\EOF &&
foo1.3
-   foo1.6-beta1
-   foo1.6-beta2
+   foo1.6-rc1
+   foo1.6-rc2
foo1.6
foo1.10
EOF
@@ -1526,54 +1526,49 @@ test_expect_success 'version sort with prerelease 
reordering' '
 '
 
 test_expect_success 'reverse version sort with prerelease reordering' '
-   test_config versionsort.prereleaseSuffix -beta &&
+   test_config versionsort.prereleaseSuffix -rc &&
git tag -l --sort=-version:refname "foo*" >actual &&
cat >expect <<-\EOF &&
foo1.10
foo1.6
-   foo1.6-beta2
-