[pacman-dev] Version comparison algorithm.

2019-03-16 Thread Richard Dodd
I'm writing to ask about the version comparison function `rpmvercmp` in
`libalpm/version.cL83`. The algorithm is complex and I'm trying to
understand its behavior. I want to hash using the package name and version
as a key, and so I need a hash function where `i1 == i2 => hash(i1) ==
hash(i2)` according to the version comparison operation. I'll describe how
the algorithm behaves and then ask my questions.

The algorithm works on a byte string and uses ascii comparison rules (no
unicode).

 - First, split the input up into blocks of *alpha*, *digit* or
*non-alphanum*.
 - For each pair of blocks
- If the types are different, then *non-alphanum* is newer than
*numeric*, which is newer than *alpha*
- If the types are the same, then the rules are
  - For *non-alphanum*, compare lengths, longer is newer, equal lengths
are equal segments (so *--* and *::* are the same)
  - For *alpha* just do a lexicographic comparison (so *b* is newer
than *a* etc.)
  - For *numeric*, do a numeric comparison. (this can be done by
skipping leading zeros, then comparing length, then comparing
lexicographically, to avoid overflow on integer conversion)
  - If one input is longer than the other, and all sections so far have
been equal, then if the next section of the longer is *alpha*, it is older,
and if it is *numeric* it is newer. (so "1a" is older than "1", "a1" is
newer than "a").
  - If the inputs have the same number of sections that are all equal, then
they are equal.

Questions:

 1. Is the algorithm correctly described here.
 2. This should mean that if I hash length for *non-numeric*, the string
with stripped zeros for *numeric*, and just the string for *alpha*, the has
law should be upheld. Is this correct?

Thanks
Rich


Re: [pacman-dev] Version comparison algorithm.

2019-03-20 Thread Richard Dodd
I'm just trying to get a mathematically sound implementation of all the
properties (traits in rust parlance) that the version exhibits. If there is
an equivalence relation and a hash function as above, then I can store
versions in a hash set/map. If there is a total ordering, then I can store
versions in a binary tree set/map. I'm making a library, so I want to allow
as much as possible without allowing anything unsound. Rust's type system
allows me to make these promises in code, rather than just documenting them.

You can see my implementation at
https://github.com/derekdreery/alpm-experimental/blob/master/src/version.rs,
if you're interested.


On Sat, Mar 16, 2019 at 2:02 PM Richard Dodd 
wrote:

> I'm writing to ask about the version comparison function `rpmvercmp` in
> `libalpm/version.cL83`. The algorithm is complex and I'm trying to
> understand its behavior. I want to hash using the package name and version
> as a key, and so I need a hash function where `i1 == i2 => hash(i1) ==
> hash(i2)` according to the version comparison operation. I'll describe how
> the algorithm behaves and then ask my questions.
>
> The algorithm works on a byte string and uses ascii comparison rules (no
> unicode).
>
>  - First, split the input up into blocks of *alpha*, *digit* or
> *non-alphanum*.
>  - For each pair of blocks
> - If the types are different, then *non-alphanum* is newer than
> *numeric*, which is newer than *alpha*
> - If the types are the same, then the rules are
>   - For *non-alphanum*, compare lengths, longer is newer, equal
> lengths are equal segments (so *--* and *::* are the same)
>   - For *alpha* just do a lexicographic comparison (so *b* is newer
> than *a* etc.)
>   - For *numeric*, do a numeric comparison. (this can be done by
> skipping leading zeros, then comparing length, then comparing
> lexicographically, to avoid overflow on integer conversion)
>   - If one input is longer than the other, and all sections so far have
> been equal, then if the next section of the longer is *alpha*, it is older,
> and if it is *numeric* it is newer. (so "1a" is older than "1", "a1" is
> newer than "a").
>   - If the inputs have the same number of sections that are all equal,
> then they are equal.
>
> Questions:
>
>  1. Is the algorithm correctly described here.
>  2. This should mean that if I hash length for *non-numeric*, the string
> with stripped zeros for *numeric*, and just the string for *alpha*, the has
> law should be upheld. Is this correct?
>
> Thanks
> Rich
>
>
>