While I've been looking at replacing or improving the current upset
script, I've come across some minor issues caused by setup.hint being
under-specified or under-documented:
* The encoding of setup.hint is unspecified.
Historically both ISO-8859-1 and UTF-8 have been used (e.g. libspiro
used 'bézier' with an ISO-8859-1 e-acute, whereas calligra-l10n-nb uses
'Bokmål' with a UTF-8 a-ring; various other hints use UTF-8 punctuation
marks).
I think currently UTF-8 displays correctly in the HTML package pages,
but neither encoding displays correctly in setup.
I'd suggest that we specify UTF-8 and eventually fix setup to handle that.
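If we do specify UTF-8, a replacement for upset could enforce it by
strictly decoding each setup.hint as raw bytes and reporting any file
that fails. Below is a minimal Python sketch; decode_hint is a
hypothetical name, not part of any existing tool. An ISO-8859-1 e-acute
(byte 0xE9) followed by an ASCII letter is not valid UTF-8, so the old
libspiro hint would be reported. Strict decoding can't prove a file was
*meant* as UTF-8, though, since some ISO-8859-1 byte sequences happen to
also be valid UTF-8.

```python
import sys


def decode_hint(data: bytes, name: str = "setup.hint") -> str:
    """Decode setup.hint contents, assuming the proposed UTF-8-only rule.

    Raises ValueError naming the file, the byte offset and the offending
    byte if the data is not valid UTF-8 (e.g. an ISO-8859-1 e-acute, 0xE9).
    """
    try:
        return data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as e:
        raise ValueError(
            f"{name}: not valid UTF-8 at byte {e.start} (0x{data[e.start]:02x})"
        ) from None


if __name__ == "__main__":
    # Check every hint file named on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            try:
                decode_hint(f.read(), path)
            except ValueError as err:
                print(err)
```

Passing a list of hint files on the command line prints one line per
file that is not valid UTF-8.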
* 'sdesc' text is mangled in setup.ini (but not in the HTML package list)
In particular, it is forced to start with a capital letter, which is
incorrect when the sdesc starts with a command name that is properly
lower-case (e.g. "dash shell"). Also, any text up to and including the
first colon is removed, presumably to stop people writing the package
name again, which mangles Perl and Ruby module names in the description
(e.g. "Ruby Net::HTTP persistent connection support", "Perl Math::Int64
distribution").
I'd suggest removing this mangling, and instead explicitly reporting any
sdesc which starts with "packagename:".
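To illustrate, here is a Python approximation of the mangling as
described above, alongside the suggested check. mangled_sdesc is
reconstructed from observed setup.ini output, not transcribed from
upset's Perl (in particular, stripping whitespace after the colon is a
guess). check_sdesc is a hypothetical name, and comparing
case-insensitively is an arbitrary choice.

```python
from __future__ import annotations


def mangled_sdesc(sdesc: str) -> str:
    """Approximate the current setup.ini mangling (a reconstruction):
    drop everything up to and including the first colon, then force the
    first character to upper case."""
    _, colon, rest = sdesc.partition(":")
    if colon:
        sdesc = rest.lstrip()  # assumption: whitespace after ':' dropped
    return sdesc[:1].upper() + sdesc[1:]


def check_sdesc(package: str, sdesc: str) -> list[str]:
    """Suggested behaviour: leave sdesc untouched, but report one that
    starts with 'packagename:' (compared case-insensitively)."""
    problems = []
    if sdesc.lower().startswith(package.lower() + ":"):
        problems.append(f"{package}: sdesc starts with '{package}:'")
    return problems
```

With this approximation, "Perl Math::Int64 distribution" becomes
":Int64 distribution", whereas check_sdesc leaves it alone and only
flags something like "dash: POSIX shell" for the dash package.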
* Handling of double-quoted text seems over-complicated
A multi-line double-quoted value is terminated only by a double-quote at
the end of a line, and embedded double-quotes are silently transformed
into single-quotes (e.g. proj had an sdesc of ""The PROJ Cartographic
Projections Software (utilities)", where the erroneous nested
double-quote was being transformed into a single-quote).
There is no escape mechanism for embedded double-quotes, so there is no
way to represent one.
Also, spaces after the leading quote are magically removed.
Furthermore, genini requires that sdesc and ldesc are double-quoted, but
upset does not.
I'd suggest making double-quoting of those values mandatory and
forbidding embedded double-quotes, as this permits simpler processing of
the text: it can be lexed character by character, with the first
double-quote after the opening one always ending the value.
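A Python sketch of how simple the lexer becomes under the suggested
rules (sdesc and ldesc must be double-quoted, no embedded double-quotes).
This models the *proposed* syntax, not what upset or genini accept
today; parse_hint and QUOTED_KEYS are hypothetical names, and treating
every other value as running to the end of its line is an assumption.
A single forward scan suffices, leading spaces inside the quotes are
kept, and the erroneous proj sdesc is rejected rather than silently
rewritten.

```python
from __future__ import annotations

# Keys whose values would have to be double-quoted under the proposal.
QUOTED_KEYS = {"sdesc", "ldesc"}


def parse_hint(text: str) -> dict[str, str]:
    """Lex setup.hint text in one forward pass under the proposed rules.

    Quoted values may span lines and may not contain '"', so the first
    '"' after the opening one always closes the value.
    """
    result: dict[str, str] = {}
    i, n = 0, len(text)
    while i < n:
        if text[i].isspace():  # skip whitespace between entries
            i += 1
            continue
        eol = text.find("\n", i)
        if eol < 0:
            eol = n
        colon = text.find(":", i, eol)  # the key must end on this line
        if colon < 0:
            raise ValueError(f"no ':' on line starting at offset {i}")
        key = text[i:colon].strip()
        i = colon + 1
        while i < n and text[i] in " \t":  # spaces between ':' and value
            i += 1
        if key in QUOTED_KEYS:
            if i >= n or text[i] != '"':
                raise ValueError(f"{key}: value must be double-quoted")
            close = text.find('"', i + 1)  # first quote ends the value
            if close < 0:
                raise ValueError(f"{key}: unterminated quoted value")
            value = text[i + 1:close]  # verbatim, leading spaces included
            i = close + 1
            eol = text.find("\n", i)
            if eol < 0:
                eol = n
            if text[i:eol].strip():  # e.g. caused by a stray embedded '"'
                raise ValueError(f"{key}: unexpected text after closing quote")
        else:
            value = text[i:eol].rstrip()  # unquoted: rest of the line
        result[key] = value
        i = eol
    return result
```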
* It's not very clear what 'skip' represents
The description "The skip line indicates that that package should not
appear in setup. It is intended for directories that exist in the
hierarchy that should not be considered." is a bit vague to me.
It's not totally clear whether it's intended to mark directories which
should be empty, source-only packages, or something else.
upset knows enough to omit packages which have no install tarfiles (i.e.
are source-only) from setup.ini, irrespective of 'skip'.
However, the presence of 'skip' also causes the package to be omitted
from the HTML package list.
I think cygport's behaviour has changed over time; it currently marks
source-only packages as 'skip', but there are several source-only
packages (e.g. attica) that are missing 'skip'.
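Whatever 'skip' ends up meaning, this inconsistency is easy to detect
mechanically. A Python sketch using a hypothetical Package type,
assuming source tarfiles can be recognised by a '-src.tar.' infix in
their filenames, and treating a package with no install tarfiles as
source-only (matching what upset does for setup.ini):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical in-memory view of one package directory."""
    name: str
    tarfiles: list[str]
    hints: dict[str, str] = field(default_factory=dict)

    def is_source_only(self) -> bool:
        # Assumption: source tarfiles are named '...-src.tar.<ext>';
        # anything else is an install tarfile. An empty list also counts
        # as source-only, since there is nothing to install.
        return all("-src.tar." in t for t in self.tarfiles)


def skip_inconsistencies(packages: list[Package]) -> list[str]:
    """Report source-only packages whose setup.hint lacks 'skip'."""
    report = []
    for p in packages:
        if p.is_source_only() and "skip" not in p.hints:
            report.append(f"{p.name}: source-only but no 'skip' in setup.hint")
    return report
```

The filenames in any example input would be illustrative only; the
check does not depend on particular versions.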