On Fri, Feb 10, 2012 at 10:05:55AM -0800, Russ Allbery wrote:
> Jakub Wilk <jw...@debian.org> writes:
> > * Russ Allbery <r...@debian.org>, 2012-02-09, 23:05:
>
> >> Note that another case that I don't think has been discussed, but which
> >> is probably more common than embedded quote marks, is a filename that's
> >> invalid UTF-8 (straight ISO 8859-1, for example). That's also not
> >> representable in our typical debian/copyright file,
>
> > The specification currently reads: “Only the wildcards * and ? apply; the
> > former matches any number of characters (including none), the latter a
> > single character.”
>
> > But characters of which encoding? If UTF-8, then for some filenames, no
> > wildcard exist that would match them.
>
> Indeed. That's arguably a worse hole in the specification than whitespace
> handling, since it may not be possible to use wildcards to work around it.
> I'm not sure if we need to say something about that explicitly, or if it's
> rare enough that we don't have to care.
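As a rough illustration of the encoding case quoted above, here is a minimal
Python sketch. It assumes a hypothetical parser that decodes file names
strictly as UTF-8 before matching and that supports only the * and ?
wildcards; the helper names (pattern_to_regex, is_valid_utf8, matches) are
invented for this example and do not come from any existing tool.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern that uses only * and ? into a regex (hypothetical helper)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any number of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def matches(pattern: str, name: bytes) -> bool:
    """Match a pattern against a raw file name.

    Assumption: the parser decodes names strictly as UTF-8, so a name that
    is not valid UTF-8 can never be matched, even by "*".
    """
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return pattern_to_regex(pattern).fullmatch(text) is not None


# The same visible name, stored on disk in two different encodings.
latin1_name = "café.txt".encode("iso-8859-1")
utf8_name = "café.txt".encode("utf-8")

print(is_valid_utf8(latin1_name), matches("*", latin1_name))
print(is_valid_utf8(utf8_name), matches("caf?.txt", utf8_name))
```

A parser that decoded names leniently (for example with a replacement or
escape error handler) would behave differently, which is part of why the
specification would have to say which encoding "characters" refers to.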
Dear all,

how about documenting these facts in the DEP and going ahead with the current
syntax?

+    <section id="limitations">
+      <title>Limitations</title>
+      <para>
+        The pattern syntax can not distinguish files whose names differ only by
+        whitespaces, nor files that have the same name but are in paths that only
+        differ by whitespaces.
+      </para>
+      <para>
+        It is not possible to represent a file name or a path using an encoding
+        that is not compatible with Unicode.
+      </para>
+    </section>

For whitespace, we have been saying for a year that we will not make normative
changes unless necessary, and the possibilities discussed are all theoretical.
I think that extensions are welcome for future versions of the format, but the
risk of breaking existing files with a normative change is no smaller than the
chance of encountering a package where two files have different licenses and
names that differ only by whitespace, and where the upstream author would
either refuse or be unavailable to correct the problem.

For the encoding, the problem is not limited to the machine-readable format.
If the Debian copyright file is in an encoding A, a file has a name (or is in
a directory whose name is) in an encoding B that cannot be represented in A,
and there is no way to work around this with wildcards, then the file or
directory cannot be described by its name, regardless of the syntax followed
by the copyright file.

It is good to care about these cases, and I propose to do so by documenting
them in version 1.0 and keeping bugs open. They may be solved in a future
version if there is a solution that satisfies both the developers who write
the files and the developers who write the parsers.

Have a nice week-end,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan