Niels Thykier wrote:

> Digging a bit deeper, it turns out that `file -i` correctly classifies 
> the changelog as `text/plain; charset=utf-8`.  That is, `file` knows it 
> is text and I suspect `diffoscope` should try `file -i` as well when it 
> gets an unknown result from `file`.

By "unknown result" I assume you mean that diffoscope cannot match
the file type with any known comparator. :)  Indeed, diffoscope
doesn't recognise the bogus "Message Sequence Chart" so it falls
back to using a hexdump as you intuited.

I've got some WIP code that will treat unknown file types as text if
they have a MIME type of text/plain. At least for the examples you
sent over, this avoids falling back to hexdump.

Do you think I should further limit that conditional to a whitelist
of safe encodings, too? (e.g. "utf-8", "us-ascii")
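
To make the question concrete, here is a rough Python sketch of the
fallback described above. It is an illustration only, not the WIP code
or diffoscope's actual implementation: the function names and the
SAFE_CHARSETS set are hypothetical, and it assumes the MIME string has
the `text/plain; charset=utf-8` shape that `file -i` printed for the
changelog.

```python
import subprocess

# Hypothetical allowlist, taken from the two encodings named above.
SAFE_CHARSETS = {"utf-8", "us-ascii"}


def parse_file_mime(output):
    """Split 'text/plain; charset=utf-8' into ('text/plain', 'utf-8')."""
    mime, _, params = output.strip().partition(";")
    charset = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset" and value:
            charset = value.strip().lower()
    return mime.strip().lower(), charset


def treat_as_text(mime_output, safe_charsets=None):
    """Decide whether a file with no known comparator is compared as text.

    With safe_charsets=None only the MIME type is checked; passing a set
    additionally restricts the fallback to those encodings.
    """
    mime, charset = parse_file_mime(mime_output)
    if mime != "text/plain":
        return False
    if safe_charsets is None:
        return True
    return charset in safe_charsets


def file_mime(path):
    """Return the MIME description from file(1); requires `file` on PATH."""
    result = subprocess.run(
        ["file", "--brief", "--mime", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Without the whitelist, treat_as_text(file_mime(path)) accepts any
text/plain file; with treat_as_text(file_mime(path), SAFE_CHARSETS), a
text/plain file in some other charset would still end up as a hexdump.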


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      la...@debian.org 🍥 chris-lamb.co.uk
       `-

_______________________________________________
Reproducible-builds mailing list
Reproducible-builds@alioth-lists.debian.net
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/reproducible-builds
