Niels Thykier wrote:

> Digging a bit deeper, it turns out that `file -i` correctly classifies
> the changelog as `text/plain; charset=utf-8`. That is, `file` knows it
> is text and I suspect `diffoscope` should try `file -i` as well when it
> gets an unknown result from `file`.

By "unknown result" I assume you mean that diffoscope cannot match the
file type with any known comparator. :)

Indeed, diffoscope doesn't recognise the bogus "Message Sequence Chart"
so it falls back to using a hexdump as you intuited.

I've got some WIP code that will treat unknown file types as text if
they have a MIME type of text/plain. This avoids the use of hexdump with
the examples you sent over at least.

Do you think I should be further limiting that conditional to a
whitelist of safe encodings, too? (eg. "utf-8" and "us-ascii", etc.)


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      la...@debian.org 🍥 chris-lamb.co.uk
       `-
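
For illustration only, the fallback discussed above could look roughly like the sketch below. It is not diffoscope's actual code: the helper names (`parse_mime`, `should_treat_as_text`, `mime_of`) and the contents of `SAFE_ENCODINGS` are assumptions made for this example. The thread only floats "utf-8" and "us-ascii" as possible whitelist entries.

```python
import subprocess

# Assumed whitelist: the thread only suggests these two as examples.
SAFE_ENCODINGS = {"utf-8", "us-ascii"}


def parse_mime(output):
    """Split `file -b -i` output such as 'text/plain; charset=utf-8'
    into a (mime_type, charset) pair; charset is None if absent."""
    mime_type, _, params = output.strip().partition(";")
    charset = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "charset":
            charset = value.strip().lower()
    return mime_type.strip().lower(), charset


def should_treat_as_text(output, safe_encodings=SAFE_ENCODINGS):
    """Return True only for text/plain in a whitelisted encoding."""
    mime_type, charset = parse_mime(output)
    return mime_type == "text/plain" and charset in safe_encodings


def mime_of(path):
    """Ask file(1) for the MIME type and encoding of `path`.
    -b omits the filename from the output, -i prints MIME information."""
    return subprocess.check_output(["file", "-b", "-i", path], text=True)
```

A caller would only reach this check once no comparator matched the file, for example `should_treat_as_text(mime_of(path))`, and would fall back to the hexdump otherwise.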