https://bugs.documentfoundation.org/show_bug.cgi?id=154229

--- Comment #10 from Mike Kaganski <mikekagan...@hotmail.com> ---
The false detection happens in WP42Heuristics::isWP42FileFormat

https://sourceforge.net/p/libwpd/code/ci/master/tree/src/lib/WP42Heuristics.cpp#l61

and it seems to do a reasonable job; the data just happens to look
suspiciously similar to the proper format. The file is UTF-8-encoded plain
text containing ASCII characters (0x20 to 0x7F) and exactly one pair of
non-ASCII characters (the same “ twice), whose UTF-8 encoding starts with
0xE2 0x80. This pair forms one "variable-length functional group" (opened by
the 0xE2 of the first quote and closed by the 0xE2 of the second), followed
immediately by one "single-character functional group" consisting of 0x80.
Such an unlikely coincidence: the properties of 0xE2 in WordPerfect require
it to appear as a pair, and the properties of 0x80 make it valid on its own.
If almost anything were different (if the quotes were a different pair, like
“...”; or if there were at least one other non-ASCII character; or...), the
file would not have been detected as WP4.2.
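To make the mechanism concrete, here is a minimal Python sketch of the kind
of byte scan described above. It is a hypothetical simplification, not
libwpd's WP42Heuristics code: the names count_function_groups and
looks_like_wp42 are made up for illustration. Only the roles of 0xE2 (opens a
variable-length group closed by the next 0xE2) and 0x80 (a complete
single-character group) come from the comment; rejecting every other
non-ASCII byte and requiring at least one group are assumptions.

```python
# Hypothetical, simplified byte scanner modelled on the behaviour described in
# the comment; NOT libwpd's WP42Heuristics implementation.
from typing import Iterable, Optional


def count_function_groups(
    data: bytes,
    variable_length_openers: Iterable[int],
    single_char_codes: Iterable[int],
) -> Optional[int]:
    """Return the number of function groups found, or None if data is rejected.

    Byte classes (assumptions for illustration):
      * bytes below 0x80 (control codes, ASCII text): skipped;
      * a byte in single_char_codes: a complete group on its own;
      * a byte in variable_length_openers: opens a group that must be closed
        by the next occurrence of the same byte value;
      * any other byte: rejected.
    """
    openers = set(variable_length_openers)
    singles = set(single_char_codes)
    groups = 0
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            i += 1  # plain text or control character
        elif b in singles:
            groups += 1  # single-character function group
            i += 1
        elif b in openers:
            close = data.find(bytes([b]), i + 1)  # scan for the closing byte
            if close == -1:
                return None  # group never closed: reject
            groups += 1  # everything up to the closing byte is skipped
            i = close + 1
        else:
            return None  # byte with no assumed meaning: reject
    return groups


def looks_like_wp42(data: bytes) -> bool:
    # Assumption: at least one function group is required, so pure ASCII
    # text is not accepted.
    n = count_function_groups(data, variable_length_openers={0xE2},
                              single_char_codes={0x80})
    return n is not None and n > 0


print(looks_like_wp42(b"Some text \xe2 quoted \xe2\x80 more text"))  # True
print(looks_like_wp42(b"only one \xe2"))                # False: never closed
print(looks_like_wp42(b"plain ASCII"))                  # False: no groups
print(looks_like_wp42(b"caf\xc3\xa9 \xe2 x \xe2\x80"))  # False: other byte
```

The samples are synthetic: they keep only the two byte values the comment
names, not the real file's full UTF-8 sequences.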

I don't know whether the detection can be improved, but the constellation is
funny ;)
