On Monday, 6 March 2017 22.27.25 WET Enrico Forestieri wrote:
> The problem is that the file starts with a U+FEFF BOM mark, and most
> probably lyx2lyx doesn't expect to deal with BOM marks. Actually, the
> conversion succeeds, but the BOM mark is left in place (it now appears
> on the second line, though). Consequently, lyx then chokes on reading it.
> It suffices removing the bogus second line of the converted file to make
> it perfectly readable by lyx 2.2.
> LyX 2.1 is able to read it because no conversion is necessary and the BOM
> mark is skipped because it is dealt with by the lexer when it is at the
> beginning of the file.
Thank you for noticing that. :-)
The next question is how it got there. I assume, of course, that Windows was
involved somewhere. :-)
Is this a valid process that we should support?
If so, the following patch fixes this for me with both Python 2 and Python 3.
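
For illustration, here is a minimal standalone sketch of the idea, assuming the
header lines have already been decoded to text. The helper name strip_bom is
hypothetical and not part of lyx2lyx; the sample header lines are made up.

```python
import codecs

BOM = u"\ufeff"  # U+FEFF as it appears after decoding


def strip_bom(header):
    """Return a copy of header (a list of decoded lines) without a leading BOM.

    Hypothetical helper for illustration only; not part of lyx2lyx.
    """
    if header and header[0].startswith(BOM):
        return [header[0][len(BOM):]] + header[1:]
    return list(header)


# A UTF-8 encoded first line that starts with a BOM (sample data).
raw = codecs.BOM_UTF8 + b"#LyX 2.1 created this file.\n"
decoded = raw.decode("utf-8")  # the BOM survives as one U+FEFF character

header = strip_bom([decoded, u"\\lyxformat 474\n"])

# Alternative: the standard "utf-8-sig" codec drops the BOM while decoding.
decoded_sig = raw.decode("utf-8-sig")
```

Decoding with "utf-8-sig" would avoid the explicit check, but only when the
bytes are decoded in one place before anything else looks at them.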
--
José Abílio
diff --git a/lib/lyx2lyx/LyX.py b/lib/lyx2lyx/LyX.py
index 77ccdd0..b70fa3f 100644
--- a/lib/lyx2lyx/LyX.py
+++ b/lib/lyx2lyx/LyX.py
@@ -29,6 +29,7 @@ import sys
 import re
 import time
 import io
+import codecs

 try:
     import lyx2lyx_version
@@ -525,6 +526,11 @@ class LyX_base:
         initial_comment = " ".join(["#LyX %s created this file." % version__,
                                     "For more info see http://www.lyx.org/"])

+        # Remove UTF8 BOM marker if present
+        text = unicode if PY2 else str
+        if text(self.header[0]).encode("utf-8").startswith(codecs.BOM_UTF8):
+            self.header[0] = self.header[0][1:]
+
         # Simple heuristic to determine the comment that always starts
         # a lyx file
         if self.header[0].startswith("#"):
@@ -547,7 +553,7 @@ class LyX_base:
             if PY2:
                 result = fileformat.match(line)
             else:
-                result = fileformat.match(line.decode('ascii'))
+                result = fileformat.match(line.decode('ascii','ignore'))
             if result:
                 return self.lyxformat(result.group(1))
         else: