On 15/07/2021 at 22:30, Nyall Dawson wrote:
On Fri, 16 Jul 2021, 4:52 am Andrea Giudiceandrea
<andreaer...@libero.it> wrote:
> Isn't this limitation ultimately that GDAL isn't reading the encoding
> correctly? (Or perhaps it's a limitation in the underlying freexl
> library...)
>
> Nyall
Hi Nyall and Andreas,
it seems to me GDAL/OGR [1] reads XLS and XLSX files with their
respective proper encoding [2] and ogrinfo outputs the text in UTF-8
for both formats.
I don't think that's completely correct -- looking at the freexl
documentation it seems that only some xls file versions are utf8, and
others have a codepage indicating the encoding which needs to be read
from the xls metadata:
"Any BIFF version from BIFF2 to BIFF5 simply supports CodePage based
character encoding, i.e. each character
simply requires 8 bits to be represented (single byte). Correct
representation of characters requires knowing which
one CodePage table has to be applied. This can be determined from the
workbook or worksheet metadata (it is the
CODEPAGE record).
BIFF8 is much more sophisticated, since any text string is usually
encoded as Unicode in UTF-16 Little Endian
[UTF-16LE] format. This encoding is a multi-byte encoding (two bytes
are required to represent a single character),
but being universal no character table is required."
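
To make the quoted distinction concrete, here is a minimal Python sketch of the two storage schemes. It is not FreeXL's actual code: the helper name `to_utf8` is hypothetical, and cp1252 is only an assumed example of what a CODEPAGE record might indicate. It shows that the same text takes one byte per character under a code page and two under UTF-16LE, and that both yield identical UTF-8 once the source encoding is known.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw cell bytes with the known source encoding, re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


text = "Città"

# BIFF2-BIFF5 style: single byte per character; the code page must be known.
# cp1252 is an assumed example, standing in for the CODEPAGE record's value.
biff5_bytes = text.encode("cp1252")

# BIFF8 style: UTF-16LE, two bytes per character for this text.
biff8_bytes = text.encode("utf-16-le")

print(len(biff5_bytes), len(biff8_bytes))

# Both representations recode to the same UTF-8 byte string.
print(to_utf8(biff5_bytes, "cp1252") == to_utf8(biff8_bytes, "utf-16-le"))
```

The point of the sketch is that a reader of the recoded output only needs to know it is UTF-8; the code page matters only at the recoding step.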
Yes, but FreeXL does the recoding to UTF-8
Nyall
Instead, QGIS correctly imports XLSX files as UTF-8 encoded, while XLS
files are wrongly imported as "system" encoded, even when UTF-8 [3] is
selected as the encoding in the Data Source Manager vector import window.
After importing an XLS file, changing the "Data source encoding" of the
layer to "UTF-8" fixes the text encoding in my tests.
So, I think QGIS should also automatically import XLS files as UTF-8
encoded.
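
The symptom described here can be reproduced outside QGIS with a small Python sketch. It assumes the "system" encoding is a Windows code page such as cp1252 (an assumption for illustration; the actual system encoding depends on the machine) and that the driver hands over UTF-8 bytes.

```python
original = "Città"

# Bytes as delivered by a driver that has already recoded the text to UTF-8.
utf8_bytes = original.encode("utf-8")

# Wrong: interpreting the UTF-8 bytes with the assumed "system" code page.
wrong = utf8_bytes.decode("cp1252")

# Right: interpreting the same bytes as UTF-8.
right = utf8_bytes.decode("utf-8")

print(repr(wrong))
print(repr(right))

# Re-interpreting the garbled string as UTF-8 recovers the original text,
# which mirrors what switching the layer's encoding to "UTF-8" does.
repaired = wrong.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The bytes themselves are never wrong in this scenario; only the encoding label applied to them is, which is why changing the layer's encoding afterwards is enough to fix the display.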
Best regards.
Andrea Giudiceandrea
[1] tested on Windows / OSGeo4W: GDAL 3.1.4 / FreeXL 1.0.2 / Expat 2.1.0
and GDAL 3.2.2 / FreeXL 1.0.6 / Expat 2.2.10
[2] text in XLS (BIFF8) files is internally encoded in UTF-16LE
[3] by the way, two "UTF-8" codecs are incorrectly listed in the
"Encoding" drop-down menu...
_______________________________________________
QGIS-Developer mailing list
QGIS-Developer@lists.osgeo.org
List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer
--
http://www.spatialys.com
My software is free, but my time generally not.