On 15/07/2021 at 22:30, Nyall Dawson wrote:


On Fri, 16 Jul 2021, 4:52 am Andrea Giudiceandrea <andreaer...@libero.it> wrote:

    > Isn't this limitation ultimately that GDAL isn't reading the encoding
    > correctly? (Or perhaps it's a limitation in the underlying freexl
    > library...)
    >
    > Nyall


    Hi Nyall and Andreas,
    it seems to me that GDAL/OGR [1] reads XLS and XLSX files with their
    respective proper encodings [2], and ogrinfo outputs the text in UTF-8
    for both formats.


I don't think that's completely correct -- looking at the freexl documentation it seems that only some xls file versions are utf8, and others have a codepage indicating the encoding which needs to be read from the xls metadata:

"Any BIFF version from BIFF2 to BIFF5 simply supports CodePage based character encoding, i.e. each character simply requires 8 bits to be represented (single byte). Correct representation of characters requires knowing which one CodePage table has to be applied. This can be determined from the workbook or worksheet metadata (it is the CODEPAGE record).
BIFF8 is much more sophisticated, since any text string is usually encoded as Unicode in UTF-16 Little Endian [UTF-16LE] format. This encoding is a multi-byte encoding (two bytes are required to represent a single character), but being universal no character table is required."

Yes, but FreeXL does the recoding to UTF-8.
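
To make the recoding idea concrete, here is a minimal Python sketch. It is not FreeXL's actual code: the function name `to_utf8`, its parameters, and the `cp1252` default are assumptions made purely for illustration, and real BIFF parsing has more cases than the two shown. It only shows that a codepage-encoded string and a UTF-16LE string both end up as the same UTF-8 bytes once the source encoding is known.

```python
def to_utf8(raw: bytes, biff_version: int, codepage: str = "cp1252") -> bytes:
    """Re-encode a raw XLS text string as UTF-8 (hypothetical helper).

    biff_version: 2..5 means single-byte text in the workbook's codepage,
                  8 means UTF-16LE text (simplified view of the quote above).
    codepage:     Python codec name standing in for the CODEPAGE record;
                  the cp1252 default is an assumption for this sketch.
    """
    if biff_version >= 8:
        text = raw.decode("utf-16-le")
    else:
        text = raw.decode(codepage)
    return text.encode("utf-8")


# The same word stored the "old" way and the "new" way.
legacy = "café".encode("cp1252")      # one byte per character
modern = "café".encode("utf-16-le")   # two bytes per character

# Once recoded, a consumer cannot tell the two apart.
assert to_utf8(legacy, 5) == to_utf8(modern, 8) == "café".encode("utf-8")
```

If the recoding happens in the reader library like this, the application on top only ever needs to treat the text as UTF-8, whichever BIFF version the file uses.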

Nyall




    Instead, QGIS correctly imports XLSX files as UTF-8 encoded, while XLS
    files are wrongly imported as "system" encoded, even when UTF-8 [3] is
    selected as the encoding in the Data Source Manager vector import window.

    After importing an XLS file, changing the "Data source encoding" of the
    layer to "UTF-8" fixes the text encoding in my tests.

    So, I think QGIS should automatically import XLS files as UTF-8 encoded
    too.
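
The symptom described above can be reproduced outside QGIS with a few lines of Python. This is only a sketch of the general effect, not QGIS code: the helper names `misread` and `repair` are hypothetical, and it assumes the "system" encoding is Windows-1252, which will not be true on every machine.

```python
def misread(text: str, assumed: str = "cp1252") -> str:
    """Simulate UTF-8 bytes being interpreted with the wrong (system) codec."""
    return text.encode("utf-8").decode(assumed)


def repair(garbled: str, assumed: str = "cp1252") -> str:
    """Undo the wrong decoding by reinterpreting the same bytes as UTF-8."""
    return garbled.encode(assumed).decode("utf-8")


original = "perché"
garbled = misread(original)

print(garbled)          # accented character shows up as two wrong characters
print(repair(garbled))  # original text restored

assert garbled != original
assert repair(garbled) == original
```

Switching the layer's data source encoding to UTF-8 corresponds to the `repair` step: the bytes are unchanged, only their interpretation is corrected.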

    Best regards.

    Andrea Giudiceandrea

    [1] tested on Windows / OSGeo4W: GDAL 3.1.4 / FreeXL 1.0.2 / Expat 2.1.0
    and GDAL 3.2.2 / FreeXL 1.0.6 / Expat 2.2.10
    [2] text in XLS (BIFF8) files is internally encoded in UTF-16LE
    [3] by the way, two "UTF-8" codecs are incorrectly listed in the
    "Encoding" drop-down menu...
    _______________________________________________
    QGIS-Developer mailing list
    QGIS-Developer@lists.osgeo.org <mailto:QGIS-Developer@lists.osgeo.org>
    List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
    <https://lists.osgeo.org/mailman/listinfo/qgis-developer>
    Unsubscribe:
    https://lists.osgeo.org/mailman/listinfo/qgis-developer
    <https://lists.osgeo.org/mailman/listinfo/qgis-developer>


--
http://www.spatialys.com
My software is free, but my time generally not.
