[Koha-bugs] [Bug 39327] UTF-8 BOM missing from label creator CSV and some UTF-8 output broken

bugzilla-daemon--- via Koha-bugs Mon, 17 Mar 2025 02:57:25 -0700

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=39327


Michał <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Signed Off                  |In Discussion
                 CC|                            |[email protected]

--- Comment #4 from Michał <[email protected]> ---
I am actually really unsure about this. The BOM is a legacy feature and should
not be used for UTF-8 (the Unicode standard does not recommend it). If Excel
cannot recognize a UTF-8 file properly without the user manually selecting the
encoding, then sadly that is a problem with the legacy software, and I don't
think we should break the file format just to cater to it. For example,
LibreOffice and Google Docs recognize these files as UTF-8 right away, and so
does Windows Notepad (it didn't use to, but around 6 years ago they changed
that, making the BOM non-default for UTF-8).

On the other hand, people already using these files in existing software may
face breakage: software that isn't coded to explicitly recognize a BOM at the
beginning of a UTF-8 file (and such handling is uncommon) might in turn break
by reading garbage data at the beginning of the file.
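
To illustrate the breakage described above, here is a minimal Python sketch
(not Koha code; the sample data and the helper name first_header are made up
for this example). It shows what a consumer that decodes as plain UTF-8 sees
in the first column name when the file starts with a BOM:

```python
import csv
import io

# Hypothetical export content, encoded without and with a UTF-8 BOM.
text = "barcode,title\n123,Żółw\n"
plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")  # prepends the bytes EF BB BF


def first_header(data: bytes, encoding: str) -> str:
    """Return the first column name as seen by a reader using `encoding`."""
    reader = csv.reader(io.StringIO(data.decode(encoding), newline=""))
    return next(reader)[0]


# A BOM-unaware consumer ("utf-8") keeps U+FEFF glued to the first header,
# so a lookup by the name "barcode" would fail.
unaware = first_header(with_bom, "utf-8")

# A BOM-aware consumer ("utf-8-sig") strips it, and also accepts files
# that have no BOM at all.
aware = first_header(with_bom, "utf-8-sig")
aware_no_bom = first_header(plain, "utf-8-sig")
```

Here the BOM-unaware reader ends up with a first column named
"\ufeffbarcode" rather than "barcode", which is the kind of garbage an
existing integration would trip over.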

The other reason I see against catering to one particular piece of software
(Excel) is that Excel's default settings for CSV import depend on the system
locale for things like the default separator (among other broken behavior).
Indeed, in my locale, comma-separated CSVs open with each whole line in the
first column instead of being split into columns, unless the user goes through
the import wizard and sets the options anyway. So the file will open
differently on different Windows computers regardless. On top of that, I found
reports online that Excel 2007 ignored the UTF-8 BOM (and that Excel only
recognizes it since Excel 2013), so this isn't even a full solution for that
software either: https://stackoverflow.com/a/40807218/4470653

So I would contest this change and ask that, if people really want it, it be
offered either as a separate export option or as some kind of syspref (e.g.
"CSV (UTF-8)" and "CSV (UTF-8 with BOM)", akin to how most editors label
encodings).
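
A rough sketch of what such an opt-in could look like, written in Python for
brevity rather than as Koha code (the function export_csv and its with_bom
parameter are hypothetical names, not an existing Koha API or syspref):

```python
import csv
import io


def export_csv(rows, with_bom=False):
    """Serialize rows to CSV bytes; prepend a UTF-8 BOM only when asked to."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # "utf-8-sig" writes the BOM (EF BB BF) first; plain "utf-8" does not.
    encoding = "utf-8-sig" if with_bom else "utf-8"
    return buffer.getvalue().encode(encoding)


rows = [["barcode", "title"], ["123", "Żółw"]]
default_export = export_csv(rows)               # "CSV (UTF-8)"
excel_export = export_csv(rows, with_bom=True)  # "CSV (UTF-8 with BOM)"
```

The default stays BOM-free, so existing consumers see no change, and the BOM
variant differs only by the three leading bytes.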

Or, if we want to please Excel specifically, there should perhaps be some kind
of XLSX option instead.

> Although in some cases I've noticed UTF-8 data in Koha is being exported 
> seemingly as Latin-1 data.

True! I noticed it too when testing, for example when exporting the results of
the report for patrons with the most checkouts to CSV. That part should be
fixed for sure.

-- 
You are receiving this mail because:
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
