[Libreoffice-bugs] [Bug 150714] New: DATALOSS: saving a recovered CSV converts all non-Western characters to question marks

bugzilla-daemon Wed, 31 Aug 2022 00:47:48 -0700

https://bugs.documentfoundation.org/show_bug.cgi?id=150714


            Bug ID: 150714
           Summary: DATALOSS: saving a recovered CSV converts all
                    non-Western characters to question marks
           Product: LibreOffice
           Version: unspecified
          Hardware: All
               URL: https://forumooo.ru/index.php?topic=9330
                OS: Windows (All)
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Calc
          Assignee: libreoffice-bugs@lists.freedesktop.org
          Reporter: mikekagan...@hotmail.com

Created attachment 182109
  --> https://bugs.documentfoundation.org/attachment.cgi?id=182109&action=edit
A UTF-8-encoded CSV

If saving of autorecovery information is enabled and a crash happens while
editing a CSV, LibreOffice offers to recover the CSV on the next start. The
recovery restores the correct data, and File->Save then overwrites the
original CSV. However, the saved file loses all non-Western characters. This
only becomes apparent after a reload, at which point the non-Western data is
unrecoverable: the autorecovery information has been deleted and the original
CSV has been overwritten.

Steps to reproduce:
0. Make sure that "Save Autorecovery information" is enabled under
Options->Load/Save->General, and set its interval to a small value (e.g. 1
minute) for ease of reproduction.
1. Open the attached CSV, making sure to use UTF-8 encoding (it contains a
string "テストabcабвÀ", which includes Japanese, English, Cyrillic, and extended
Western characters).
2. Make some change to B2, e.g. replace "1" with "2".
3. Wait for automatic save.
4. Kill soffice.bin process.
5. Start LibreOffice, see it offers recovery for the CSV. Confirm the recovery.
Note that no CSV import dialog is shown during the recovery.
6. See that the recovered document looks OK, having the correct string in A2,
and "2" in B2.
7. Press Save toolbar button (or File->Save), confirm saving as CSV.
8. Close and reopen the file.
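
To check the outcome of step 8 independently of the import dialog, the bytes
of the saved file can be inspected directly. The following is only a minimal
sketch: the helper name inspect_saved_csv and the path passed to it are
hypothetical, and the expected string is the one quoted in step 1.

```python
from pathlib import Path

# String from cell A2 of the attached CSV, as quoted in step 1.
EXPECTED = "テストabcабвÀ"


def inspect_saved_csv(path):
    """Report whether the file is valid UTF-8 and still contains EXPECTED.

    Hypothetical helper for illustration; not part of LibreOffice.
    """
    raw = Path(path).read_bytes()
    try:
        text = raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        # ISO 8859-1 maps every byte to a character, so this never fails.
        text = raw.decode("iso-8859-1")
        valid_utf8 = False
    return {
        "valid_utf8": valid_utf8,
        "has_expected": EXPECTED in text,
        "question_marks": text.count("?"),
    }
```

Running it on the file before step 2 and again after step 7 shows whether the
string survived the save.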

The text in A2 is destroyed: both the Japanese and the Cyrillic characters
turn into question marks. The Western character "À" is restored if ISO 8859-1
encoding is selected on import.
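
The observed result is consistent with the text being written through a
single-byte ISO 8859-1 conversion that substitutes "?" for every character it
cannot represent. The sketch below illustrates this with Python's codecs as a
stand-in. It is not LibreOffice's conversion code, and it assumes only the
string quoted in step 1.

```python
# String from cell A2 of the attached CSV, as quoted in step 1.
original = "テストabcабвÀ"

# Encode as ISO 8859-1, replacing every unmappable character with '?'.
saved = original.encode("iso-8859-1", errors="replace")
print(saved)  # b'???abc???\xc0'

# Read back as ISO 8859-1: only the Western character survives.
print(saved.decode("iso-8859-1"))  # ???abc???À

# Read back as UTF-8: the lone byte 0xC0 is not valid UTF-8, so even "À"
# is lost and becomes the replacement character U+FFFD.
as_utf8 = saved.decode("utf-8", errors="replace")
print(as_utf8.endswith("\ufffd"))  # True
```

The question marks are real 0x3F bytes in the file, which is why no choice of
import encoding can bring the Japanese or Cyrillic text back.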

Note that if you use Save As in step #7 instead, and select "Edit filter
settings" in the Save As dialog, the settings dialog offers the last-used
encoding, not ISO 8859-1. When saving a new document to CSV, the filter
settings dialog appears even when "Edit filter settings" is unselected, so
this looks like a problem specific to recovered files.

The problem seems to originate from the fact that autorecovery stores files
in the native ODS format (which is a good thing), but does not keep the
original filter settings in the autorecovery data (see also bug 123877
comment 9). So opening the autorecovery ODS cannot provide Calc with the
original encoding, and some internal default is used instead. That default
happens to be ISO 8859-1, not even the last-used encoding stored in the
profile (although silently using the last-used setting would also be
incorrect).

The proposal is to make UTF-8 the default encoding for recovered CSV files.
Another option could be to make autorecovered documents have some media flag
set that would force the filter settings dialog on save, the same way as when
you save a new document to CSV.
Yet another option is to store the original CSV filter settings inside the
autorecovery ODS (most complex).

The issue appeared in https://forumooo.ru/index.php?topic=9330.

-- 
You are receiving this mail because:
You are the assignee for the bug.
