https://bugs.documentfoundation.org/show_bug.cgi?id=150714
Bug ID: 150714
Summary: DATALOSS: saving a recovered CSV converts all non-Western characters to question marks
Product: LibreOffice
Version: unspecified
Hardware: All
URL: https://forumooo.ru/index.php?topic=9330
OS: Windows (All)
Status: UNCONFIRMED
Severity: normal
Priority: medium
Component: Calc
Assignee: libreoffice-bugs@lists.freedesktop.org
Reporter: mikekagan...@hotmail.com

Created attachment 182109: A UTF-8-encoded CSV
https://bugs.documentfoundation.org/attachment.cgi?id=182109&action=edit

If saving of autorecovery information is enabled and a crash happens while editing a CSV, LibreOffice offers to recover the CSV on the next start. The recovery restores the correct data, and File->Save then overwrites the original CSV, but the saved file loses all non-Western characters. This only becomes apparent after reloading, and by then the non-Western data is unrecoverable: the autorecovery information has been deleted and the original CSV has been overwritten.

Steps to reproduce:
0. Make sure that "Save Autorecovery information" is enabled under Options->Load/Save->General, and set it to a small interval (e.g. 1 minute) for ease of reproduction.
1. Open the attached CSV, making sure to select UTF-8 encoding (it contains the string "テストabcабвÀ", which includes Japanese, English, Cyrillic, and extended Western characters).
2. Make some change to B2, e.g. replace "1" with "2".
3. Wait for the automatic save.
4. Kill the soffice.bin process.
5. Start LibreOffice; it offers recovery for the CSV. Confirm the recovery. Note that no CSV import dialog is shown during the recovery.
6. The recovered document looks OK: A2 holds the correct string, and B2 contains "2".
7. Press the Save toolbar button (or File->Save) and confirm saving as CSV.
8. Close and reopen the file. The text in A2 is destroyed: both the Japanese and the Cyrillic characters have turned into question marks.
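The result of step 8 can be mimicked outside LibreOffice. The sketch below uses Python's codecs as a stand-in for Calc's CSV export (it is not LibreOffice code) and assumes, as this report suggests, that the recovered document is written as ISO 8859-1 with each unrepresentable character replaced by "?".

```python
# Stand-in for step 7: the recovered text is written as ISO 8859-1 instead of
# the UTF-8 it was imported with. Characters outside Latin-1 cannot be
# represented, and errors="replace" substitutes "?" for each of them.
original = "テストabcабвÀ"

saved_bytes = original.encode("iso-8859-1", errors="replace")
print(saved_bytes)  # b'???abc???\xc0'

# Step 8, reopening as ISO 8859-1: "À" (byte 0xC0) comes back,
# but the Japanese and Cyrillic characters are gone for good.
print(saved_bytes.decode("iso-8859-1"))  # ???abc???À

# Reopening as UTF-8: the lone 0xC0 byte is not valid UTF-8 either,
# so even "À" shows up as the replacement character U+FFFD.
print(saved_bytes.decode("utf-8", errors="replace"))  # ???abc???�
```

The question marks are written into the file itself, so no choice of import encoding can bring the Japanese or Cyrillic text back; only "À" survives, because it is representable in ISO 8859-1.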
The Western character "À" is restored if ISO 8859-1 encoding is selected on import.

Note that if you use Save As instead in step 7 and select "Edit filter settings" in the Save As dialog, the settings dialog offers the last-used encoding, not ISO 8859-1. When a new document is saved to CSV, the filter settings dialog appears even when "Edit filter settings" is unselected, so the problem looks specific to recovered files.

The origin of the problem seems to be that autorecovery stores files in the native ODS format (which is a good thing) but does not keep the original filter settings in the autorecovery data (see also bug 123877 comment 9). Opening the autorecovery ODS therefore cannot provide Calc with the original encoding, and an internal default is used instead. That default happens to be ISO 8859-1, not even the last-used encoding stored in the profile (although silently using the last-used setting would be incorrect anyway).

The proposal is to make UTF-8 the default encoding for recovered CSV files. Another option would be to give autorecovered documents a media flag that forces the filter settings dialog on save, as happens when saving a new document to CSV. Yet another option, and the most complex one, is to store the original CSV filter settings inside the autorecovery ODS.

The issue was first raised at https://forumooo.ru/index.php?topic=9330.
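The first proposal can be sketched as a simple fallback rule. Everything below is hypothetical and illustrative, not LibreOffice code: `choose_save_encoding`, `is_lossless`, and `DEFAULT_RECOVERY_ENCODING` are invented names, and the sketch assumes the recovery data may (in a future version) carry the encoding used on the original import.

```python
from typing import Optional

# Hypothetical constant: the default this report proposes for recovered CSVs.
DEFAULT_RECOVERY_ENCODING = "utf-8"


def choose_save_encoding(stored_encoding: Optional[str]) -> str:
    """Pick the encoding for File->Save of a recovered CSV (hypothetical).

    stored_encoding is the encoding used on the original import, if the
    autorecovery data kept it; per this report it currently does not,
    so in practice it is None and the UTF-8 default applies.
    """
    return stored_encoding or DEFAULT_RECOVERY_ENCODING


def is_lossless(text: str, encoding: str) -> bool:
    """Return True if text survives an encode/decode round trip unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # The strict codec refuses characters the encoding cannot represent.
        return False


sample = "テストabcабвÀ"
print(choose_save_encoding(None))         # utf-8
print(is_lossless(sample, "utf-8"))       # True
print(is_lossless(sample, "iso-8859-1"))  # False
```

UTF-8 is a safe fallback because it can represent every Unicode character, so the round trip never loses data. A check like `is_lossless` could also serve the second option: rather than saving silently, show the filter settings dialog whenever the chosen encoding would drop characters.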