Please run 'locale' from a terminal and copy/paste the result in a comment.

--
You received this bug notification because you are a member of Desktop Packages, which is subscribed to duplicity in Ubuntu.
https://bugs.launchpad.net/bugs/1893481
Title:
UnicodeEncodeError when logging improperly encoded filenames

Status in Duplicity: Confirmed
Status in duplicity package in Ubuntu: New

Bug description:

Attempts to log messages which contain unicode surrogate characters cause exceptions. (These surrogate characters arise, for example, when handling files whose names are not properly encoded as UTF-8.)

NOTE: I have no idea whether this is an issue when running on python 2. (If it is, the fixes suggested below probably won't work.)

Duplicity version: 0.8.15
Python version: 3.8.5
Target filesystem: Linux

Example log output:

    --- Logging error ---
    Traceback (most recent call last):
      File "/opt/Python-3.8.5/lib/python3.8/logging/__init__.py", line 1084, in emit
        stream.write(msg + self.terminator)
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 45: surrogates not allowed
    Call stack:
      File "/root/.local/pipx/venvs/duplicity/bin/duplicity", line 104, in <module>
        with_tempdir(main)
      File "/root/.local/pipx/venvs/duplicity/bin/duplicity", line 90, in with_tempdir
        fn()
      File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 1531, in main
        do_backup(action)
      File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 1655, in do_backup
        full_backup(col_stats)
      File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 559, in full_backup
        bytes_written = write_multivol(u"full", tarblock_iter,
      File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 417, in write_multivol
        at_end = gpg.GPGWriteFile(tarblock_iter, tdp.name, config.gpg_profile,
      File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/gpg.py", line 390, in GPGWriteFile
        data = block_iter.__next__().data
      File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/diffdir.py", line 544, in __next__
        result = self.process(next(self.input_iter))  # pylint: disable=assignment-from-no-return
      File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/diffdir.py", line 238, in get_delta_iter
        log_delta_path(delta_path, new_path, stats)
      File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/diffdir.py", line 181, in log_delta_path
        log.Info(_(u"A %s") %
      File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/log.py", line 128, in Info
        Log(s, INFO, code, extra)
      File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/log.py", line 91, in Log
        _logger.log(DupToLoggerLevel(verb_level), s,
    Message: 'A home/dairiki/PRCS/junk-changelog/22_Senaste\udcc4nd,v'
    Arguments: ()

Steps to reproduce:

- Have a file whose name contains non-ASCII characters encoded as latin-1 rather than UTF-8. E.g. a file whose name is "Fü" encoded to latin-1 (b'F\xfc'). When duplicity handles this file, each byte that is not valid UTF-8 is replaced with a lone unicode surrogate character (here b'\xfc' becomes '\udcfc').

- Attempt to create an archive containing this file, with verbosity set to 5. Duplicity will try to log each file processed. When it gets to this file, an exception will be reported (and the file will not make it into the archive.)

Alternative steps to reproduce:

- If the archive is created with verbosity less than 5, the file will make it into the archive. However, if an attempt is made to list files using 'duplicity list-current-files', an exception will be reported when it gets to the file with the funny name.

Workaround
==========

A simple workaround is to set the environment variable PYTHONIOENCODING="utf-8:surrogateescape" before running duplicity. This sets the encoding error handler for stdout to 'surrogateescape' (by default it is normally 'strict'), with the effect that any surrogates are written back out as the original raw bytes. A UTF-8 terminal will typically display each such byte as the unicode replacement character (U+FFFD: "�"). (Python ignores the error-handler part of PYTHONIOENCODING for stderr, which always uses 'backslashreplace'.)

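To make the mechanism concrete, here is a minimal sketch in plain Python (it does not use duplicity itself) of how a latin-1 encoded name turns into a surrogate and how the stream's error handler decides what happens on output. The helper name write_name and the in-memory stream are illustrative assumptions, standing in for the real stdout.

```python
import io

# Raw file name as stored on disk: "Fü" encoded as latin-1, which is not valid UTF-8.
raw_name = b"F\xfc"

# Decoding with 'surrogateescape' (the handler Python uses for file names)
# maps the undecodable byte 0xFC to the lone surrogate U+DCFC.
name = raw_name.decode("utf-8", errors="surrogateescape")


def write_name(errors):
    """Write `name` through a UTF-8 text stream using the given error handler.

    Hypothetical helper: the BytesIO-backed stream stands in for stdout.
    """
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="utf-8", errors=errors)
    stream.write(name)
    stream.flush()
    return buffer.getvalue()


# 'strict' reproduces the failure seen in the traceback.
try:
    write_name("strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'surrogateescape' writes the original byte back out unchanged.
escaped = write_name("surrogateescape")

# 'replace' substitutes a plain question mark when encoding.
replaced = write_name("replace")
```

With 'surrogateescape' the bytes that reach the terminal are exactly the bytes of the original file name, so the name survives a round trip even though it is not valid UTF-8.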
Possible Fix
============

A possible fix, at least for Python 3, is probably for duplicity to explicitly set the encoding error handler for stdout and stderr.

For python >= 3.7 this is simple:

    sys.stdout.reconfigure(errors='surrogateescape')
    sys.stderr.reconfigure(errors='surrogateescape')

For earlier pythons (>= 3), the best option might be:

    sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach(), 'surrogateescape')

(and similarly for stderr)

Note that python 2 doesn't know about errors='surrogateescape'. errors='replace' would probably work as an alternative, but it's not ideal as it replaces the surrogates with a plain question mark rather than a unicode replacement character.

Possible Similar Issue
======================

I didn't actually verify that this fails, but it appears that there might be a similar issue when using the --log-fd command line option. Function duplicity.log.add_fd() does a:

    handler = logging.StreamHandler(os.fdopen(fd, u'w'))

In Python 3 os.fdopen (an alias for open) opens the stream with errors='strict' by default.

    handler = logging.StreamHandler(os.fdopen(fd, u'w', errors='surrogateescape'))

or

    handler = logging.StreamHandler(open(fd, u'w', errors='surrogateescape'))

is probably a better choice. (But neither will work in python 2.)
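As a rough illustration of the --log-fd suggestion, the following sketch attaches a logging.StreamHandler to a file descriptor opened with errors='surrogateescape' and logs a message containing a surrogate. It is not duplicity's code: the pipe stands in for whatever descriptor would be passed to --log-fd, and the logger name is made up for the example. The encoding is pinned to UTF-8 here so the result does not depend on the locale.

```python
import logging
import os

# Hypothetical stand-in for the descriptor given to --log-fd: a pipe.
read_fd, write_fd = os.pipe()

# Open the descriptor with 'surrogateescape' so lone surrogates are
# written back as their original raw bytes instead of raising.
stream = open(write_fd, "w", encoding="utf-8", errors="surrogateescape")
handler = logging.StreamHandler(stream)

logger = logging.getLogger("surrogate-demo")  # illustrative logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
logger.addHandler(handler)

# A path containing the surrogate that results from the latin-1 byte 0xFC.
logger.info("A %s", "home/F\udcfc")
stream.close()

# Read back what the handler actually wrote to the descriptor.
logged = os.read(read_fd, 1024)
os.close(read_fd)
```

Opened this way, the handler emits the original file name bytes. With the default errors='strict' the same logger.info call would instead produce a "--- Logging error ---" report like the one in the bug description.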