Please run locale from a terminal and copy/paste the result in a
comment.

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to duplicity in Ubuntu.
https://bugs.launchpad.net/bugs/1893481

Title:
  UnicodeEncodeError when logging improperly encoded filenames

Status in Duplicity:
  Confirmed
Status in duplicity package in Ubuntu:
  New

Bug description:
  Attempts to log messages which contain unicode surrogate characters cause
  exceptions.
  (These surrogate characters arise, for example, when handling files whose
  names are not properly encoded as UTF-8.)

  NOTE: I have no idea whether this is an issue when running on python
  2.  (If it is, the fixes suggested below probably won't work.)

  
  Duplicity version: 0.8.15
  Python version: 3.8.5
  Target filesystem: Linux

  Example log output:

  --- Logging error ---
  Traceback (most recent call last):
    File "/opt/Python-3.8.5/lib/python3.8/logging/__init__.py", line 1084, in emit
      stream.write(msg + self.terminator)
  UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 45: surrogates not allowed
  Call stack:
    File "/root/.local/pipx/venvs/duplicity/bin/duplicity", line 104, in <module>
      with_tempdir(main)
    File "/root/.local/pipx/venvs/duplicity/bin/duplicity", line 90, in with_tempdir
      fn()
    File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 1531, in main
      do_backup(action)
    File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 1655, in do_backup
      full_backup(col_stats)
    File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 559, in full_backup
      bytes_written = write_multivol(u"full", tarblock_iter,
    File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/dup_main.py", line 417, in write_multivol
      at_end = gpg.GPGWriteFile(tarblock_iter, tdp.name, config.gpg_profile,
    File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/gpg.py", line 390, in GPGWriteFile
      data = block_iter.__next__().data
    File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/diffdir.py", line 544, in __next__
      result = self.process(next(self.input_iter))  # pylint: disable=assignment-from-no-return
    File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/diffdir.py", line 238, in get_delta_iter
      log_delta_path(delta_path, new_path, stats)
    File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/diffdir.py", line 181, in log_delta_path
      log.Info(_(u"A %s") %
    File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/log.py", line 128, in Info
      Log(s, INFO, code, extra)
    File "/root/.local/pipx/venvs/duplicity/lib/python3.8/site-packages/duplicity/log.py", line 91, in Log
      _logger.log(DupToLoggerLevel(verb_level), s,
  Message: 'A home/dairiki/PRCS/junk-changelog/22_Senaste\udcc4nd,v'
  Arguments: ()


  Steps to reproduce:
  - Have a file with non-ASCII characters in its name, encoded as latin-1,
    e.g. a file whose name is "Fü" encoded to latin-1 (b'F\xfc').  When
    duplicity handles this file, the improperly encoded character will be
    replaced with a unicode surrogate character.
  - Attempt to create an archive containing this file, with verbosity set to 5.
    Duplicity will try to log each file processed.  When it gets to this file,
    an exception will be reported (and the file will not make it into the
    archive).
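
  The mechanism behind these steps can be sketched in a few lines of
  Python 3.  This is only an illustration, not duplicity code: the byte
  string is the "Fü" example from above, and it is decoded explicitly as
  UTF-8 (an assumption, so that the result does not depend on the locale).

```python
# Example file name from the report: "Fü" encoded as latin-1.
raw_name = b"F\xfc"

# With the 'surrogateescape' handler (which Python uses for file names),
# the undecodable byte 0xFC becomes the lone surrogate U+DCFC.
name = raw_name.decode("utf-8", errors="surrogateescape")


def try_strict_encode(text):
    """Return the UnicodeEncodeError raised by a strict UTF-8 encode, or None."""
    try:
        text.encode("utf-8")  # errors='strict' is the default
    except UnicodeEncodeError as exc:
        return exc
    return None


error = try_strict_encode(name)
print(repr(name))
print(error)
```

  A stream opened with errors='strict' performs the same strict encode
  when the log message is written, which is where the exception in the
  traceback comes from.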

  Alternative steps to reproduce:
  - If the archive is created with verbosity less than 5, the file will make it
    into the archive.  However, if an attempt is made to list files using
    'duplicity list-current-files', an exception will be reported when it gets to
    the file with the funny name.

  
  Workaround
  ==========

  A simple workaround is to set the environment variable
  PYTHONIOENCODING="utf-8:surrogateescape" before running duplicity.
  This will set the encoding error mode for stdout to 'surrogateescape'
  (by default stdout normally uses 'strict') with the effect that any
  surrogates are written out as the original undecodable bytes.  A UTF-8
  terminal will typically display such a byte as the unicode replacement
  character (U+FFFD: "�").
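
  The effect of the variable can be checked without duplicity.  The
  following is a rough sketch that runs a one-line child Python program
  (a stand-in for duplicity, not the real thing) with and without the
  workaround and captures the bytes that reach stdout.  The helper name
  run_child is made up for this example.

```python
import os
import subprocess
import sys

# Child program: print a string containing the lone surrogate U+DCFC.
# The raw string keeps the \udcfc escape intact for the child to interpret.
CHILD_CODE = r"print('F\udcfc')"


def run_child(io_encoding):
    """Run CHILD_CODE in a fresh interpreter with PYTHONIOENCODING set."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    return subprocess.run([sys.executable, "-c", CHILD_CODE],
                          env=env, capture_output=True)


strict = run_child("utf-8:strict")
with_workaround = run_child("utf-8:surrogateescape")

print(strict.returncode, strict.stderr.splitlines()[-1:])
print(with_workaround.returncode, with_workaround.stdout)
```

  With 'strict' the child fails with a UnicodeEncodeError; with
  'surrogateescape' it succeeds and the original byte 0xFC is written.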

  
  Possible Fix
  ============

  A possible fix, at least for Python 3, is probably for duplicity to explicitly
  set the encoding error strategy for stdout and stderr.
  For python >= 3.7 this is simple:

      sys.stdout.reconfigure(errors='surrogateescape')
      sys.stderr.reconfigure(errors='surrogateescape')

  For earlier pythons (>= 3), the best option might be:

      sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach(), 'surrogateescape')

  (and similarly for stderr)
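
  To try the reconfigure() approach without touching the real standard
  streams, here is a small sketch (Python >= 3.7) that uses an in-memory
  stream as a stand-in.  The names make_surrogate_safe and fake_stdout
  are invented for this example and are not part of duplicity.

```python
import io


def make_surrogate_safe(stream):
    """Switch a text stream to errors='surrogateescape' (needs Python >= 3.7)."""
    stream.reconfigure(errors="surrogateescape")
    return stream


# Stand-in for a standard stream: a strict UTF-8 text layer over a byte buffer.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="utf-8", errors="strict",
                               newline="\n")

make_surrogate_safe(fake_stdout)
fake_stdout.write("A F\udcfc\n")  # would raise UnicodeEncodeError if still strict
fake_stdout.flush()

written = raw.getvalue()
print(written)
```

  The same call on the real standard streams would have to happen
  before the first log message is written.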

  Note that python 2 doesn't know about errors='surrogateescape'.
  errors='replace' would probably work as an alternative, but it's not
  ideal as it replaces the surrogates with a plain question mark rather
  than a unicode replacement character.
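
  For comparison, this short sketch shows what a few of the standard
  error handlers produce for the example name when encoding to UTF-8
  under Python 3.  ('backslashreplace' is not mentioned above; it is
  included only as another point of reference.)

```python
name = "F\udcfc"  # "F" followed by the lone surrogate for byte 0xFC

handlers = ("replace", "backslashreplace", "surrogateescape")
results = {h: name.encode("utf-8", errors=h) for h in handlers}

for handler, encoded in results.items():
    print(handler, encoded)
```

  Only 'surrogateescape' round-trips the original byte; the other two
  lose or rewrite it.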

  
  Possible Similar Issue
  ======================

  I didn't actually verify that this fails, but it appears that there
  might be a similar issue when using the --log-fd command line option.
  Function duplicity.log.add_fd() does a:

      handler = logging.StreamHandler(os.fdopen(fd, u'w'))

  In Python 3, os.fdopen (a thin wrapper around open) opens the stream
  with errors='strict' by default.  Something like

      handler = logging.StreamHandler(os.fdopen(fd, u'w', errors='surrogateescape'))

  or

      handler = logging.StreamHandler(open(fd, u'w', errors='surrogateescape'))

  is probably a better choice.  (But neither will work in python 2.)
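
  As a rough check of this idea (Python 3 only), the sketch below logs a
  name containing a surrogate to the write end of a pipe, which stands in
  for the descriptor passed via --log-fd.  It does not call
  duplicity.log.add_fd(); the logger name and the explicit
  encoding='utf-8' are assumptions made for the example.

```python
import logging
import os

# The pipe's write end plays the role of the --log-fd descriptor.
read_fd, write_fd = os.pipe()

stream = open(write_fd, "w", encoding="utf-8",
              errors="surrogateescape", newline="\n")
handler = logging.StreamHandler(stream)

logger = logging.getLogger("log_fd_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the message away from the root logger
logger.addHandler(handler)

# With errors='strict' this call would print "--- Logging error ---".
logger.info("A %s", "F\udcfc")

logger.removeHandler(handler)
stream.close()  # closes write_fd so the reader sees end-of-file

with open(read_fd, "rb") as reader:
    logged = reader.read()
print(logged)
```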

To manage notifications about this bug go to:
https://bugs.launchpad.net/duplicity/+bug/1893481/+subscriptions

-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to     : desktop-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp
