Hi Stéphane,

thanks for your patch, which I applied in the python3 branch. Unfortunately it does not solve the issue:

udd(python3) $ ./update-and-run.sh ddtp
Traceback (most recent call last):
  File "/srv/udd.debian.org/udd//udd.py", line 88, in <module>
    exec("gatherer.%s()" % command)
  File "<string>", line 1, in <module>
  File "/srv/udd.debian.org/udd/udd/ddtp_gatherer.py", line 127, in run
    h.update(f.read())
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 11: invalid continuation byte

Thanks a lot anyway

     Andreas.

On Mon, May 18, 2020 at 01:15:11PM +0200, Stéphane Blondon wrote:
> Hello,
>
> On 15/05/2020 21:10, Andreas Tille wrote:
> > Would you mind providing a patch with chardet?
>
> There is a patch attached to this e-mail.
>
> I used [1] for the base file. I don't think the patch is great (because
> there are two 'open()' calls) but it has minimal modifications of the
> current source code. I think it's a better solution for the success of the
> migration to python3 (because it avoids introducing bugs during the
> migration).
>
>
> Feel free to ask for more explanations or other stuff if you need.
>
> 1: https://salsa.debian.org/qa/udd/-/blob/master/udd/ddtp_gatherer.py
>
> -- 
> Stéphane
>
> --- ddtp_gatherer.py.orig	2020-05-17 22:54:21.793075000 +0200
> +++ ddtp_gatherer.py	2020-05-18 13:02:47.210764004 +0200
> @@ -25,6 +25,8 @@
>  import logging
>  import logging.handlers
>
> +import chardet
> +
>  debug=0
>
>  def get_gatherer(connection, config, source):
> @@ -117,7 +119,7 @@
>          trfile = trfilepath + file
>          # check whether hash recorded in index file fits real file
>          try:
> -          f = open(trfile)
> +          f = _open_file(trfile)
>          except IOError, err:
>            self.log.error("%s: %s.", str(err), trfile)
>            continue
> @@ -236,6 +238,13 @@
>      except IOError, err:
>        self.log.exception("Error reading %s%s", dir, filename)
>
> +def _open_file(path):
> +    with open(path, 'rb') as f:
> +        raw_content = f.read()
> +    encoding = chardet.detect(raw_content)["encoding"]
> +    return open(path, encoding=encoding)
> +
> +
>  if __name__ == '__main__':
>      main()
>

-- 
http://fam-tille.de
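
A possible direction, sketched under assumptions: the failing line is h.update(f.read()), and the error still names the 'utf-8' codec, so the file is apparently still opened as UTF-8 text (for example if the detected encoding was UTF-8, or was None so that open() used its default). If h is a hashlib object (an assumption, the traceback does not show it), the hash can be computed over the raw bytes so that no decoding happens at all. The names file_digest_hex and read_text_lenient, and the choice of sha256, are hypothetical and not taken from ddtp_gatherer.py.

```python
import hashlib
import os
import tempfile


def file_digest_hex(path, algorithm="sha256", chunk_size=65536):
    """Hash the raw bytes of `path`; no text decoding takes place."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns the empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def read_text_lenient(path, encoding="utf-8"):
    """Read text, replacing undecodable bytes instead of raising."""
    with open(path, encoding=encoding, errors="replace") as f:
        return f.read()


# Demo data (made up): 0xc5 followed by a byte that is not a valid
# UTF-8 continuation byte, the same kind of error as in the traceback.
sample = b"Package: x\n\xc5 broken\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
demo_path = tmp.name

digest = file_digest_hex(demo_path)   # works regardless of encoding
text = read_text_lenient(demo_path)   # bad byte becomes U+FFFD
os.remove(demo_path)
```

Whether replacing undecodable bytes is acceptable for the later parsing step depends on how the translation text is used afterwards; the sketch only shows that hashing and decoding can be handled separately.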