On 10/12/23 at 12:10 +1100, Stuart Prescott wrote:
> Package: qa.debian.org
> Severity: normal
> X-Debbugs-Cc: stu...@debian.org
> 
> The 'maintainer' and 'maintainer_email' columns of the upload_history table
> in UDD contain truncated email addresses. Somewhere the 'maintainer' data
> is being truncated, and the maintainer_email is consequently broken too.
> 
> udd=> SELECT maintainer, maintainer_email FROM upload_history WHERE maintainer_email LIKE '%=' LIMIT 10;
>                            maintainer                           |               maintainer_email
> ----------------------------------------------------------------+----------------------------------------------
>  Maintainers of GStreamer packages <pkg-gstreamer-maintainers@= | pkg-gstreamer-maintainers@=
>  Maintainers of GStreamer packages <pkg-gstreamer-maintainers@= | pkg-gstreamer-maintainers@=
>  Zenoss Packaging Team <pkg-zenoss-t...@lists.alioth.debian.or= | pkg-zenoss-t...@lists.alioth.debian.or=
>  Debian GNOME Maintainers <pkg-gnome-maintainers@lists.alioth.= | pkg-gnome-maintainers@lists.alioth.=
>  Debian Perl Group <pkg-perl-maintainers@lists.alioth.debian.o= | pkg-perl-maintainers@lists.alioth.debian.o=
>  Debian VoIP Team <pkg-voip-maintain...@lists.alioth.debian.or= | pkg-voip-maintain...@lists.alioth.debian.or=
>  Debian Python Modules Team <python-modules-team@lists.alioth.= | python-modules-team@lists.alioth.=
>  Debian Python Modules Team <python-modules-team@lists.alioth.= | python-modules-team@lists.alioth.=
>  Debian Firebird Group <pkg-firebird-gene...@lists.alioth.debi= | pkg-firebird-gene...@lists.alioth.debi=
>  Debian Samba Maintainers <pkg-samba-maint@lists.alioth.debian= | pkg-samba-maint@lists.alioth.debian=
> (10 rows)
> 
> 
> The input data from the d-d-c mailing list looks fine in the web archive,
> but I suspect this is due to line wrapping in the mbox files.
> 
> Looking at one specific example:
> 
> https://lists.debian.org/debian-devel-changes/2007/12/msg00466.html
> 
> udd=> SELECT maintainer, maintainer_email FROM upload_history WHERE maintainer_email LIKE '%=' AND source = 'libxml-rss-perl' AND version = '1.31-3';
>                            maintainer                           |               maintainer_email
> ----------------------------------------------------------------+---------------------------------------------
>  Debian Perl Group <pkg-perl-maintainers@lists.alioth.debian.o= | pkg-perl-maintainers@lists.alioth.debian.o=
> (1 row)
> 
> This particular example is quite old, but the problem also exists in
> recent uploads; at the time of writing the most recent one is libgetdata
> (0.11.0-9), uploaded today.
> 
> Of the 850k rows in upload_history, 70k are affected.
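
The trailing `=` in every affected value matches the shape of a quoted-printable soft line break (RFC 2045): the encoder wraps a long line and ends the first half with `=`. This suggests, though the thread does not confirm it, that the Maintainer field was parsed before the transfer encoding was decoded. The following minimal Python sketch uses a made-up address (hypothetical, not taken from the archive) and the standard `quopri` module:

```python
import quopri

# Hypothetical raw body as it might appear in an mbox file: the encoder
# wrapped the long Maintainer line and marked the break with a trailing "=".
raw = (
    b"Maintainer: Example Packaging Team <example-team@lists.example.=\n"
    b"org>\n"
)

# Reading the still-encoded text line by line keeps only the first half,
# cut at the "=", just like the values stored in upload_history.
naive = raw.decode("ascii").splitlines()[0]
print(naive)

# Decoding quoted-printable first removes the "=\n" soft break and
# rejoins the line, giving the complete address.
decoded = quopri.decodestring(raw).decode("ascii")
print(decoded.splitlines()[0])
```

Decoding the transfer encoding before splitting the body into fields is what keeps the address intact.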

Hi,

I made some changes to the email decoding that fixed most cases. We are
down to 1162 badly processed emails (from the 70k you reported):

udd=> SELECT count(*) FROM upload_history WHERE maintainer_email LIKE '%=';
 count
-------
  1162

They all date from 2022-08-27 onwards, which coincides with dak adding a
detached signature, so there may still be something to fix in the code
for that case.
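
There is one possible way a detached signature could bring the problem back, assuming the announcement is now sent as a multipart/signed (PGP/MIME) message. In that case the Content-Transfer-Encoding header sits on the inner text part. Asking the top-level message for a decoded payload then returns None, so code that falls back to the raw, still-encoded text would keep the soft line breaks. The sketch below builds a hypothetical message of that shape (not copied from real dak output) and decodes each text/plain part with Python's standard `email` package:

```python
from email import message_from_bytes

# Hypothetical multipart/signed message: the first part carries the
# quoted-printable text, the second part stands in for the signature.
RAW = b"""\
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="XX"; protocol="application/pgp-signature"

--XX
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Maintainer: Example Packaging Team <example-team@lists.example.=
org>

--XX
Content-Type: application/pgp-signature

(signature omitted)
--XX--
"""


def text_bodies(raw: bytes) -> list[str]:
    """Return the decoded body of every text/plain part in the message."""
    msg = message_from_bytes(raw)
    bodies = []
    # walk() visits the container and each sub-part; the leaf text part
    # carries its own Content-Transfer-Encoding header.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # undoes quoted-printable
            charset = part.get_content_charset() or "utf-8"
            bodies.append(payload.decode(charset))
    return bodies


# On a multipart message, get_payload(decode=True) returns None.
print(message_from_bytes(RAW).get_payload(decode=True))
for body in text_bodies(RAW):
    print(body.splitlines()[0])
```

This does not establish whether the UDD importer actually takes this path. It only shows why decoding per part matters once messages become multipart.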

udd=> select source, version, date from upload_history where maintainer_email LIKE '%=' order by date asc limit 10;
           source           |    version    |          date          
----------------------------+---------------+------------------------
 libsweble-common-java      | 3.0.8-3       | 2022-08-27 20:49:34+00
 xeus                       | 2.4.0-2       | 2022-08-27 20:49:43+00
 systemd                    | 251.4-3       | 2022-08-27 22:05:51+00
 cross-toolchain-base-ports | 53            | 2022-08-28 10:04:10+00
 opencascade                | 7.6.3+dfsg1-3 | 2022-08-28 10:36:28+00
 wvkbd                      | 0.10-1        | 2022-08-28 10:36:40+00
 gobject-introspection      | 1.73.0+ds-1   | 2022-08-28 10:49:10+00
 yade                       | 2022.01a-11   | 2022-08-28 11:05:40+00
 ruby-em-http-request       | 1.1.7-1       | 2022-08-28 12:29:29+00
 ruby-rails-i18n            | 7.0.5-1       | 2022-08-28 14:51:31+00
(10 rows)

Lucas
