Hi Daniel,

> While I'm at it, it occurred to me that for both the ozlabs and
> kernel.org instances, there are a lot of mails that are sent across
> multiple projects. ATM the entire contents of the mail - content,
> headers, diff, what have you, will be stored in full for each project.

The headers will be different, as they've gone through different lists.
This may not be too relevant to the actual purpose of patchwork though.

The comments (apart from the first) may diverge, depending on whether
responders keep both lists on CC.

The diffs will be the same, so we could deduplicate those, if it's worth
your trouble:

   patchwork=# select sum(dup_size) from (
                 select octet_length(diff) * (n-1) as dup_size, a.msgid, n
                 from (
                   select msgid, count(msgid) as n, min(id) as id
                   from patchwork_submission
                   group by msgid
                   having count(msgid) > 1
                 ) as a
                 inner join patchwork_patch
                   on patchwork_patch.submission_ptr_id = a.id
               ) as b;
       sum    
   -----------
    221334709
   (1 row)

and:

   patchwork=# select sum(octet_length(diff)) from patchwork_patch;
       sum     
   ------------
    6261083055
   (1 row)


So about 221MB of the 6.26GB of stored diffs is duplicated; around 3.5%.
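
For anyone who wants to sanity-check that figure, here is a rough Python
sketch of what the first query measures, followed by the percentage
arithmetic. The rows are made up (the msgids and diffs are hypothetical),
and it assumes diffs are sized as UTF-8 bytes, which may not match what
octet_length() reports on a given database encoding:

```python
from collections import defaultdict


def duplicate_bytes(rows):
    """Estimate bytes taken by duplicate diffs.

    rows: iterable of (id, msgid, diff) tuples, with diff as a str.
    """
    groups = defaultdict(list)
    for row_id, msgid, diff in rows:
        groups[msgid].append((row_id, diff))

    total = 0
    for copies in groups.values():
        n = len(copies)
        if n > 1:
            # Like the query: size the diff of the lowest-id copy and
            # count it once for each of the (n - 1) extra copies.
            _, diff = min(copies)
            total += len(diff.encode("utf-8")) * (n - 1)
    return total


# Hypothetical rows: one msgid seen on three lists, one seen once.
rows = [
    (1, "<a@example.com>", "diff-aaaa"),
    (2, "<a@example.com>", "diff-aaaa"),
    (3, "<b@example.com>", "diff-bb"),
    (4, "<a@example.com>", "diff-aaaa"),
]
toy_dup = duplicate_bytes(rows)

# The two sums reported by the queries above.
dup, total = 221334709, 6261083055
pct = 100 * dup / total
print(f"toy duplicate bytes: {toy_dup}")
print(f"duplicate share: {pct:.1f}%")
```

Note that this, like the query, only sizes one copy per msgid, so it
assumes the diffs really are identical across lists.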

Cheers,


Jeremy

_______________________________________________
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork