Hi Daniel,

> While I'm at it, it occurred to me that for both the ozlabs and
> kernel.org instances, there are a lot of mails that are sent across
> multiple projects. ATM the entire contents of the mail - content,
> headers, diff, what have you, will be stored in full for each project.

The headers will be different, as they've gone through different lists.
This may not be too relevant to the actual purpose of patchwork though.

The comments (apart from the first) may diverge, depending on whether
responders keep both lists on CC.

The diffs will be the same, so we could deduplicate those, if it's worth
your trouble:

  patchwork=# select sum(dup_size) from
                (select octet_length(diff) * (n-1) as dup_size, a.msgid, n
                 from (select msgid, count(msgid) as n, min(id) as id
                       from patchwork_submission
                       group by msgid
                       having count(msgid) > 1) as a
                 inner join patchwork_patch
                   on patchwork_patch.submission_ptr_id = a.id) as b;
      sum
  -----------
   221334709
  (1 row)

and:

  patchwork=# select sum(octet_length(diff)) from patchwork_patch;
      sum
  ------------
   6261083055
  (1 row)

So 221MB out of 6.2GB is duplicate; around 3.5%.

Cheers,

Jeremy
_______________________________________________
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork
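
As a cross-check on the numbers above, here is a minimal Python sketch of what the first query computes, run on made-up data. The function name `duplicate_diff_bytes` and the `(submission_id, msgid, diff)` row shape are hypothetical and only stand in for the joined patchwork_submission / patchwork_patch rows; it also assumes every row is a patch, which the real inner join does not guarantee. The last lines redo the percentage from the two sums reported by psql.

```python
from collections import defaultdict


def duplicate_diff_bytes(rows):
    """Sum the bytes that duplicate diffs would occupy.

    rows: iterable of (submission_id, msgid, diff) tuples (hypothetical shape).
    For each msgid seen more than once, take the diff of the lowest-id
    submission and count its byte length (n - 1) times, as the SQL does.
    """
    groups = defaultdict(list)
    for submission_id, msgid, diff in rows:
        groups[msgid].append((submission_id, diff))

    total = 0
    for entries in groups.values():
        n = len(entries)
        if n > 1:
            # min(id) in the SQL: the earliest stored copy of the message.
            _, first_diff = min(entries, key=lambda e: e[0])
            total += len(first_diff.encode("utf-8")) * (n - 1)
    return total


# Toy data, not taken from any real instance.
rows = [
    (1, "<a@example>", "diff --git a/x b/x\n"),
    (2, "<a@example>", "diff --git a/x b/x\n"),
    (3, "<b@example>", "diff --git a/y b/y\n"),
]
toy_dup = duplicate_diff_bytes(rows)

# Figures reported by the two queries above.
duplicate_bytes = 221334709
total_bytes = 6261083055
share = 100 * duplicate_bytes / total_bytes
print(f"toy duplicate bytes: {toy_dup}, reported share: {share:.1f}%")
```

Only the message that appears twice contributes to the toy total, and it contributes the size of one copy, since one copy would still have to be stored after deduplication.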