On 25/08/2020 18.37, sebb wrote:
-1
This will likely change the generated ID for emails that already have
archived-at headers
No, it will not change those, as they already have the header. This
fixes an issue with randomness in the archiving.
See
https://github.com/apache/incubator-ponymail-foal/blob/master/tools/archiver.py#L816
What happens is that an archived-at header with *the current timestamp*
gets added to every message that does not already have one. Hashing the
output of .as_bytes() therefore does not help at all, as that data will
always be different unless such a header is already present.
The raw_msg will deliver a much more reliable result going forward, as
it doesn't add "random" data to the mix on each reload (because it is
the original raw data without headers appended on the fly).
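
To illustrate the difference, here is a minimal sketch. It is not the
actual archiver code: the helper names id_from_parsed and id_from_raw,
the sample message and the list id are all made up for the example, and
the timestamp is passed in explicitly so the effect is reproducible.

```python
import hashlib
from email import message_from_bytes
from email.utils import formatdate

# Hypothetical sample message without an archived-at header.
SAMPLE = (
    b"From: alice@example.org\r\n"
    b"To: dev@example.org\r\n"
    b"Subject: test\r\n"
    b"Message-ID: <1@example.org>\r\n"
    b"\r\n"
    b"Hello\r\n"
)


def id_from_parsed(raw, lid, archived_at):
    """Mimic the old behaviour (assumption): hash the parsed message
    after an archived-at header has been appended on the fly."""
    msg = message_from_bytes(raw)
    if "archived-at" not in msg:
        msg.add_header("archived-at", archived_at)
    return "%s@%s" % (hashlib.sha224(msg.as_bytes()).hexdigest(), lid)


def id_from_raw(raw, lid):
    """Mimic the new behaviour: hash the untouched original bytes."""
    return "%s@%s" % (hashlib.sha224(raw).hexdigest(), lid)


lid = "<dev.example.org>"
# Two "reloads" of the same message, one minute apart.
first = id_from_parsed(SAMPLE, lid, formatdate(0))
second = id_from_parsed(SAMPLE, lid, formatdate(60))
print(first == second)                                   # ids differ
print(id_from_raw(SAMPLE, lid) == id_from_raw(SAMPLE, lid))  # ids match
```

A message that already carries an archived-at header is unaffected
either way, since nothing gets appended to it.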
I hope this clears matters up.
Far better to ensure the archived-at header is not added to the parsed
mail before the generator is used.
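
A rough sketch of what that ordering could look like. This is only an
assumption about the suggestion, not code from archiver.py: the function
name archive_one, the sample message and the explicit "now" argument are
invented for the example. Note also that re-serialising a parsed message
is not guaranteed to be byte-for-byte identical to what was received.

```python
import hashlib
from email import message_from_bytes
from email.utils import formatdate

# Hypothetical sample message without an archived-at header.
SAMPLE_MAIL = (
    b"From: bob@example.org\r\n"
    b"Subject: ordering\r\n"
    b"Message-ID: <2@example.org>\r\n"
    b"\r\n"
    b"Body\r\n"
)


def archive_one(raw, lid, now):
    """Generate the id first, then add the archived-at header."""
    msg = message_from_bytes(raw)
    # The id is computed while the parsed mail has no on-the-fly headers.
    mid = "%s@%s" % (hashlib.sha224(msg.as_bytes()).hexdigest(), lid)
    # Only afterwards is the time-dependent header appended.
    if "archived-at" not in msg:
        msg.add_header("archived-at", formatdate(now))
    return mid, msg


mid_a, msg_a = archive_one(SAMPLE_MAIL, "<dev.example.org>", 0)
mid_b, msg_b = archive_one(SAMPLE_MAIL, "<dev.example.org>", 60)
print(mid_a == mid_b)                               # same id on each load
print(msg_a["archived-at"] == msg_b["archived-at"])  # headers still differ
```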
On Tue, 25 Aug 2020 at 17:32, <[email protected]> wrote:
This is an automated email from the ASF dual-hosted git repository.
humbedooh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-ponymail-foal.git
The following commit(s) were added to refs/heads/master by this push:
new 6dfb9d8 Use raw_msg instead of as_bytes, as the latter has a new
archived-at appended sometimes (and we don't want that)
6dfb9d8 is described below
commit 6dfb9d83b1fa6e0ae83bc61446a27fe555751f8c
Author: Daniel Gruno <[email protected]>
AuthorDate: Tue Aug 25 18:32:33 2020 +0200
Use raw_msg instead of as_bytes, as the latter has a new archived-at
appended sometimes (and we don't want that)
---
tools/plugins/generators.py | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/plugins/generators.py b/tools/plugins/generators.py
index 949a3b6..344f115 100644
--- a/tools/plugins/generators.py
+++ b/tools/plugins/generators.py
@@ -151,7 +151,7 @@ def dkim(_msg, _body, lid, _attachments, raw_msg):
# Full generator: uses the entire email (including server-dependent data)
# Used by default until August 2020.
# See 'dkim' for recommended generation.
-def full(msg, _body, lid, _attachments, _raw_msg):
+def full(_msg, _body, lid, _attachments, raw_msg):
"""
Full generator: uses the entire email
(including server-dependent data)
@@ -159,15 +159,15 @@ def full(msg, _body, lid, _attachments, _raw_msg):
but different copies of the message are likely to have different headers,
thus ids
Parameters:
- msg - the parsed message
+ _msg - the parsed message (not used)
_body - the parsed text content (not used)
lid - list id
_attachments - list of attachments (not used)
- _raw_msg - the original message bytes (not used)
+ raw_msg - the original message bytes
Returns: "<hash>@<lid>" where hash is sha224 of message bytes
"""
- mid = "%s@%s" % (hashlib.sha224(msg.as_bytes()).hexdigest(), lid)
+ mid = "%s@%s" % (hashlib.sha224(raw_msg).hexdigest(), lid)
return mid