The document only describes a technique, whereas it could also mention some use cases and give developers more guidance. Thus it does not seem ready to me. Below are some thoughts that may lead to additional material for marf-redaction, if the WG deems them worth expanding on:
*Redacting the header*

Section 2, bullet 2, mentions "local-parts of email addresses", which is fine. However, the spec puts it within parentheses, prefixed by "such that", thereby leaving room for unwanted mangling. Some email addresses should probably never be redacted, e.g., those in From: and Return-Path: (except for VERP addresses, which encode the recipient). Recipient addresses may appear in To:, Cc:, the "for" clause of Received:, Delivered-To:, Envelope-To:, X-Envelope-To:, X-Rcpt-To:, X-Original-To:, and the like. Would it make sense to attempt a reasonably comprehensive list, as a guide for developers? In addition, any of these fields that was added locally could just as well be removed altogether.

By redacting To: or Cc:, a reporter most likely breaks any DKIM signature covering those fields. Since that might prevent the report from being accepted, it should be mentioned. Workarounds are probably different for FBLs than for reports submitted by the general public.

*Redacting the body*

I have seen some mailing list software obscure email addresses in the body, but found no guidance about this. People send passwords and credit card numbers by email, and there is no standard way to mark such strings as sensitive; that is a lost cause. At any rate, when the body is covered by a signature, the same DKIM concern as above arises.

*What to redact*

The reporting-discovery draft has a "Privacy considerations" section saying that messages containing sensitive data must not be reported as spam; messages reported as spam are considered public. (That section might be moved to a BCP.) This concept may be a way to convey that the already-abused recipient addresses are the only data that deserve redaction. Jacqui Caren recommends "redaction of all identifying marks when dealing with a spamtrap of obvious spam", in a scenario where redaction is "based upon the anti-spam score the original message gets and what level of trust you place in the MSP".
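To make the header-redaction point concrete, here is a minimal Python sketch under stated assumptions. It redacts only the local-parts of recipient addresses in the recipient-bearing fields listed above plus the "for" clause of Received:, and passes From:, Return-Path:, Message-Id: and everything else through unchanged. The token scheme (a truncated HMAC-SHA256 of the local-part under a reporter-held secret), the function names and the simplified address regex are all hypothetical, not taken from marf-redaction. Display names are left untouched, and VERP-encoded Return-Path: values would need separate handling.

```python
import hashlib
import hmac
import re

# Recipient-bearing fields named in the review (lowercased for lookup).
# Hypothetical starting point, not a comprehensive list.
RECIPIENT_FIELDS = {
    "to", "cc", "delivered-to", "envelope-to",
    "x-envelope-to", "x-rcpt-to", "x-original-to",
}

# Simplified address pattern: dot-atom local-parts only (no quoted strings).
ADDR_RE = re.compile(r"([A-Za-z0-9._%+=-]+)@([A-Za-z0-9.-]+)")
# The "for" clause of a Received: field, e.g. "for <alice@example.net>;".
FOR_RE = re.compile(r"(\bfor\s+<?)([A-Za-z0-9._%+=-]+)(@)", re.IGNORECASE)


def redact_local_part(local_part: str, secret: bytes) -> str:
    """Map a local-part to a stable opaque token (assumed scheme: HMAC-SHA256, truncated)."""
    mac = hmac.new(secret, local_part.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


def redact_fields(fields, secret):
    """Redact recipient local-parts in a list of (name, value) header pairs.

    Field order is preserved. Only the fields in RECIPIENT_FIELDS and the
    "for" clause of Received: are modified; From:, Return-Path:,
    Message-Id: and all other fields pass through unchanged.
    """
    redacted = []
    for name, value in fields:
        key = name.lower()
        if key in RECIPIENT_FIELDS:
            # Replace every local-part in the field, keep the domain.
            value = ADDR_RE.sub(
                lambda m: redact_local_part(m.group(1), secret) + "@" + m.group(2),
                value)
        elif key == "received":
            # Only the address after "for" is redacted in trace fields.
            value = FOR_RE.sub(
                lambda m: m.group(1) + redact_local_part(m.group(2), secret) + m.group(3),
                value)
        redacted.append((name, value))
    return redacted
```

With Python's email package, `email.message_from_string(raw).items()` yields such (name, value) pairs in their original order. Reusing the same secret across reports keeps tokens stable, so reports about the same recipient could still be correlated without revealing the address.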
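The DKIM concern raised under *Redacting the header* could also be checked mechanically before a report is sent. A minimal sketch, assuming header fields are available as (name, value) pairs: it reads the h= tag of each DKIM-Signature (RFC 6376), compares it with the names of the fields that redaction actually changed, and lists the overlaps. The function names are hypothetical. It ignores the body on purpose: if the body is redacted, the bh= body hash will no longer match for any signature whose hash covers the altered text.

```python
import re


def dkim_signed_fields(dkim_signature_value):
    """Return the lowercased field names listed in the h= tag of a DKIM-Signature value."""
    # Drop all whitespace (folding and FWS around tags); field names contain none.
    compact = re.sub(r"\s+", "", dkim_signature_value)
    for tag in compact.split(";"):
        name, sep, value = tag.partition("=")
        if sep and name == "h":
            return {f.lower() for f in value.split(":") if f}
    return set()


def changed_field_names(before, after):
    """Lowercased names of fields whose value differs between two aligned pair lists."""
    return {b[0].lower() for b, a in zip(before, after) if b != a}


def broken_signatures(fields, redacted_names):
    """For each DKIM-Signature in fields, list the redacted fields it signs (if any)."""
    redacted = {n.lower() for n in redacted_names}
    hits = []
    for name, value in fields:
        if name.lower() == "dkim-signature":
            covered = dkim_signed_fields(value) & redacted
            if covered:
                hits.append(sorted(covered))
    return hits
```

A reporter could log or surface the result; which workaround then applies, and whether it differs for FBLs and for reports from the general public, is the open question raised above.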
Jacqui Caren's message: http://www.ietf.org/mail-archive/web/marf/current/msg01048.html

In that spamtrap scenario, two or more honeypots are probably required to identify, by comparing the copies they receive, which parts of a message carry varying information that could betray the destination address. OTOH, users can inadvertently report as spam legitimate messages that contain other kinds of sensitive data. Asking "Are you sure?" in the GUI probably won't help much.

*Where is the pristine message?*

It has been mentioned that reports may need to be transmitted to LEAs. In some cases, a judge may order the disclosure of redacted data. Can we use Message-Id: or a similar field to locate the pristine message? If we can, we had better recommend never redacting that field. If transmission to LEAs crosses jurisdictional boundaries, it may be useful to state in which country/state the pristine data is located.

_______________________________________________
marf mailing list
https://www.ietf.org/mailman/listinfo/marf
