Re: The word on messages w/ no Message-Id
On 29.09.2015 at 23:45, coolhandluke wrote:

> based on just what i've found in the last 10 minutes, i would be very
> careful about scoring anything related to {invalid|missing|extra}
> headers too high. definitely test your rules extensively (with very low
> scores) before rolling them out to production!

you need to train your bayes at the same time and raise the negative score of BAYES_00-BAYES_20 - a spamfilter is always an adaptive system

here the combination of missing MID/DATE with a good bayes is never a problem, and in combination with a high spam bayes score it is a clear sign for mails to reject (yes, all spamass-milter rejects are reviewed carefully)
Re: The word on messages w/ no Message-Id
On 2015-09-28 14:32, Joe Quinn wrote:

> If you don't want to be getting those emails, they are spam and you
> should score it something reasonable that doesn't prevent you getting
> other desired messages. While I don't have any specific examples of ham
> without Message-ID, it's not a stretch to imagine they exist. I
> personally wouldn't write that rule.

out of curiosity, i decided to grep my inbox for e-mails without any message-id: header. i found 37 e-mails without a message-id out of a total of 1144.

29 - domain {renewal|transfer}-related e-mails all from godaddy
5 - spam
2 - receipts from apple retail stores
1 - newsletter from the local credit union

it would have been a major inconvenience if these 37 messages had been marked as spam (well, the domain-related e-mails and receipts, at least). i should note, however, that each of these 37 e-mails matched the MISSING_MID rule with a score of 0.14.

on a related note, i sometimes receive messages missing the Date: header (which *is* required by rfc5322). several months ago, i began testing scoring/blocking them only to discover that level3.com (one of my upstream transit providers) was sending out some rather important notification e-mails without a Date: header. even though they were in violation of rfc, i still couldn't do anything about it because i needed to receive those notices.

also, out of those 1144 e-mails currently in my inbox, seven (all receipts from atlantic.net's billing system) contain two Date: headers.

based on just what i've found in the last 10 minutes, i would be very careful about scoring anything related to {invalid|missing|extra} headers too high. definitely test your rules extensively (with very low scores) before rolling them out to production!

/chl
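The inbox survey described above can be repeated with a short script before any rule gets a real score. What follows is a minimal sketch, not the command the poster actually ran: it assumes the mail is available as an mbox file, and the names `header_anomalies`, `survey`, and `survey_mbox` are made up for illustration. It counts messages with no Message-ID: header, no Date: header, or more than one Date: header.

```python
import mailbox
from collections import Counter
from email import message_from_string


def header_anomalies(msg):
    """Return labels for Message-ID/Date anomalies found in one message."""
    labels = []
    # get_all() is case-insensitive and returns None when the header is absent.
    if not msg.get_all("Message-ID"):
        labels.append("missing_message_id")
    dates = msg.get_all("Date") or []
    if not dates:
        labels.append("missing_date")
    elif len(dates) > 1:
        labels.append("multiple_date")
    return labels


def survey(messages):
    """Count anomalies over any iterable of email.message.Message objects."""
    counts = Counter()
    total = 0
    for msg in messages:
        total += 1
        counts.update(header_anomalies(msg))
    return total, counts


def survey_mbox(path):
    """Run the survey over an mbox file (hypothetical helper; path is yours)."""
    box = mailbox.mbox(path, create=False)  # do not create the file if missing
    try:
        return survey(box)
    finally:
        box.close()


# Small demonstration on a hand-written message with neither header.
sample = message_from_string("From: a@example.com\nSubject: receipt\n\nbody\n")
print(header_anomalies(sample))  # ['missing_message_id', 'missing_date']
```

Calling `survey_mbox` on a real mailbox (the path is a placeholder you supply) returns the total message count and a `Counter` of anomalies. Comparing those hits against the mail you actually wanted gives a rough idea of how many false positives a heavier score would cost.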
Re: The word on messages w/ no Message-Id
On Mon, 28 Sep 2015, Philip Prindeville wrote:

> I’m getting a lot of messages from head-hunters, my wife’s auto
> dealership, etc. that look like they’re being generated by legitimate
> [sic] email campaigns, but they don’t have a message-id.
>
> Since the message-id needs to be universally unique, the general
> guidelines are that it be generated by the originator using a
> locally-unique value concatenated with the originator’s identity (which
> as a domain name, should be globally unique) thus guaranteeing universal
> uniqueness.
>
> RFC-5322 says the “Message-ID” SHOULD be present, and per Section 3.6.4:

[snip..]

> Extracting the operative text:
>
> "The "Message-ID:" field provides a unique message identifier that
> refers to a particular version of a particular message. The uniqueness
> of the message identifier is guaranteed by the host that generates it
> […]. The message identifier (msg-id) itself MUST be a globally unique
> identifier for a message.”
>
> Obviously a missing Message-ID is hardly unique, and hence this
> requirement is not being fulfilled.
>
> Does this warrant scoring the message severely?
>
> I say “yes”. Anyone else?
>
> -Philip

Been there, tried that, got burned by the FPs, tried rules that hit non-RFC compliance for Message-IDs, burned by even more FPs. (people tend to get pissed when their airline reservation messages get spam-score Junked).

So have some low-scoring rules for missing/bad M-IDs, but not heavily scored.

--
Dave Funk University of Iowa College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
Re: The word on messages w/ no Message-Id
On Mon, 28 Sep 2015 12:22:20 -0600 Philip Prindeville wrote:

> I’m getting a lot of messages from head-hunters, my wife’s auto
> dealership, etc. that look like they’re being generated by legitimate
> [sic] email campaigns, but they don’t have a message-id.

Yes, we see that quite a bit.

> RFC-5322 says the “Message-ID” SHOULD be present, and per Section
> 3.6.4:

[... big chunk snipped ...]

It wasn't really necessary to quote that much of the RFC.

> Does this warrant scoring the message severely?
> I say “yes”.

It's up to you. Are you trying to stop spam? Or punish those who ignore RFCs? Because the two goals are not necessarily the same.

Here's some data. In our logs today, we have seen about 146,000 messages that lacked a Message-Id: header. Of those, about 74,000 were caught as spam (due to other rules) and about 72,000 were accepted. That doesn't mean that the 72,000 accepted messages were definitely not spam, but I think we would have heard from our customers had a significant number been spam.

So: No, I do not think lack of a Message-Id: header warrants scoring the message "severely". Maybe a point or so.

Regards,

Dianne.
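To make the arithmetic behind those log figures explicit, here is a small sketch. It only restates the approximate counts quoted above; the function name `spam_share` is hypothetical, and the result says nothing about how the accepted messages would really be classified.

```python
def spam_share(caught_as_spam, accepted):
    """Fraction of messages hitting a rule that were caught as spam."""
    total = caught_as_spam + accepted
    if total == 0:
        raise ValueError("no messages to evaluate")
    return caught_as_spam / total


# Approximate counts from the logs quoted above (messages lacking Message-Id:).
share = spam_share(74_000, 72_000)
print(f"{share:.1%}")  # 50.7%
```

A rule that fires on spam and accepted mail in roughly equal measure carries very little information on its own, which is consistent with the suggestion of "a point or so" rather than a severe score.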
Re: The word on messages w/ no Message-Id
On 9/28/2015 2:22 PM, Philip Prindeville wrote:

> Though listed as optional in the table in section 3.6, every message
> SHOULD have a "Message-ID:" field. Furthermore, reply messages SHOULD
> have "In-Reply-To:" and "References:" fields as appropriate and as
> described below.

This is much more plain-english and clearly says SHOULD, so my interpretation of the rest would be what MUST be done IF "Message-ID" is present.

In any event, RFC compliance is orthogonal to being spam or ham and at the end of the day, SA is an "I don't want this email" spam classifier and not an RFC validator.

If you don't want to be getting those emails, they are spam and you should score it something reasonable that doesn't prevent you getting other desired messages. While I don't have any specific examples of ham without Message-ID, it's not a stretch to imagine they exist. I personally wouldn't write that rule.
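For anyone who wants to measure how often a Message-ID that is present actually follows the RFC 5322 section 3.6.4 syntax, a rough checker might look like the sketch below. It is an illustration under stated assumptions, not SpamAssassin's own logic: it covers only the current `msg-id` grammar (`dot-atom-text` on the left, `dot-atom-text` or `no-fold-literal` on the right), deliberately ignores the obsolete `obs-id-left`/`obs-id-right` forms and comments in CFWS, and the name `is_valid_msg_id` is hypothetical.

```python
import re

# atext from RFC 5322 section 3.2.3: letters, digits, and a fixed set of symbols.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# no-fold-literal = "[" *dtext "]"; dtext is printable ASCII except "[", "]", "\".
NO_FOLD_LITERAL = r"\[[\x21-\x5a\x5e-\x7e]*\]"

MSG_ID_RE = re.compile(
    rf"<{DOT_ATOM_TEXT}@(?:{DOT_ATOM_TEXT}|{NO_FOLD_LITERAL})>"
)


def is_valid_msg_id(value):
    """True if value matches the non-obsolete msg-id syntax.

    Surrounding whitespace is tolerated; obsolete forms and comments are not.
    """
    return MSG_ID_RE.fullmatch(value.strip()) is not None


print(is_valid_msg_id("<1234.abcd@example.com>"))  # True
print(is_valid_msg_id("1234.abcd@example.com"))    # False (no angle brackets)
```

Because this is stricter than what real senders produce, it is better suited to counting how much wanted mail would be affected than to driving a heavily scored rule.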
The word on messages w/ no Message-Id
I’m getting a lot of messages from head-hunters, my wife’s auto dealership, etc. that look like they’re being generated by legitimate [sic] email campaigns, but they don’t have a message-id.

Since the message-id needs to be universally unique, the general guidelines are that it be generated by the originator using a locally-unique value concatenated with the originator’s identity (which as a domain name, should be globally unique) thus guaranteeing universal uniqueness.

RFC-5322 says the “Message-ID” SHOULD be present, and per Section 3.6.4:

3.6.4. Identification Fields

Though listed as optional in the table in section 3.6, every message SHOULD have a "Message-ID:" field. Furthermore, reply messages SHOULD have "In-Reply-To:" and "References:" fields as appropriate and as described below.

The "Message-ID:" field contains a single unique message identifier. The "References:" and "In-Reply-To:" fields each contain one or more unique message identifiers, optionally separated by CFWS.

The message identifier (msg-id) syntax is a limited version of the addr-spec construct enclosed in the angle bracket characters, "<" and ">". Unlike addr-spec, this syntax only permits the dot-atom-text form on the left-hand side of the "@" and does not have internal CFWS anywhere in the message identifier.

Note: As with addr-spec, a liberal syntax is given for the right-hand side of the "@" in a msg-id. However, later in this section, the use of a domain for the right-hand side of the "@" is RECOMMENDED. Again, the syntax of domain constructs is specified by and used in other protocols (e.g., [RFC1034], [RFC1035], [RFC1123], [RFC5321]). It is therefore incumbent upon implementations to conform to the syntax of addresses for the context in which they are used.
   message-id      =   "Message-ID:" msg-id CRLF
   in-reply-to     =   "In-Reply-To:" 1*msg-id CRLF
   references      =   "References:" 1*msg-id CRLF
   msg-id          =   [CFWS] "<" id-left "@" id-right ">" [CFWS]
   id-left         =   dot-atom-text / obs-id-left
   id-right        =   dot-atom-text / no-fold-literal / obs-id-right
   no-fold-literal =   "[" *dtext "]"

The "Message-ID:" field provides a unique message identifier that refers to a particular version of a particular message. The uniqueness of the message identifier is guaranteed by the host that generates it (see below). This message identifier is intended to be machine readable and not necessarily meaningful to humans. A message identifier pertains to exactly one version of a particular message; subsequent revisions to the message each receive new message identifiers.

Note: There are many instances when messages are "changed", but those changes do not constitute a new instantiation of that message, and therefore the message would not get a new message identifier. For example, when messages are introduced into the transport system, they are often prepended with additional header fields such as trace fields (described in section 3.6.7) and resent fields (described in section 3.6.6). The addition of such header fields does not change the identity of the message and therefore the original "Message-ID:" field is retained. In all cases, it is the meaning that the sender of the message wishes to convey (i.e., whether this is the same message or a different message) that determines whether or not the "Message-ID:" field changes, not any particular syntactic difference that appears (or does not appear) in the message.

The "In-Reply-To:" and "References:" fields are used when creating a reply to a message. They hold the message identifier of the original message and the message identifiers of other messages (for example, in the case of a reply to a message that was itself a reply).
The "In-Reply-To:" field may be used to identify the message (or messages) to which the new message is a reply, while the "References:" field may be used to identify a "thread" of conversation.

When creating a reply to a message, the "In-Reply-To:" and "References:" fields of the resultant message are constructed as follows:

The "In-Reply-To:" field will contain the contents of the "Message-ID:" field of the message to which this one is a reply (the "parent message"). If there is more than one parent message, then the "In-Reply-To:" field will contain the contents of all of the parents' "Message-ID:" fields. If there is no "Message-ID:" field in any of the parent messages, then the new message will have no "In-Reply-To:" field.

The "References:" field will contain the contents of the parent's "References:" field (if any) followed by the contents of the parent's "Message-ID:" field (if any). If the parent message does not contain a "Referenc