On 09/07/2017 12:01 PM, sebb wrote:
> On 7 September 2017 at 06:06, Daniel Gruno <[email protected]> wrote:
>> On 09/07/2017 12:24 AM, sebb wrote:
>>> On 6 September 2017 at 07:32, Daniel Gruno <[email protected]> wrote:
>>>> On 09/06/2017 12:09 AM, sebb wrote:
>>>>> On 2 September 2017 at 09:02, <[email protected]> wrote:
>>>>>> Repository: incubator-ponymail
>>>>>> Updated Branches:
>>>>>> refs/heads/master c8f4d3b7d -> df0b7ee1c
>>>>>>
>>>>>>
>>>>>> crop out trailing whitespace for redundant archiver
>>>>>>
>>>>>> This deals with spurious whitespace that can exist on
>>>>>> clustered setups due to corrections inside the MTAs.
>>>>>> This only deals with trailing whitespace, everything else
>>>>>> is preserved.
>>>>>
>>>>> -1
>>>>>
>>>>> I don't think this is a good idea.
>>>>
>>>> Your -1 is noted, but I don't consider the reasoning valid for a veto,
>>>> so I'll interpret this as just a plain -1.
>>>
>>> AIUI, that's not your call.
>>
>> It's not my call to determine whether technical merit is sound (that
>> would be for the PPMC in such cases), but there has to be technical
>> merit in -1 in the first place. Saying "this is not a good idea" does
>> not convey technical reason. You've since elaborated on that in your
>> reply, and _that_ I believe constitutes a technical reason.
>
> Ah ok.
>
> I guess I was too terse, I should have linked to the previous mails I
> sent about the same issue.
>
>>>
>>>> I think it's a good idea, I think it solves some real problems that have
>>>> been spotted in clustered setup. It could also solve problems where one
>>>> archives as mbox with an extra newline by mistake. It's also an optional
>>>> generator, not the default. Could you elaborate on why trailing
>>>> whitespace would matter?
>>>
>>> I already wrote that ignoring whitespace causes a problem because it
>>> means two different inputs end up with the same database id.
>>> There's no way of knowing which one was correct; the wrong one may end
>>> up being stored.
>>
>> But they would both have the same sender, date, list, message,
>> attachments etc filed under the same ID - is that not what we want? What
>> we _don't_ want is for trailing whitespace to cause duplicates. Put in
>> other words: Why would we at all care whether one has the added newline
>> or two and the other one doesn't? We're dealing with showing people
>> emails, not bit-perfect of what was sent (including duplicates as a
>> result of bit-diversion), but rather of what was intended.
>
> I disagree; I think it's important to show the input email as exactly
> as possible.
> Whitespace trimming could damage some emails.
>
>> If we wanted
>> a perfect copy, we'd use the full digest and skip clustered setups all
>> together, hoping machines don't die on us.
>
> Not so, it must be possible to have perfect copies in clustered setups.
> Otherwise clustered backup systems would be impossible.
> It's just that the current design may make this tricky.
>
>> This is for those rare
>> occasions where something _does_ go wrong, and as seen, sometimes
>> postfix will add some extra newlines - I still don't know why it does
>> that in every case, I only know that it does, and likely other MTAs do
>> as well.
>
> That's largely my point.
> The cause needs to be determined otherwise the generator is being used
> to ignore what may be a bug.
>
> Besides, in the cases I have seen (and noted on this list), it is not
> only a difference in trailing whitespace.
> The archived-at header is missing in one of the copies.
> As I have written already, that points to non-identical treatment by
> the different cluster members.
>

The archived-at difference, and possibly the extra whitespace, likely
stem from a postfix oddity (that I really can't fix :p): mail delivered
locally is handled internally, even if it's supposed to be rerouted to a
different address than the original. The case is as follows, I think:

- 3 nodes act as MTAs
- Each node will receive an email (whichever node has the highest
  priority and is awake) and duplicate it into 3 copies, one for each
  box in the MX setup. So one of these copies goes to the box itself,
  and the other two go to the other boxes if/when they are online
  (this is so that an email doesn't bounce if a box is down or
  erroring out).

Now, since one of the copies goes to the box itself (even though it goes
to a new email address), it is somehow routed differently, and that
causes the header (and possibly the whitespace fixes) to either be there
or not.

I'll gladly admit that we're working on figuring this out differently,
and possibly publishing this as a "what to do and not do" guideline
later on, when the issue is fixed. I can also accept that if we have
this guideline, we can omit the whitespace trimming.
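
To make the trade-off in this thread concrete, here is a minimal sketch
of the two approaches. This is not the actual Pony Mail generator code:
the function names, the choice of SHA-256, and hashing the whole raw
message (headers included) are all assumptions made purely for
illustration.

```python
import hashlib


def full_digest_id(raw: bytes) -> str:
    """Hypothetical ID: digest of the message exactly as received."""
    return hashlib.sha256(raw).hexdigest()


def trimmed_digest_id(raw: bytes) -> str:
    """Hypothetical ID: digest with trailing whitespace cropped first.

    bytes.rstrip() with no argument removes trailing ASCII whitespace
    (spaces, tabs, CR, LF); everything else is left untouched.
    """
    return hashlib.sha256(raw.rstrip()).hexdigest()


# Two copies of the "same" mail; copy_b picked up an extra newline in transit.
copy_a = b"Subject: test\r\n\r\nHello world\r\n"
copy_b = copy_a + b"\r\n"

# A copy that additionally lacks a header the other nodes added
# (the header value is a made-up placeholder).
copy_c = b"Archived-At: <placeholder>\r\n" + copy_a

print(full_digest_id(copy_a) == full_digest_id(copy_b))        # differ
print(trimmed_digest_id(copy_a) == trimmed_digest_id(copy_b))  # same
print(trimmed_digest_id(copy_a) == trimmed_digest_id(copy_c))  # differ
```

Under these assumptions, trimming collapses copies that differ only in
trailing whitespace onto one ID, which is both the benefit (no
duplicates) and the objection (no way of telling afterwards which copy
was the original). It does nothing for copies that differ in a header,
so a missing archived-at header would still produce a separate ID if
headers are part of the hashed input.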
