Re: [Mailman-Users] Gmail "features"

2012-08-09 Thread Stephen J. Turnbull
Brad Knowles writes:

 > I really don't think that this is a disk storage issue, I think
 > this is much more likely to be a wrong-headed idea that this kind
 > of thing will be beneficial to the users -- after all, they know
 > that they sent the message and that copy is sitting in the outbox,
 > so they don't need to have another copy sitting in the inbox.

I agree it's not about disk storage; I think it's just de-duplication
of the messages that users see.  Back when Canter and Siegel first got
started, we hated spam not because there was so much of it, but
because it was so bloody annoying to see it in every newsgroup we
subscribed to.  I don't see why ordinary users wouldn't feel the same
way (of course one dupe is far less annoying than Green Card lawyers
in every group you read, but if you get a lot of them, the annoyance
level would build up).

I disagree that it's wrongheaded, if Gmail is going to always do
de-duplication with one algorithm.  Gmail always stores the mail you
sent, as you sent it.  It is not necessarily the case that it will
come back to you in one piece.  After all, our favorite list
distribution software is just bristling with settings determining
what's going to be left of your post once it arrives at the
subscriber's mailbox.  Everybody can understand the problem if they
send out a PNG, it comes back from the list stripped (or the mail gets
dropped), and for some reason they don't have a copy of their
original.  OTOH, only a very few would know, let alone care, about
missing RFC 2369 headers in a few copies they have locally!

I just think that users ought to have a choice of how de-duping is
done.  Or if it gets done at all.

 > If you think it's worthwhile, you could always try turning on
 > personalization for the list, and then add a footer with unique
 > information per recipient.  That would cause the message-id to be
 > unique as well as the message body, and wouldn't require any new
 > code to be developed.

Small correction: the Message-Id will be the same for all copies.
Mailman cannot go changing those, or it would play hell with all
threading MUAs.
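
To make that concrete, here is a rough Python sketch.  It is not
Mailman's code: the footer text and the helper names
(personalized_copy, body_digest) are made up for illustration, and it
only assumes that personalization appends a per-recipient footer
while leaving the headers alone.

```python
import hashlib
from email import message_from_string

# A hypothetical post as it arrives at the list.
ORIGINAL = (
    "From: poster@example.org\n"
    "To: list@example.org\n"
    "Message-ID: <post-1@example.org>\n"
    "Subject: test\n"
    "\n"
    "Hello, list.\n"
)


def personalized_copy(raw, recipient):
    """Return a copy with a per-recipient footer; headers are untouched."""
    msg = message_from_string(raw)
    footer = "\n-- \nYou are subscribed as %s\n" % recipient
    msg.set_payload(msg.get_payload() + footer)
    return msg


def body_digest(msg):
    """SHA-1 of the body text, used here only to compare bodies."""
    return hashlib.sha1(msg.get_payload().encode("utf-8")).hexdigest()


copies = {
    rcpt: personalized_copy(ORIGINAL, rcpt)
    for rcpt in ("alice@example.com", "bob@example.com")
}
message_ids = {msg["Message-ID"] for msg in copies.values()}
body_digests = {body_digest(msg) for msg in copies.values()}
```

The bodies differ per recipient, but every copy carries the same
Message-Id, so anything keyed on Message-Id alone still sees
duplicates.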

Steve
--
Mailman-Users mailing list Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org


Re: [Mailman-Users] Gmail "features"

2012-08-09 Thread Brad Knowles
On Aug 8, 2012, at 11:11 PM, Stephen J. Turnbull  wrote:

> Well, unfortunately Gmail is closed-source and I don't know what the
> full algorithm is.  Surely Message-Id is part of it, but evidently
> there are other aspects to it, or the behavior you and Brad
> R. describe wouldn't happen.

In the large-scale mail system design that I've done in the past, the tuple of 
(sender,recipient,message-id) was considered to be a pretty good index key for 
the mail database, albeit not a guaranteed unique key.  Most greylisting 
implementations use a tuple of (sender,recipient,sending-IP) to determine if 
this particular message should be delayed or not.
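
As a rough illustration of those two keys, here is a minimal Python sketch.  
The function names (index_key, greylist_key) and the sample messages are 
hypothetical, and lower-casing whole addresses is a simplification, since the 
local part of an address is technically case-sensitive.

```python
from email import message_from_string


def index_key(sender, recipient, raw_message):
    """Build the (sender, recipient, message-id) tuple for one delivery."""
    msg = message_from_string(raw_message)
    message_id = (msg.get("Message-ID") or "").strip()
    return (sender.lower(), recipient.lower(), message_id)


def greylist_key(sender, recipient, sending_ip):
    """Build the (sender, recipient, sending-IP) tuple used for greylisting."""
    return (sender.lower(), recipient.lower(), sending_ip)


# Two copies of one post: same Message-ID, different subject and body.
COPY_A = "Message-ID: <abc@example.org>\nSubject: test\n\nHello.\n"
COPY_B = (
    "Message-ID: <abc@example.org>\n"
    "Subject: [List] test\n"
    "\n"
    "Hello.\n-- \nlist footer\n"
)
OTHER = "Message-ID: <xyz@example.org>\nSubject: test\n\nHello.\n"

key_a = index_key("Me@Example.org", "me@example.org", COPY_A)
key_b = index_key("me@example.org", "me@example.org", COPY_B)
key_other = index_key("me@example.org", "me@example.org", OTHER)
```

Here key_a and key_b come out equal even though the copies differ in subject 
and body, which is why such a key is good for indexing but is not a guarantee 
that the stored contents are identical.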

I even did a single-instance-store message database design that did an SHA-1 
hash of the message body content to see if the message contents really were 
unique, and if not then you could store the headers separate from the body and 
for the body you could just include a pointer to the existing message body that 
you already have.  I believe that some implementations of Microsoft Exchange 
implement a similar algorithm.
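
A minimal in-memory sketch of that idea in Python follows.  The class and 
function names are hypothetical, and a real store would live on disk and would 
have to handle things this ignores (locking, reference counts for deletion, 
hash collisions).

```python
import hashlib
import re

# The first blank line separates headers from body (CRLF or bare LF).
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def split_message(raw):
    """Split raw message bytes into (header_bytes, body_bytes)."""
    match = _BLANK_LINE.search(raw)
    if match is None:
        return raw, b""
    return raw[:match.start()], raw[match.end():]


class SingleInstanceStore:
    """Keep each distinct body once; each message is headers plus a pointer."""

    def __init__(self):
        self.bodies = {}    # SHA-1 hex digest -> body bytes
        self.messages = []  # list of (header bytes, body digest)

    def add(self, raw):
        headers, body = split_message(raw)
        digest = hashlib.sha1(body).hexdigest()
        self.bodies.setdefault(digest, body)  # stored only the first time
        self.messages.append((headers, digest))
        return digest


store = SingleInstanceStore()
first = store.add(b"To: alice@example.com\nSubject: hi\n\nSame body.\n")
second = store.add(b"To: bob@example.com\nSubject: hi\n\nSame body.\n")
```

After the two add() calls there are two entries in store.messages but only one 
in store.bodies, since both digests point at the same stored body.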

If you wanted to go to the extreme, you could de-compose each message to the 
individual MIME bodyparts, and then do an SHA-1 hash on each of those.  So, no 
matter how many copies of the latest Dilbert cartoon get mailed out, and no 
matter what text or other material might surround that, you'd still be able to 
reduce that to storing just one copy of the cartoon with multiple inbound links.
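
Sketched in Python with the standard email package, the per-bodypart variant 
might look like the following.  The helper names (part_digests, build_message) 
and the sample data are hypothetical; parts are hashed after undoing the 
Content-Transfer-Encoding, which is one possible choice and not necessarily 
what any real system does.

```python
import hashlib
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def part_digests(raw):
    """Return a list of (content type, SHA-1 hex digest) per leaf MIME part."""
    msg = message_from_bytes(raw)
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no content of their own
        payload = part.get_payload(decode=True) or b""
        digests.append(
            (part.get_content_type(), hashlib.sha1(payload).hexdigest())
        )
    return digests


def build_message(text, image_bytes):
    """Build a sample multipart message (for demonstration only)."""
    msg = MIMEMultipart()
    msg["Subject"] = "cartoon"
    msg.attach(MIMEText(text))
    msg.attach(MIMEImage(image_bytes, _subtype="png"))
    return msg.as_bytes()


CARTOON = b"\x89PNG stand-in bytes for the cartoon"
first = dict(part_digests(build_message("Look at this!", CARTOON)))
second = dict(part_digests(build_message("FYI, funny.", CARTOON)))
```

The image/png digests match across the two messages while the text/plain 
digests differ, so only the surrounding text would need to be stored twice.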

On the other hand, Nick Christenson (author of "Sendmail Performance Tuning", 
ISBN-13: 978-0321115706) and I discovered that you would be trading more disk 
I/O operations in order to try to save a relatively trivial amount of disk 
space, and that's the exact opposite of the trade-off you want to make given 
the way disk storage capacities have rapidly grown while I/O capacities have 
been relatively stagnant.  We discussed all these issues in the invited talk 
"Design and Implementation of Highly Scalable E-mail Systems", see 
.

I happen to know the former SRE for gmail, but I don't think he'd be able to 
tell me anything useful on this subject.


I really don't think that this is a disk storage issue, I think this is much 
more likely to be a wrong-headed idea that this kind of thing will be 
beneficial to the users -- after all, they know that they sent the message and 
that copy is sitting in the outbox, so they don't need to have another copy 
sitting in the inbox.

And maybe for the majority of users, that decision might actually be helpful.  
But they need to give people a way to turn that option off, so that they don't 
break the ability to do debugging when testing the sending of messages to 
remote systems.

Of course, if people are on Google Groups, then this probably isn't an issue 
for them.  And maybe that's the other part of the problem -- maybe Google sees 
this "feature" as being a competitive advantage for them with combining Google 
Groups and gmail working better together, and they don't see the benefit of 
making gmail be able to play better with the rest of the world.


If you think it's worthwhile, you could always try turning on personalization 
for the list, and then add a footer with unique information per recipient.  
That would cause the message-id to be unique as well as the message body, and 
wouldn't require any new code to be developed.

--
Brad Knowles 



Re: [Mailman-Users] Gmail "features"

2012-08-09 Thread Stephen J. Turnbull
Lucio Crusca writes:

 > Again, that's not the point and we basically agree gmail is bad,
 > but... a standard is some set of commonly accepted rules. Be it
 > written down in an RFC or not.

It doesn't need to be in an RFC, but it must be written.  "What is
commonly accepted" is simply not a standard because it's impossible to
know if you're conforming, or what you need to conform to.


Re: [Mailman-Users] Gmail "features"

2012-08-09 Thread William Bagwell
On Thursday 09 August 2012, Lucio Crusca wrote:
> I'd only like to slap gmail in the face if I could, by
> working around their wonderful feature, just for the taste of feeling
> smarter than they pretend to be. All in all, what is hacking about if
> not that?

Please do! I'm a Gmail user only because my ISP outsourced mail to them three 
years ago. I was helping a small discussion list move from an L-Soft 
LISTSERV to Mailman at the time, so suddenly not getting my own posts back 
made testing impossible.

I was infuriated when I discovered there was no way to turn this stupid 
feature off. My workaround is to post through my web host's mail server. 
Most people do not have this option, so a setting in Mailman for others 
would be great.
-- 
William


Re: [Mailman-Users] Gmail "features"

2012-08-09 Thread Lucio Crusca
Stephen J. Turnbull writes:
> I don't think so.  Perhaps "MUA" is the wrong term for a message store
> "in the cloud", but the fact is that Gmail is the final recipient as
> far as the RFCs are concerned.  Eg, IMAP servers often implement SIEVE
> recipes and spam filtering, so some messages will be lost.

Again, that's not the point and we basically agree gmail is bad, but... a 
standard is some set of commonly accepted rules. Be it written down in an RFC 
or not. The "standard" (expected by most people) behavior of an email final 
recipient software, if not MUA, is to receive emails, not to throw them away 
based on ill-advised algorithms. Sieve recipes and spam filtering are something 
that users can disable and modify at will (at least that's the "standard" for 
MUAs). If a recipe or spam filter accidentally trashes a message, the user can 
always disable that recipe or filter. Gmail does break the standard (expected 
behavior) in that it does not let users choose whether they want to receive 
some messages that are not spam by any stretch of the imagination. Imho.

> In any case, no messages are lost; only copies with different
> meta-data.

However some information is actually lost (threading in the user's inbox and 
the acknowledgment that your message has actually reached the mailing list).

> I don't really disagree with you that Gmail's behavior is horrible.
> My point is that if you think its behavior is non-conforming, you may
> be in for other, even less pleasant surprises in the future.

You see, there must be a reason why I decided to roll my own mail server after 
all... I'm prepared for surprises. I'm not a gmail user, though I do have a 
sleeping gmail account. I'd only like to slap gmail in the face if I could, by 
working around their wonderful feature, just for the taste of feeling smarter 
than they pretend to be. All in all, what is hacking about if not that?

> I can't say I have a lot of sympathy.  You get Gmail for free, you
> shouldn't think it comes with no strings attached.

Quite obvious, though I can't see what Gmail earns from that "feature", but I 
suspect that's just me not seeing far enough ahead.