On January 1, 2004 at 20:28, Jeff Breidenbach wrote:

> Finally, Chuq had a good point about requirements changing over
> time. In the future, MHonArc may want to move towards encouraging more
> semantic markup (read: in default output) similar to the <DIV> tags
> found throughout mharc output. This allows easier hooks into
> on-the-fly obfuscators and whatnot, with the added bonus of better CSS
> interoperability. I think CSS is finally coming of age (check out
> csszengarden.com), and I am looking forward to moving more and more
> layout information from the mhonarc resource file to style sheets.
The problem with this approach is that it won't work with text-based browsers. Accessibility is something I try to maintain, so I am reluctant to use measures that mandate particular types of browsers.

I first thought of using libgd to change addresses into CGI links that generate an image of the email address on the fly, i.e., harvesters would have to use OCR to get the address. This will not work for text-based browsers, but I thought it would be kind of nifty.

Another alternative is to remove the linking of addresses and then use an obfuscation technique like:

    earl<!-- -->@<!-- -->example.com

This way the address renders like "earl@example.com" (and can be copy-n-pasted by readers into their MUA), but a harvester may not catch it. Of course, a smart harvester that expands entity references and deletes comment declarations would.

I read a study dated March 2003 that showed that simple obfuscation techniques actually work, but I think (and the study even states) that it is likely only a matter of time before spammers adapt. Right now there are so many un-obfuscated addresses that spammers are not yet driven to deal with obfuscation techniques. Once obfuscation is the norm, spammers will adapt. Mail-archive.com uses a POST form to obfuscate addresses, but it is straightforward to customize a harvester to defeat it. Therefore, for long-term protection, obfuscation does not seem to be the best method.

The image idea is nice since it is a type of Turing test, and the image can be generated to give OCR systems trouble. But people using text-only browsers will not be able to determine the author addresses of messages. I think it is valuable for users (mainly ones not subscribed to any mhonarc.org list) to be able to read the archives and contact individual authors directly. Even I have done such a thing when scanning archives of other lists.
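To make the comment-based obfuscation and its weakness concrete, here is a rough Python sketch. It is not MHonArc code: the function names (obfuscate, naive_harvest, smart_harvest) and the address pattern are made up for illustration, and a real harvester may behave differently. It only shows that a scanner working on the raw HTML source misses the address, while one that first deletes comment declarations and expands entity references recovers it.

```python
import html
import re

# Hypothetical, simplified address pattern; real harvesters vary.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches an HTML comment declaration such as <!-- -->.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def obfuscate(address):
    """Wrap the at sign in empty comment declarations."""
    local, domain = address.split("@", 1)
    return local + "<!-- -->@<!-- -->" + domain


def naive_harvest(page):
    """Scan the raw HTML source for address-shaped strings."""
    return ADDRESS_RE.findall(page)


def smart_harvest(page):
    """Delete comments and expand entity references, then scan."""
    cleaned = html.unescape(COMMENT_RE.sub("", page))
    return ADDRESS_RE.findall(cleaned)


page = "<p>From: " + obfuscate("earl@example.com") + "</p>"
print(naive_harvest(page))  # []
print(smart_harvest(page))  # ['earl@example.com']
```

The smart version takes only two extra calls, which is why this kind of obfuscation looks like a short-term measure at best.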
Since text-only browsers can still read the messages in the archives, is it okay that they will not be able to determine the author's address if an image-based solution is adopted? Is this an acceptable limitation when weighed against the problem of spam?

--ewh