On 2015-04-25 11:28:15 +0200, Nicolas George wrote:
> On sextidi 6 Floréal, year CCXXIII, Vincent Lefevre wrote:
> > This is not that simple. I want my script to be very reliable.
>
> Well, your script is in Perl, so implicitly you consider that CPU cost is
> negligible. If you manage to optimize everything else (or make the
> processing more complex) so that it becomes CPU-bound, then you will have to
> consider reimplementing in C.

The CPU time is OK. If I really want an improvement (small delay in
real time), I should probably do multithreading.

> Until then, I believe you are right to trust Perl's IO buffering.
>
> > In particular, if there is a message without a Message-ID and
> > with "\nMessage-ID" in the body, I want to detect it. This kind
> > of thing really happens in practice (though this is rare), e.g.
> > due to some buggy mail software that breaks the headers and put
> > a part of them in the body. I also want to check the format of
> > the headers and possible duplicate Message-ID. What my script
> > really does is:
>
> IMHO, if you really want to validate the format of the headers, I advise to
> read the whole header into a string and work from it. Something like:
>
> my $header = "";
> while (<$file>) {
>   last if $_ eq "\n"; # or /^\r?\n\z/ if you do not trust line ends
>   $header .= $_;
> }
> my @header = split /\n(?!\s)/, $header;

I don't understand the point. Accumulating in strings (which involves
copies and possible reallocations) and doing a split is much slower
than reading lines one by one and treating them separately.

> >   while (<FILE>)
>
> Out of curiosity, do you have a particular reason not to use a real
> variable for your file handles?

This is a small loop. Code like that is compact and more readable
for me. Personal taste.

> >       /^Message-ID:\s+(<\S+>)( \(added by .*\))?$/i or next;
>
> I have never seen this "added by" in my mails, but assuming it is
> necessary for you,

Yes, obviously. This came from some MTAs when the MUA didn't generate
a Message-ID. This lasted at least until 2005.

> note that it may be written like that:
> "Message-ID: <foo@bar> (added\n\tby someone)\n"

I don't think so: AFAIK, these MTAs never wrapped this header. Anyway
my regexp is sufficient in my mailbox. If there is a need (e.g. because
new mail software does something else with the Message-ID), I can
modify my script.

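To make the line-by-line approach concrete, here is a minimal sketch. It is
not the actual script: the helper name header_message_ids, the sample
message and the in-memory file handle are assumptions made for illustration.
Only the regexp is the one quoted above. The loop stops at the first blank
line, so a "Message-ID" that ended up in the body is not counted as a header.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): scan the header block
# of a single message line by line and return the Message-IDs found there.
sub header_message_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        last if $line eq "\n";    # blank line: end of the headers
        # Regexp taken from the discussion above; other lines are skipped.
        $line =~ /^Message-ID:\s+(<\S+>)( \(added by .*\))?$/i or next;
        push @ids, $1;
    }
    return @ids;
}

# Illustrative message with broken headers: the Message-ID is in the body.
my $message = "From: someone\@example.org\nSubject: test\n\n"
            . "Message-ID: <foo\@bar>\nbody text\n";

# Read from an in-memory file so that the example is self-contained.
open my $fh, '<', \$message or die "cannot open in-memory file: $!";
my @ids = header_message_ids($fh);
close $fh;

if    (@ids == 0) { print "no Message-ID in the headers\n" }
elsif (@ids > 1)  { print "duplicate Message-ID headers: @ids\n" }
else              { print "Message-ID: $ids[0]\n" }
```

With this sample the first branch is taken, since the only "Message-ID"
line comes after the blank line. Note that the sketch assumes "\n" line
ends; with CRLF line ends, both the blank-line test and the regexp would
have to be adapted.
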
-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)