Radoslaw Zielinski writes:
> > http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4603
> > ------- Additional Comments From [EMAIL PROTECTED] 2006-07-26 18:17 -------
>
> I dislike the idea of using Bugzilla as a replacement for a mailing list
> (bleh, why doesn't ASF use RT); let's move here, if you don't mind...
OK, as long as you find the thread on nabble and post a pointer to it on
the bug; it's a *lot* easier to track down a BZ discussion 6 months down
the line than to find a mailing list thread.
By the way -- Bugzilla tip -- the MozEX extension is a godsend for
dealing with Bugzilla's small text entry boxes, allowing you to use
a decent external editor instead.
> [...]
> > Using IPC::Open3 is a nightmare for portability, btw -- I'm pretty sure it
> > doesn't work on win32 at least -- but maybe there are other issues there
> > anyway?
>
> I avoided using shell... well, this can be easily changed.
Yep, perl's own 'open "...|"' shell escapes are actually more portable.
sa-update's code is worth looking at, for an example.
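
A minimal sketch of the piped-open idiom, for reference. This is not
sa-update's actual code; run_and_capture() is a made-up helper name, and
the example assumes the path to the perl binary contains no spaces:

```perl
use strict;
use warnings;

# Run a command and return its output lines, using perl's piped open.
# The trailing "|" means "read from this command"; a command string with
# shell metacharacters is handed to the platform's shell, so the caller
# is responsible for quoting.
sub run_and_capture {
    my ($cmd) = @_;
    open(my $pipe, "$cmd |") or die "cannot run '$cmd': $!";
    my @lines = <$pipe>;
    close($pipe) or warn "'$cmd' exited with status $?\n";
    chomp @lines;
    return @lines;
}

# $^X is the perl binary currently running this script.
my @out = run_and_capture(qq{$^X -e "print 6*7"});
print "got: $out[0]\n";
```

No IPC::Open3 needed, as long as you only want the command's stdout.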
> > how does it compare to current spamd, in speed terms?
>
> 174%, crushes the hacky 0.0002s optimizations like cockroaches.
Ha! I suspect these numbers are without any ruleset, though ;) It's also
worth noting that spamd does some time-consuming tasks that apache-spamd
doesn't (like logging via syslog).
> $ tail -n1 *.log
> ==> prefork.log <==
> parsed 2000 messages in 00:04:32 (272.930377 s),
> 7.3279 msgs/s (440 msgs/min, 26380 msgs/h)
>
> ==> spamd.log <==
> parsed 2000 messages in 00:08:00 (480.140767 s),
> 4.1654 msgs/s (250 msgs/min, 14996 msgs/h)
>
> ==> worker.log <==
> parsed 2000 messages in 00:04:35 (275.170448 s),
> 7.2682 msgs/s (436 msgs/min, 26166 msgs/h)
>
> Apache-spamd / spamd run with -x -m 5, Bench-spamd.pl with -c 3 -m 2000.
> Hardware: Athlon 1.7xp, 700MB RAM.
so prefork.log and worker.log are both using apache-spamd, with
those MPMs? That's a pretty excellent speedup.
> > Regarding logging. What's the issue? (I couldn't actually spot any
> > logging in
> > that tarball.)
>
> Apache redirects stderr to error_log, I don't know how to capture it
> (OTOH, I haven't been looking for it, but I don't think it's a good
> idea). The ErrorLog directive doesn't support redirecting to syslog.
>
> So, all the debug messages from SA and some startup errors detected
> at the config phase are logged. This isn't:
>
> [5273] info: spamd: connection from localhost [127.0.0.1] at port 2347
> [5273] info: spamd: checking message <[EMAIL PROTECTED]> for (unknown):500
> [5273] info: spamd: clean message (0.0/5.0) for (unknown):500 in 0.2
> seconds, 5978 bytes.
> [5273] info: spamd: result: . 0 -
> scantime=0.2,size=5978,user=(unknown),uid=500,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=2347,mid=<[EMAIL
> PROTECTED]>,autolearn=disabled
>
> I have not attained enlightenment about the correct way to do it yet.
>
> That would require opening a file to write at some state, passing the
> filehandle somehow (global var probably), locking... If a syslog socket
> has been requested, I guess separate connections are needed... Complex
> and error prone.
for what it's worth, I'd say:
- forget about syslog; apache has its own logging model which doesn't
involve that, so we don't have to either ;)
- filehandles opened with ">>" (append mode) get atomic writes under
inter-process contention, provided you use syswrite() and the fh points
at a file on a local filesystem [*]. So that's a good way to log data
atomically:
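    # append mode, so each syswrite() lands at the current end of file
    open(LOG, ">>", "logfile") or die "cannot open logfile: $!";
    [...]
    my $message = "info: foo bar baz\n";
    syswrite(LOG, $message);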
A global var would be OK, although I can see people wanting to have
different logfiles for different vhosts...
([*]: well, atomic enough, at least. see
lib/Mail/SpamAssassin/BayesStore/DBM.pm for workarounds in the case
when partial writes occur due to out-of-disk-space conditions --
that's when it gets messy!)
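
Here's a rough, self-contained way to try the append-mode approach out
with several processes. The worker and line counts are arbitrary demo
values and the log line format is made up; it also bails out on a short
write rather than attempting any of the DBM.pm-style recovery:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical demo values, not taken from spamd.
my ($workers, $lines_each) = (3, 50);

my ($tmp_fh, $logfile) = tempfile();
close($tmp_fh);

for my $w (1 .. $workers) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    next if $pid;                       # parent: spawn the next worker

    # child: open its own append-mode handle, as each server child would
    open(my $log, '>>', $logfile) or die "cannot open $logfile: $!";
    for my $n (1 .. $lines_each) {
        my $message = "[$$] info: worker $w line $n\n";
        my $written = syswrite($log, $message);
        die "short write"
            unless defined($written) && $written == length($message);
    }
    close($log);
    exit 0;
}

1 while wait() != -1;                   # reap all children

# Read the log back and count lines that came through unmangled.
open(my $in, '<', $logfile) or die "cannot read $logfile: $!";
my @lines = <$in>;
close($in);
unlink $logfile;

my @intact = grep { /^\[\d+\] info: worker \d+ line \d+\n\z/ } @lines;
printf "%d lines written, %d intact\n", scalar @lines, scalar @intact;
```

On a local filesystem the two counts should match; on NFS and friends,
all bets are off.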
> Adding complexity is easy, keeping it simple and obvious makes a worthy
> challenge.
yep -- feel free to ask, of course!
> > Should it be integrated into the main distro, or kept as a separate module
> > with its own Makefile.PL, do you think? (I think I'd prefer to integrate,
> > if
> > possible.)
>
> If it's not integrated... will be lost, in time.
ok, agreed.
> > And finally, I think it could do with more documentation and tests ;) a
> > lot of
> > that would probably make more sense after the integration-into-distro
> > question
> > is resolved (e.g. "what README does it go into").
>
> I'd go for separate README.apache to keep things transparent.
Sure; like the spamc model. But it also needs to be tied into the rest of
the documentation: the top-level README, INSTALL, etc., at least.
> Right now, this is written as a PerlProcessConnectionHandler (mod_perl
> handler for custom protocols). I just figured out it *can* be done
> using the more popular HTTP handlers (PerlResponseHandler and friends)
> and I'm experimenting with it right now.
>
> That would have two benefits I see right now (I doubt it'd change
> anything regarding performance).
>
> The first one is the possibility of using mod_log_config (the CustomLog
> directive). If we agree to compress those four log lines per connection
> to one, it would be a clean and efficient way to get the access logging
> done.
Sure. Note however that the "result:" line --
  [5273] info: spamd: result: . 0 - scantime=0.2,size=5978,user=(unknown),uid=500,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=2347,mid=<[EMAIL PROTECTED]>,autolearn=disabled
has a pretty well-defined format, and there are log parsers that know how
to read it. It'd be good to keep that.
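
To illustrate what such parsers rely on, here's a minimal sketch that
pulls the line above apart. parse_result_line() is a made-up name, not
an existing tool, and it assumes that rule names (when any fired) come
first as a comma-separated token and that no field value contains a
comma:

```perl
use strict;
use warnings;

# Parse a spamd "result:" log line into a hashref, or return nothing
# if the line doesn't look like one.
sub parse_result_line {
    my ($line) = @_;

    # flag is "Y" for spam or "." for clean, followed by the score
    my ($flag, $score, $rest) =
        $line =~ /spamd: result: (\S)\s+(-?\d+)\s+-\s+(.*)$/ or return;

    # Assumption: a leading token without "=" is the list of rule names.
    my @tests;
    if ($rest =~ s/^([^=\s]+)\s+//) {
        @tests = split /,/, $1;
    }

    # Everything else is comma-separated key=value pairs.
    my %fields = map { /^(\w+)=(.*)$/ ? ($1 => $2) : () } split /,/, $rest;

    return { flag => $flag, score => $score,
             tests => \@tests, fields => \%fields };
}

my $sample = '[5273] info: spamd: result: . 0 - scantime=0.2,size=5978,'
           . 'user=(unknown),uid=500,required_score=5.0,rhost=localhost,'
           . 'raddr=127.0.0.1,rport=2347,mid=<[EMAIL PROTECTED]>,'
           . 'autolearn=disabled';

my $r = parse_result_line($sample) or die "not a result line";
print "scantime=$r->{fields}{scantime} autolearn=$r->{fields}{autolearn}\n";
```

Anything that changes the order or naming of those fields would break
parsers written along these lines.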
> Second one... Well, here it is; try to keep an open mind. ;-)
> I'm reading http://catb.org/esr/writings/taoup/ right now; around the
> chapter about protocol design it bugged me: why isn't the spamd protocol
> based on HTTP?
>
> Gain: forget the fancy libspamc, forget Mail::SpamAssassin::Client, get
> over with parts of spamd network-related code ("sysread not ready"
> anyone?), reduce trash code in various spamc implementations (exim,
> whatever)... Just use an HTTP library to do a simple POST (and make sure
> the library allows you to read the Spam header after a 2xx response).
>
> So. If I used the mod_perl HTTP handlers, that would get us very close
> to rolling out the SPAMD/2.0 protocol [1]. After some code refactoring,
> it'd be possible to use spamd as FastCGI (or regular CGI, if someone
> wishes) with any HTTP server. Authentication? Just get a mod_auth*
> module. Compression? mod_deflate. Whatever? mod_whatever.
>
> POST /?method=PROCESS HTTP/1.1
>
> The more I think about it, the more I like the idea.
Wow. That's scary. ;) I'll have to think about that one.
I'm not sure I see *sufficient* benefit, in terms of the other parts of
the code, though. The two protocols are both very, very simple; I think
supporting HTTP (with a new URL-based, CGI-style parameter-passing
scheme) would take more new code than the existing SPAMD support
amounts to!
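
To make the comparison concrete, a rough sketch of how a PROCESS request
might be framed each way. The HTTP variant is purely hypothetical: it
just follows the "POST /?method=PROCESS" line quoted above, and the Host
value and both function names are invented for illustration:

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Existing protocol: request line, Content-length header, blank line,
# then the message itself.
sub spamd_request {
    my ($msg) = @_;
    return join $CRLF,
        "PROCESS SPAMC/1.2",
        "Content-length: " . length($msg),
        "", $msg;
}

# Hypothetical HTTP framing of the same request (nothing implements this).
sub http_request {
    my ($msg) = @_;
    return join $CRLF,
        "POST /?method=PROCESS HTTP/1.1",
        "Host: localhost",
        "Content-Length: " . length($msg),
        "", $msg;
}

my $msg = "Subject: test\n\nhello\n";
printf "spamd framing: %d bytes, http framing: %d bytes\n",
    length(spamd_request($msg)), length(http_request($msg));
```

The request side is about equally trivial either way; it's the parameter
passing and response handling where I'd expect the new code to pile up.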
--j.