SOAP::Lite and libapreq...
I'm having a problem using a SOAP::Lite mod_perl handler, and I can't seem to see what I'm missing. Basically I've set up a <Location> section like this:

    SetHandler perl-script
    PerlHandler SOAP::Handler

And the module SOAP::Handler like this:

    use strict;
    use SOAP::Transport::HTTP;

    my $server = SOAP::Transport::HTTP::Apache
        ->dispatch_to('SOAP::Services');

    sub handler { $server->handler(@_); }

And then I've got all my calls in SOAP::Services. Now, it's actually working properly, but for every request I get, I see this in the error log:

    [Tue Mar 18 17:24:10 2003] [error] [client 127.0.0.1] [libapreq] unknown content-type: `text/xml; charset=utf-8'

Having a look in libapreq I find:

    if (r->method_number == M_POST) {
        const char *ct = ap_table_get(r->headers_in, "Content-type");
        if (ct && strncaseEQ(ct, DEFAULT_ENCTYPE, DEFAULT_ENCTYPE_LENGTH)) {
            result = ApacheRequest_parse_urlencoded(req);
        }
        else if (ct && strncaseEQ(ct, MULTIPART_ENCTYPE, MULTIPART_ENCTYPE_LENGTH)) {
            result = ApacheRequest_parse_multipart(req);
        }
        else {
            ap_log_rerror(REQ_ERROR, "[libapreq] unknown content-type: `%s'", ct);
            result = HTTP_INTERNAL_SERVER_ERROR;
        }
    }
    else {

and:

    c/apache_request.h:#define DEFAULT_ENCTYPE "application/x-www-form-urlencoded"
    c/apache_request.h:#define DEFAULT_ENCTYPE_LENGTH 33
    c/apache_request.h:#define MULTIPART_ENCTYPE "multipart/form-data"
    c/apache_request.h:#define MULTIPART_ENCTYPE_LENGTH 19

Creating a SOAP::Lite client and setting +trace => 'all', I see:

    Accept: text/xml
    Accept: multipart/*
    Content-Length: 530
    Content-Type: text/xml; charset=utf-8
    SOAPAction: "http://localhost/MEServices#CreateSession"

    <SOAP-ENV:Envelope xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
      xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:xsd="http://www.w3.org/1999/XMLSchema"
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">

So SOAP::Lite is setting the Content-Type to "text/xml; charset=utf-8", but libapreq only accepts "application/x-www-form-urlencoded" or "multipart/form-data" for POST requests. Strangely though, the request does actually work, and the SOAP method does get called correctly. I haven't looked further into the code to see why this is. Anyway, I can't actually see how a mod_perl SOAP handler can work without getting this error message every time in the log, and I can't believe that no-one else has come across this before, so I must be missing something very obvious... Can anyone help?

Rob
Apache::DB and perl 5.8.0
I've noticed a few comments around the web about problems with 5.8.0 and Apache::DB, but no responses indicating that anyone is looking at it or has a solution.

    ~www/bin/httpd -X -Dperldb
    [notice] Apache::DB initialized in child 2076
    [Thu Nov 28 03:24:44 2002] [error] No DB::DB routine defined at /usr/local/lib/perl5/5.8.0/i686-linux/lib.pm line 10.
    Compilation failed in require at conf/startup.pl line 21.
    BEGIN failed--compilation aborted at conf/startup.pl line 21.
    Compilation failed in require at (eval 6) line 1.

Does anyone know if anyone is looking into this or if there's a solution floating around?

Rob
CGI parameters appear to be doubled on 8 bit chars...
Just wondering if anyone has seen this problem before, or has a general solution to it. Basically what we see is that with some submitted forms, usually with 8-bit data, the POST parameters passed become 'doubled'. The problem is that we have a loop like this to gather all the parameters early on in our handler code:

    foreach my $Key ($R->param) {
      my ($UKey, @UParam) = ($Key, $R->param($Key));
      $CGIState{$UKey} = scalar(@UParam) > 1 ? \@UParam : $UParam[0];
    }

The result is that we end up with an array reference for every parameter, instead of a scalar value. And we can't just always take the first value, because multi-select list boxes also legitimately return array values, and we don't know at this stage in the code what type of form element each param comes from. In general the doubled values are the same, except where there is 8-bit data, in which case the two versions are different. Here's an example of what we see:

    $VAR1 = {
      'LastScreen' => [ '/MR-@0,324,', '/MR-@0,324,' ],
      'Subject'    => [ 'Blah blah blah', 'Blah blah blah' ],
      'Message'    => [ '‘blah blah’ blah blah ‘blah blah’ blah.',
                        '<91>blah blah<92> blah blah <91>blah blah<92>blah.' ]
    };

(The <91> etc are highlighted when using 'less', so I presume that probably means it's hex code 0x91.) So it seems that somewhere the 8-bit data is being sent as both the HTML entity version and the raw 8-bit version. I'm not sure if this is IE or mod_perl doing this, though I'm guessing it's IE. So in general, my questions are:

1. Have people seen this before, and how do you generally deal with it?
2. How do you handle 8-bit data in general? How do you know which charset it's coming in as?
3. Is there any documentation anywhere on why this is happening? Who is sending the two versions? How do you detect it?

Any help or pointers on dealing with these issues would be appreciated.

Thanks

Rob
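One defensive workaround, purely a sketch and not a fix for the underlying cause: collapse a multi-valued parameter back to a scalar only when every copy is byte-identical, so genuine multi-select values are left alone. The %CGIState and $R names match the loop above; note this still can't disambiguate the case where the doubled 8-bit values differ.

```perl
foreach my $Key ($R->param) {
  my @UParam = $R->param($Key);

  # Collapse exact duplicates only. Distinct values (real multi-selects,
  # or the entity-vs-raw-8-bit case) are kept as an array reference.
  my %Seen;
  @UParam = grep { !$Seen{$_}++ } @UParam;

  $CGIState{$Key} = @UParam > 1 ? \@UParam : $UParam[0];
}
```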
Re: POST problems
I've seen similar weird things happening with some of our users. It's always IE 5 or IE 5.5 users, it seems. It also seems to start 'randomly'. We'll get emails saying "Everything was working great last week, now whenever I click a button on your site nothing happens". As far as I can tell, the form is submitted, but there are no fields in the submitted data. I haven't been able to reproduce it reliably yet, or work out if only some fields are sent, or none at all. Has anyone else heard of something like this?

Rob

- Original Message -
From: "Corey Durward" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, September 20, 2002 8:52 AM
Subject: POST problems

> Hi all.
>
> Having a curious problem with the POST command with all CGI scripts on a
> particular server (Apache/1.3.14 (Unix) (Red-Hat/Linux) DAV/1.0.2 PHP/4.0.6
> mod_perl/1.24). Basically, it doesn't work. Form inputs are treated as
> though the inputs were blank. It doesn't appear to be a permissions error as
> no error reports are issued and nothing appears in the error log. GET still
> works.
>
> Everything was working fine up until a few weeks ago. The server admin
> claims nothing has been modified in that time (I'm skeptical) and is
> clueless as to the cause. I've scoured httpd.conf and haven't found anything
> obvious there that might cause this.
>
> Fishing for a clue. Any suggestions appreciated.
>
> Corey.
Re: Persistent Net::Telnet Objects
Our project needed persistent socket connections open as well. There is supposed to be a standard mechanism to pass file descriptors between unix processes, though its bugginess level depends on your OS. There is a perl module for this called Socket::PassAccessRights. So what you can do is create a daemon process that just hangs around holding socket connections open, like a socket cache basically, passing them back and forth between Apache processes based on some session ID or user ID or the like. Your daemon ends up looking something like this (with lots more error checking, of course):

    my %sockmap;
    while (1) {
      my $clientsock = $listen->accept();
      chomp(my $sessionid = <$clientsock>);
      my $cachesock = ($sockmap{$sessionid} ||= opennewsock());
      Socket::PassAccessRights::sendfd(fileno($clientsock), fileno($cachesock));
      $clientsock->close();
    }

And in your mod_perl code you do something like:

    # client connection: IO::Socket::INET takes PeerAddr/PeerPort
    my $serversock = IO::Socket::INET->new(
      PeerAddr => 'localhost',
      PeerPort => SOCKETPOOLPORT,
    );
    print $serversock $sessionid, "\n";
    my $Fd = Socket::PassAccessRights::recvfd(fileno($serversock));
    open(my $realsocket, "<&=$Fd");
    fcntl($realsocket, F_SETFD, 0);
    my $ofh = select($realsocket); $| = 1; select($ofh);

If you do some experimenting, you'll get something that works; you'll also find lots of cases that don't.

Rob

- Original Message -
From: "French, Shawn" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, May 30, 2002 3:53 AM
Subject: Persistent Net::Telnet Objects

> Vitals:
> Apache/1.3.20 (Win32) mod_perl/1.25_01-dev mod_ssl/2.8.4 OpenSSL/0.9.6a on
> Windows 2000 with PHP 4.21
>
> I am working on a project that requires me to have two telnet objects per
> user session opened, and accessible throughout the user's session. I have
> looked at Apache::Session and many other solutions but my problem is that to
> keep a Net::Telnet object, I need to keep open sockets and filehandles, so I
> cannot serialize the object and store it in a database or file.
> > Currently I have similar code working flawlessly: > ### > # "startup.pl" - called when apache starts (ie. PerlRequire > "d:/Apache/conf/startup.pl") > ## > use MySite::Session; > > ### > # "Session.pm" > ## > @EXPORT = qw( %sessionHash ); > our %sessionHash; > > ### > # "init_session.pl" - called IN MOD_PERL when a new session is requested > ## > use MySite::Session; > $sessionHash{$session_id . "_telnetObj"} = Net::Telnet->new(); > > ### > # "dostuff.pl" - called IN MOD_PERL many time throughout the session > ## > use MySite::Session; > my telnetObj = $sessionHash{$session_id . "_telnetObj"}; > bless (\$telnetObj, "Net::Telnet"); > > Although this is working right now, I don't know enough [ anything? :) ] > about Apache or mod_perl to be sure that this will work in the future. What > I am really concerned about is that the telnetObj will only be accessible > from scripts run by the same child process as that which created and saved > it. > > Is there a better way to do this? > > Thanks, > Shawn French > >
Apache::Reload question...
I've got a "reality check" question for people, to see that I'm not missing something obvious with our Apache::Reload mod_perl setup. We've recently installed Apache::Reload at our site in production and it's working great. In what is probably not the best 'software engineering' style, we've been known to upload several small patches in a single day, and used to have to do short restarts to integrate the new code. We now use Apache::Reload instead. Rather than putting 'use Apache::Reload' in each of our modules, I've created a touch file; after looking through the Apache::Reload code, I noted that you can put a list of modules into it which will be reloaded. On top of this, we use mod_accel as a front end to our mod_perl backend. This combination seems to work great as well, for anyone curious.

The question I had regards where to put the 'Apache::Reload' directive. The documentation suggests something like:

    PerlInitHandler Apache::Reload
    PerlSetVar ReloadAll Off
    PerlSetVar ReloadTouchFile /tmp/reload_modules

The problem I see on a production machine is that each child process will see this on its next request, and attempt to reload its modules. At that point, you'll lose the shared memory the modules use between child processes. On top of this, the parent process will never see it, so it will never reload modules in the parent. The next time a new child is forked, on the first request it receives it will again attempt to reload the changed modules. Is this correct? Or am I missing something?

The alternative I've used is this:

    PerlRestartHandler Apache::Reload
    PerlSetVar ReloadAll Off
    PerlSetVar ReloadTouchFile /tmp/reload_modules

Then when I've uploaded any changes, I touch the change file and do an 'apachectl graceful' to restart the backend. I think this works nicely because:

1) The mod_accel front end will buffer any long file uploads, and any long file downloads. So the actual length of connections from the frontend to the backend is only as long as it takes to process the request and tunnel the data between front->back or back->front. Thus the 'graceful' restart only ever takes a few seconds, and no connections are ever lost, only blocked for a few seconds at the most (the length of the longest request to process).

2) Doing it in the restart handler means that the parent process reloads the modules, and all the newly forked children have shared copies.

Can anyone tell me if I'm missing something here?

Rob
libapreq problem and mozilla 0.97
Just wondering if anyone has encountered this before and if it's been fixed in libapreq for the upcoming release. Basically, whenever I try to use Mozilla 0.97 with a file upload field on a form and don't select any file in the field, libapreq seems to hang on the $R->parse() call. Mozilla 0.98 seems to work fine, but 0.97 doesn't. While it's easy enough to just say "upgrade", it's still annoying that it hangs a process for a while until our alarm goes off.

A couple of things I've noticed: the Mozilla 0.97 file fields might be a bit broken. The raw POST request data is:

    ... stuff deleted ...
    -5965166491649760492719885386
    Content-Disposition: form-data; name="FMC-UploadFile1"; filename=""
    Content-Type: application/octet-stream
    -5965166491649760492719885386
    ... more stuff deleted ...

While under Mozilla 0.98, which doesn't hang libapreq, the request data is:

    ... stuff deleted ...
    -20448977631102520059783368690
    Content-Disposition: form-data; name="FMC-UploadFile1"; filename=""
    Content-Type: application/octet-stream

    -20448977631102520059783368690
    ... more stuff deleted ...

Note the extra blank line in the 0.98 request; I think its absence is what's causing the problem under 0.97. I did an strace under 0.97 and got:

    read(4, "POST /mail/~354ad16bd30a20352/ H"..., 4096) = 2621
    rt_sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}, 8) = 0
    time(NULL) = 1012943782
    alarm(60) = 60
    alarm(0) = 60
    rt_sigaction(SIGALRM, NULL, {0x80ee530, [], SA_INTERRUPT|0x400}, 8) = 0
    dup2(15, 2) = 2
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    brk(0x9574000) = 0x9574000
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigaction(SIGALRM, {0x81c7c1c, [], SA_RESTART|0x400}, {0x80ee530, [], SA_INTERRUPT|0x400}, 8) = 0
    alarm(60) = 0
    brk(0x9575000) = 0x9575000
    brk(0x9576000) = 0x9576000
    alarm(60) = 60
    read(4,

So, it seems to be hanging because it's trying to read more data when there isn't any.
If I do basically the same request under IE I get:

    read(4, "POST /mail/~354ad16bd30a20352/ H"..., 4096) = 2536
    rt_sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}, 8) = 0
    time(NULL) = 1012944362
    alarm(60) = 60
    alarm(0) = 60
    rt_sigaction(SIGALRM, NULL, {0x80ee530, [], SA_RESTART|0x400}, 8) = 0
    dup2(15, 2) = 2
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigaction(SIGALRM, {0x81c7c1c, [], SA_RESTART|0x400}, {0x80ee530, [], SA_RESTART|0x400}, 8) = 0
    alarm(60) = 0
    alarm(60) = 60

and it keeps going and works fine. Anyone know what might be happening? How to fix it?

Rob
Re: Solved - Odd mod_perl and LimitRequestBody problem
My fault; I'd better be more careful when playing with environment vars. libapreq uses tempnam(3), which uses the environment variable TMPDIR when creating temporary file names. Two of us were working on the server; one had it set, the other didn't. That's why the problem persisted when one person restarted the server, but suddenly fixed itself when the other person restarted it.

Rob

- Original Message -
From: "Rob Mueller (fastmail)" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, February 04, 2002 4:19 PM
Subject: Odd mod_perl and LimitRequestBody problem

> We just experienced an odd problem and were wondering if anyone has
> encountered this before. We recently set the apache LimitRequestBody
> parameter to 1000 (10M) and all was working fine until a recent restart.
> We started getting errors in the logs whenever there was a file upload field
> in the form, even if no file was selected.
>
> [Sun Feb 3 21:01:23 2002] [error] [client 127.0.0.1] [libapreq] could not
> create/open temp file
>
> No other changes had been made, and these started occuring immediately after
> we did a apachectl stop/start cycle. We restarted the server 3 times, each
> time the same problem occured. On the 4th restart, everything started
> working fine again, even though no other changes had been made.
>
> Has anyone ever had a similar experience to this? It's just a bit
> disturbing...
>
> Apache: 1.3.22
> mod_perl: 1.26
> libapreq: 0.33
>
> Rob
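Since libapreq's temp-file location follows TMPDIR via tempnam(3), one way to avoid this class of surprise is to pin the variable at server startup so it no longer depends on the shell environment of whoever restarted Apache. A sketch, with an example path that you would need to create and make writable by the Apache user:

```perl
# In startup.pl: force a known temp directory so libapreq's tempnam(3)
# calls behave the same no matter who restarts the server.
# /var/tmp/apache is an example path, not a requirement.
BEGIN {
    $ENV{TMPDIR} = '/var/tmp/apache';
    die "TMPDIR $ENV{TMPDIR} not writable\n" unless -d $ENV{TMPDIR} && -w _;
}
```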
Odd mod_perl and LimitRequestBody problem
We just experienced an odd problem and were wondering if anyone has encountered this before. We recently set the apache LimitRequestBody parameter to 1000 (10M) and all was working fine until a recent restart. We then started getting errors in the logs whenever there was a file upload field in the form, even if no file was selected:

    [Sun Feb 3 21:01:23 2002] [error] [client 127.0.0.1] [libapreq] could not create/open temp file

No other changes had been made, and these started occurring immediately after we did an apachectl stop/start cycle. We restarted the server 3 times, and each time the same problem occurred. On the 4th restart, everything started working fine again, even though no other changes had been made. Has anyone ever had a similar experience to this? It's just a bit disturbing...

    Apache: 1.3.22
    mod_perl: 1.26
    libapreq: 0.33

Rob
Re: slow regex [BENCHMARK]
I recently had a similar problem. A regex that worked fine in sample code was a dog in the web-server code. It only happened with really long strings. I tracked down the problem to this from the 'perlre' manpage:

    WARNING: Once Perl sees that you need one of "$&", "$`", or "$'" anywhere
    in the program, it has to provide them for every pattern match. This may
    substantially slow your program. Perl uses the same mechanism to produce
    $1, $2, etc, so you also pay a price for each pattern that contains
    capturing parentheses. (To avoid this cost while retaining the grouping
    behaviour, use the extended regular expression "(?: ... )" instead.) But
    if you never use "$&", "$`" or "$'", then patterns without capturing
    parentheses will not be penalized. So avoid "$&", "$'", and "$`" if you
    can, but if you can't (and some algorithms really appreciate them), once
    you've used them once, use them at will, because you've already paid the
    price. As of 5.005, "$&" is not so costly as the other two.

Basically, one of the modules the web-app was 'use'ing needed $', but my test code didn't 'use' that module. The result was pretty dramatic in this case: something that took approx 1 second in the test code was timing out after 2 minutes in the web-server. What I did in the end was something like this. In the code somewhere, add this so it's run when a request hits:

    open(F, '>/tmp/modulelist');
    print F join("\n", values %INC), "\n";
    close(F);

This creates a file which lists all the loaded modules. Then after sticking a request through the browser, do something like:

    grep \$\' `cat /tmp/modulelist`
    grep \$\& `cat /tmp/modulelist`
    grep \$\` `cat /tmp/modulelist`

to try and track down the offending module. You'll get quite a few false hits (comments, etc), but you might find an offending module. The main ones I found were Parse::RecDescent and Net::DNS, and a couple of others I can't remember now. I fixed Net::DNS myself and sent a patch to the maintainer, but haven't heard anything.
If you find this happens to be your problem as well, ask me for the patched version. Parse::RecDescent makes heavy use of the above vars, so there's no chance of fixing that in a hurry.

Rob

- Original Message -
From: "Paul Mineiro" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 24, 2002 11:01 AM
Subject: Re: slow regex [BENCHMARK]

> Paul Mineiro wrote:
>
> i've cleaned up the example to tighten the case:
>
> the mod perl code snippet is:
>
> ---
>
> my @cg;
>
> open DIL, '>', "/tmp/seqdata";
> print DIL $seq;
> close DIL;
>
> warn "length seq = @{[length ($seq)]}";
>
> my $t = timeit (1, sub {
>     while ($seq =~ /CG/g)
>     {
>         push @cg, pos ($seq);
>     }
> });
>
> print STDERR timestr ($t), "\n";
>
> ---
>
> which yields
> length seq = 21 at
> /home/aerives/genegrokker-interface/mod_perl/genomic_img.pm line 634, line 102
> 16 wallclock secs (15.56 usr + 0.01 sys = 15.57 CPU) @ 0.06/s (n=1)
>
> and the perl script (command line) version is:
>
> ---
>
> #!/usr/bin/perl
>
> use Benchmark;
> use strict;
>
> open DIL, '<', "/tmp/seqdata";
> my $seq = <DIL>;
> close DIL;
>
> warn "length seq is @{[length $seq]}";
>
> my @cg;
>
> my $t = timeit (1, sub {
>     while ($seq =~ /CG/g)
>     {
>         push @cg, pos ($seq);
>     }
> });
>
> print STDERR timestr ($t), "\n";
>
> ---
> which yields:
>
> length seq is 21 at ./t.pl line 10.
> 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
>
> the data is pretty big, so i didn't attach it, but feel free to contact
> me directly for it.
>
> -- p
>
> >hi. i'm running mod_perl 1.26 + apache 1.3.14 + perl 5.6.1
> >
> >i have a loop in a mod_perl handler like so:
> >
> > my $stime = time ();
> >
> > while ($seq =~ /CG/og)
> >    {
> >    push @cg, pos ($seq);
> >    }
> >
> > my $etime = time ();
> >
> > warn "time was: ", scalar localtime ($stime), " ",
> >    scalar localtime ($etime), " ", $etime - $stime;
> >
> >under mod_perl this takes 23 seconds.
> >running the perl "by hand" (via extracting this piece into a separate
> >perl script) on the same data takes less than 1 second.
> >
> >has anyone seen this kind of extreme slowdown before?
> >
> >-- p
> >
> >info:
> >
> >apache build options:
> >
> >CFLAGS="-g -g -O3 -funroll-loops" \
> >LDFLAGS="-L/home/aerives/lib -L/home/aerives/lib/mysql" \
> >LIBS="-L/home/aerives/genegrokker-interface/lib
> >-L/home/aerives/genegrokker-interface/ext/lib -L/home/aerives/lib
> >-L/home/aerives/lib/mysql" \
> >./configure \
> >"--prefix=/home/aerives/genegrokker-interface/ext" \
> >"--enable-rule=EAPI" \
> >"--enable-module=most" \
> >"--enable-shared=max" \
> >"--with-layout=GNU" \
> >"--disable-rule=EXPAT" \
> >"$@"
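The $& penalty described earlier in this thread can be seen in isolation with a small benchmark sketch. The numbers vary with perl version and string length (much later perls largely removed this cost), so treat it as illustrative of the 5.x-era behaviour; the string and iteration counts are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use Benchmark qw(timethis);

# Time a simple global match over a long string.  Historically, merely
# mentioning $& anywhere in the program (even in code that never runs)
# forced perl to copy the matched string on every pattern match.
my $seq = 'ACGT' x 250_000;

timethis(10, sub {
    my @cg;
    while ($seq =~ /CG/g) {
        push @cg, pos($seq);
    }
});

# Uncommenting the next line is enough to trigger the slowdown on
# older perls, even though the sub is never called:
# sub never_called { my $x = $&; }
```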
my $var at file scope and __DATA__ sections under mod_perl
I've had a little bit of a look, but can't find anything in the mod_perl guide about this. Basically it seems to me that 'my' variables at the file level don't retain their value under mod_perl. For instance, consider the following mod_perl handler:

    package My::Module;

    my $var;

    sub handler {
      warn($var || 'blah');
      $var = 'test';
    }

Each time, the warn is for 'blah', because the value 'test' is never retained in $var. Is this intended behaviour? Personally I don't actually do this myself, but the module MIME::Types does, and it's causing it to break badly. Actually, it's a bit deeper than that. MIME::Types does this:

    my %list;

    sub new(@) { (bless {}, shift)->init( {@_} ) }

    sub init($) {
      my ($self, $args) = @_;

      unless(keys %list) {
        local $_;
        while(<DATA>) {
          s/\#.*//;
          next if m/^$/;
          ...

What I'm finding is that it ends up running the loop every time, because (keys %list) == 0 always. Now in theory this should still work; it would just be a performance annoyance. But it doesn't. If I change the code to:

    { local $_;
      while(<DATA>) {
        s/\#.*//;
        warn($_);
        next if m/^$/;
        ...

I end up seeing in my logs:

    [Mon Jan 14 13:47:01 2002] null: type: application/index.response
    [Mon Jan 14 13:47:01 2002] null: type: application/index.vnd
    [Mon Jan 14 13:47:01 2002] null: type: application/iotp
    [Mon Jan 14 13:47:01 2002] null: type: application/ipp
    [Mon Jan 14 13:47:01 2002] null: type: applicati at /usr/local/lib/perl5/site_perl/5.6.1/MIME/Types.pm line 75, <DATA> line 30.

Weird; it's like the handle just mysteriously ran out of data halfway through reading from it. Does anybody have any idea what's going on here? What's the best idea for fixing a module like this? Change all:

    my $var;

to:

    use vars qw($var);

and submit a patch?

Rob
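One likely contributor, offered as an assumption worth testing rather than a confirmed diagnosis: the __DATA__ filehandle is an ordinary read-once handle, so if the loop is re-entered per request (as above), later passes resume wherever the previous read stopped, which would explain the handle "running out of data" mid-file. A defensive pattern is to slurp __DATA__ exactly once at first use and iterate over the copy. The module and function names below are illustrative:

```perl
package My::DataUser;
use strict;

# Slurp __DATA__ once, so behaviour across requests doesn't depend on
# the handle's current read position.
my @lines;

sub _load {
    return if @lines;
    @lines = <DATA>;      # read everything, exactly once
    close DATA;
}

sub types {
    _load();
    my @out;
    for (@lines) {
        my $line = $_;    # work on a copy; don't clobber @lines
        $line =~ s/\#.*//;
        next if $line =~ /^\s*$/;
        push @out, $line;
    }
    return @out;
}

1;

__DATA__
# comment line
text/plain
text/html
```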
CGI module bug, Internet Explorer 6 problems and workaround...
We've just recently started having problems with some people using Internet Explorer 6 to access our web-site. Basically they would receive an error message like:

    The XML page cannot be displayed
    Cannot view XML input using style sheet. Please correct the error and
    then click the Refresh button, or try again later.

    A string literal was expected, but no opening quote character was found.
    Error processing resource 'http://www.fastmail.fm/mail/login/'. Line 4, Position 2
    SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

After playing around for a while, I discovered that every web-page we were generating was XHTML, even though I had:

    use CGI qw(-no_xhtml);

in our code. More searching showed that the first page generated by a process was correctly "HTML 4.01 Transitional", but every page after that was XHTML. Anyway, it turns out that when you do something like:

    my $q = new CGI({});

in your code, it registers an Apache cleanup handler which clobbers all the global setup variables (eg $XHTML, $DEBUG, etc) after the first run. I've passed this on to Lincoln Stein, who currently agrees with me that this is a bug. For the moment, I've added this to my code as a hack which works:

    $CGI::XHTML = 0;
    my $q = new CGI({});

Hope this helps any people who encounter the same problem. Though it of course raises the question: is the XHTML incorrect in some way that causes IE to barf, or is IE barfing incorrectly?

Version information:

    CGI Version: 2.79
    Perl Version: This is perl, v5.6.1 built for i686-linux
    Web server: Apache 1.3.22 with mod_perl 1.26

Rob
Re: Comparison of different caching schemes
> The thing you were missing is that on an OS with an aggressively caching
> filesystem (like Linux), frequently read files will end up cached in RAM
> anyway. The kernel can usually do a better job of managing an efficient
> cache than your program can.
>
> For what it's worth, DeWitt Clinton accompanied his first release of
> File::Cache (the precursor to Cache::FileCache) with a benchmark showing
> this same thing. That was the reason File::Cache was created.

While that's true, there are still some problems with a file cache. Namely, to get reasonable performance you have to use the directory hashing structure so that you don't end up with too many files (one for each key) in one directory. Thus for every add to the cache you have to:

* stat each directory in the hash path and create it if it doesn't exist
* open and create the file, write to it, and close the file

A similar sequence is required for reading. All that still takes a bit of time. This is where having a shared memory representation can be a real help, since you don't have to traverse, open, read/write, and close a file every time. Witness the performance of IPC::MM, which seems to be mostly limited by the performance of the Storable module. I'm planning on doing another test which just stores some data without the 'streaming', to see which examples are really limited by Storable and which by their implementation; this might be useful for some people.

> And ++ on Paul's comments about Devel::DProf and other profilers.

Ditto again. I've been using Apache::DProf recently and it's been great at tracking down exactly where time is spent in my program. If you have any performance problems, definitely use it first before making any assumptions. One question though: I have a call tree that looks a bit like:

    main
      -> calls f1
        -> calls f2
        -> calls f3
          -> calls f2

The two calls to f2 may take completely different times.
Using 'dprofpp -I', I can see what percentage of overall time is spent in 'f1 and children', 'f3 and children' and 'f2 and children'. But I can't see an easy way to tell 'time in f2 and children when f2 was called from f1' (or ditto for f3). Does that make sense? Rob
Re: Comparison of different caching schemes
> In general the Cache::* modules were designed with clarity and ease of
> use in mind. For example, the modules tend to require absolutely no
> set-up work on the end user's part and try to be as fail-safe as
> possible. Thus there is run-time overhead involved. That said, I'm
> certainly not against performance. :) These benchmarks are going to
> be tremendously useful in identifying bottlenecks. However, I won't
> be able to optimize for these particular benchmarks, as Cache::Cache
> is designed to do something different than straight gets and sets.
>
> Again, thank you, Rob. This is great,

That's a good point. I probably should have added the features that each one supports, to help with decisions. Cache::Cache does have the most options with regard to limiting time/size in the cache, so that could be a big factor in someone's choice.

> * Cache::Mmap (uses Storable)

- Can indirectly specify the maximum cache size, though purges are uneven depending on how well data hashes into different buckets
- Has callback ability on a read/purge, so you can move any purged data to a different data store if you want, and automatically retrieve it on the next read when it's not in the cache

> * Cache::FileCache (uses Storable)
> * Cache::SharedMemoryCache (uses Storable)

- Can specify the maximum cache size (Cache::SizeAwareFileCache) and/or the maximum time an object is allowed in the cache
- Follows the Cache::Cache interface system

> * DBI (I used InnoDB), use Storable, always do 'delete' then 'insert'
> * DBI, use Storable, do 'select' then 'insert' or 'update'

- Can't specify any limits directly
- Could add a 'size' and 'timestamp' column to each row and use a daemon to iterate through and clean up based on time and size

> * MLDBM::Sync::SDBM_File (uses Storable)
> * IPC::MM

- Can't specify any limits directly
- Could create a secondary tied db/mm hash with key -> [ size, timestamp ] mapping and use a daemon to iterate through and clean up based on time and size

Rob
Re: Comparison of different caching schemes
Some more points. I'd like to point out that I don't think the lack of actual concurrency testing is a real problem, at least for most single-CPU installations. If most of the time in a request is spent doing other stuff (which is most likely the case), then on average when a process goes to access the cache, nothing else will be accessing it, so it won't have to block waiting to get a lock. In that case, one process doing 10*N accesses is the same as N processes doing 10 accesses, and the results are still meaningful.

Perrin Harkins pointed out IPC::MM, which I've added to the test code. It's based on the MM library (http://www.engelschall.com/sw/mm/mm-1.1.3.tar.gz) and includes a hash and a btree tied hash implementation. I've tried the tied hash and it performs extremely well. It seems to be limited mostly by the speed of the Storable module.

    Package C0 - In process hash
    Sets per sec  = 181181
    Gets per sec  = 138242
    Mixes per sec = 128501

    Package C1 - Storable freeze/thaw
    Sets per sec  = 2856
    Gets per sec  = 7079
    Mixes per sec = 3728

    Package C2 - Cache::Mmap
    Sets per sec  = 810
    Gets per sec  = 2956
    Mixes per sec = 1185

    Package C3 - Cache::FileCache
    Sets per sec  = 392
    Gets per sec  = 813
    Mixes per sec = 496

    Package C4 - DBI with freeze/thaw
    Sets per sec  = 660
    Gets per sec  = 1923
    Mixes per sec = 885

    Package C5 - DBI (use updates with dup) with freeze/thaw
    Sets per sec  = 676
    Gets per sec  = 1866
    Mixes per sec = 943

    Package C6 - MLDBM::Sync::SDBM_File
    Sets per sec  = 340
    Gets per sec  = 1425
    Mixes per sec = 510

    Package C7 - Cache::SharedMemoryCache
    Sets per sec  = 31
    Gets per sec  = 21
    Mixes per sec = 24

    Package C8 - IPC::MM
    Sets per sec  = 2267
    Gets per sec  = 5435
    Mixes per sec = 2769

Rob
Re: Comparison of different caching schemes
Just wanted to add an extra thought that I forgot to include in the previous post. One important aspect missing from my tests is actual concurrency testing. In most real-world programs, multiple applications will be reading from/writing to the cache at the same time. Depending on the cache synchronisation scheme, you'll get varying levels of performance degradation (1 being the worst, 3 being the best):

1. Lock the entire cache for a request (Cache::SharedMemoryCache, MySQL normal - table locks?)
2. Lock some part of the cache for a request (Cache::Mmap buckets, MLDBM pages?)
3. Lock only the key/value for a request (Cache::FileCache, MySQL InnoDB - row locks?)

Uggg, to do a complete test you really need to build an entire modelling system:

1. Number of concurrent processes
2. Average reads/writes to cache per second
3. Ratio of reused/new entries

etc

Rob
Comparison of different caching schemes
Just thought people might be interested... I sat down the other day and wrote a test script to try out various caching implementations. The script is pretty basic at the moment; I just wanted to get an idea of the performance of different methods. The basic scenario is the common mod_perl situation:

* Multiple processes
* You need to share perl structures between processes
* You want to index the data structure on some key
* The same key is written and read multiple times

I tried out the following systems:

* Null reference case (just store in an 'in process' hash)
* Storable reference case (store in an 'in process' hash after 'freeze')
* Cache::Mmap (uses Storable)
* Cache::FileCache (uses Storable)
* DBI (I used InnoDB), use Storable, always do 'delete' then 'insert'
* DBI, use Storable, do 'select' then 'insert' or 'update'
* MLDBM::Sync::SDBM_File (uses Storable)
* Cache::SharedMemoryCache (uses Storable)

For systems like Cache::* which can automatically delete items after some amount of time or size, I left these options off or made the cache big enough that this wouldn't happen. I've included the script for people to try out if they like and add other test cases. Now to the results, here they are.
    Package C0 - In process hash
    Sets per sec  = 147116
    Gets per sec  = 81597
    Mixes per sec = 124120

    Package C1 - Storable freeze/thaw
    Sets per sec  = 2665
    Gets per sec  = 6653
    Mixes per sec = 3880

    Package C2 - Cache::Mmap
    Sets per sec  = 809
    Gets per sec  = 3235
    Mixes per sec = 1261

    Package C3 - Cache::FileCache
    Sets per sec  = 393
    Gets per sec  = 831
    Mixes per sec = 401

    Package C4 - DBI with freeze/thaw
    Sets per sec  = 651
    Gets per sec  = 1648
    Mixes per sec = 816

    Package C5 - DBI (use updates with dup) with freeze/thaw
    Sets per sec  = 657
    Gets per sec  = 1994
    Mixes per sec = 944

    Package C6 - MLDBM::Sync::SDBM_File
    Sets per sec  = 334
    Gets per sec  = 1279
    Mixes per sec = 524

    Package C7 - Cache::SharedMemoryCache
    Sets per sec  = 42
    Gets per sec  = 29
    Mixes per sec = 32

Notes:

* System = Pentium III 866, Linux 2.4.16-0.6, Ext3 (no special filesystem flags), MySQL (with InnoDB tables)
* The null reference hash is slower reading because it does a very basic check that the retrieved hash has the same number of keys as the stored hash
* Approximate performance order (best to worst) = Cache::Mmap, DBI, MLDBM::Sync::SDBM_File, Cache::FileCache, Cache::SharedMemoryCache
* Remember what Knuth said: "Premature optimisation is the root of all evil." This data won't help you if something else in your application is the bottleneck...
* The code is available at: http://fastmail.fm/users/robm/perl/cacheperltest.pl

Have I missed something obvious?

Rob . <- Grain of salt to be taken with this post
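For anyone who wants the flavour of the test without downloading the script, here is a minimal harness sketch of the C1-style (Storable freeze/thaw into an in-process hash) case. The real script at the URL above covers many more backends; the data structure, key names, and iteration count here are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use Time::HiRes qw(time);
use Storable qw(freeze thaw);

# Store/fetch a perl structure through Storable, timing sets and gets
# separately, in the same spirit as the benchmark results above.
my %cache;
my $data  = { user => 'rob', folders => [ 'INBOX', 'Sent' ], quota => 10_000 };
my $iters = 10_000;

my $t0 = time;
$cache{"key$_"} = freeze($data) for 1 .. $iters;
printf "Sets per sec = %d\n", $iters / (time - $t0);

$t0 = time;
for (1 .. $iters) {
    my $copy = thaw($cache{"key$_"});
}
printf "Gets per sec = %d\n", $iters / (time - $t0);
```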