SOAP::Lite and libapreq...

2003-03-17 Thread Rob Mueller
I'm having a problem using a SOAP::Lite mod_perl handler, and I can't seem
to see what I'm missing.

Basically I've set up a section as such:

<Location /soap>
  SetHandler perl-script
  PerlHandler SOAP::Handler
</Location>

And the module SOAP::Handler as such:

use strict;
use SOAP::Transport::HTTP;

my $server = SOAP::Transport::HTTP::Apache
  ->dispatch_to('SOAP::Services');

sub handler { $server->handler(@_); }

And then I've got all my calls in SOAP::Services. Now, it's actually working
properly, but on every request I see this in the error log:

[Tue Mar 18 17:24:10 2003] [error] [client 127.0.0.1] [libapreq] unknown
content-type: `text/xml; charset=utf-8'

Having a look in libapreq I find:

if (r->method_number == M_POST) {
    const char *ct = ap_table_get(r->headers_in, "Content-type");
    if (ct && strncaseEQ(ct, DEFAULT_ENCTYPE, DEFAULT_ENCTYPE_LENGTH)) {
        result = ApacheRequest_parse_urlencoded(req);
    }
    else if (ct && strncaseEQ(ct, MULTIPART_ENCTYPE,
                              MULTIPART_ENCTYPE_LENGTH)) {
        result = ApacheRequest_parse_multipart(req);
    }
    else {
        ap_log_rerror(REQ_ERROR,
                      "[libapreq] unknown content-type: `%s'", ct);
        result = HTTP_INTERNAL_SERVER_ERROR;
    }
}
else {

and

c/apache_request.h:#define DEFAULT_ENCTYPE "application/x-www-form-urlencoded"
c/apache_request.h:#define DEFAULT_ENCTYPE_LENGTH 33

c/apache_request.h:#define MULTIPART_ENCTYPE "multipart/form-data"
c/apache_request.h:#define MULTIPART_ENCTYPE_LENGTH 19

Creating a SOAP::Lite client and setting +trace => 'all', I see:

Accept: text/xml
Accept: multipart/*
Content-Length: 530
Content-Type: text/xml; charset=utf-8
SOAPAction: "http://localhost/MEServices#CreateSession"

<?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/1999/XMLSchema"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">

So SOAP::Lite is setting the Content-Type to text/xml; charset=utf-8, but
libapreq only accepts application/x-www-form-urlencoded or
multipart/form-data for POST requests.

Strangely though, the request does actually work, and the SOAP method does
get called correctly. I haven't looked further into the code to see why this
is.
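
The only workaround I can think of is to keep libapreq away from content
types it doesn't understand, along these lines (an untested sketch;
read_params is a made-up helper, not part of SOAP::Lite or libapreq):

# Untested sketch: only let libapreq parse the enctypes it understands,
# and read other bodies (e.g. text/xml SOAP payloads) directly.
use Apache::Request ();

sub read_params {
  my $r  = shift;
  my $ct = $r->header_in('Content-Type') || '';
  if ($ct =~ m{^application/x-www-form-urlencoded|^multipart/form-data}i) {
    my $apr = Apache::Request->new($r);
    return map { $_ => [ $apr->param($_) ] } $apr->param;
  }
  # Otherwise read the raw body ourselves and leave libapreq out of it.
  $r->read(my $body, $r->header_in('Content-Length') || 0);
  return (_raw_body => [ $body ]);
}

That seems like overkill for what should be a standard setup, though.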

Anyway, I can't actually see how a mod_perl SOAP handler can work without
getting this error message every time in the log, and I can't believe that
no-one else has come across this before, so I must be missing something
very obvious...

Can anyone help?

Rob



Apache::DB and perl 5.8.0

2002-11-28 Thread Rob Mueller
I've noticed a few comments around the web of problems with 5.8.0 and
Apache::DB, but no responses that anyone is looking at it or has a solution.

~www/bin/httpd -X -Dperldb
[notice] Apache::DB initialized in child 2076
[Thu Nov 28 03:24:44 2002] [error] No DB::DB routine defined at
/usr/local/lib/perl5/5.8.0/i686-linux/lib.pm line 10.
Compilation failed in require at conf/startup.pl line 21.
BEGIN failed--compilation aborted at conf/startup.pl line 21.
Compilation failed in require at (eval 6) line 1.

Does anyone know if anyone is looking into this, or if there's a solution
floating around?

Rob




CGI parameters appear to be doubled on 8 bit chars...

2002-10-13 Thread Rob Mueller

Just wondering if anyone has seen this problem before, or has a general
solution to it. Basically what we see is that with some submitted forms,
usually with 8 bit data, the POST parameters passed become 'doubled'. The
problem is that we have a loop like this to gather up all the parameters
early on in our handler code:

  foreach my $Key ($R->param) {
    my ($UKey, @UParam) = ($Key, $R->param($Key));

    $CGIState{$UKey} = scalar(@UParam) > 1 ? \@UParam : $UParam[0];
  }

The result is that we end up with an array reference for every parameter,
instead of a scalar value. And we can't just always take the first value,
because multi-select list boxes also return array values, and we don't know
at this stage in the code what type of form element each param comes from.

In general, the values are the same, except where there is 8 bit data, in
which case the 2 versions are different. Here's an example of what we see:

$VAR1 = {
   'LastScreen' => [
     '/MR-@0,324,',
     '/MR-@0,324,'
   ],
   'Subject' => [
     'Blah blah blah',
     'Blah blah blah'
   ],
   'Message' => [
     '&#8216;blah blah&#8217; blah blah &#8216;blah blah&#8217; blah.',
     '91blah blah92 blah blah 91blah blah92blah.'
   ]
}

(The 91s etc. are highlighted when viewed with 'less', so I presume that
means they're raw 0x91/0x92 bytes.)

So it seems the 8 bit data is arriving both as HTML entities and as raw
8 bit data. I'm not sure whether it's IE or mod_perl doing this, though
I'm guessing it's IE.
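
In the meantime I've considered a band-aid (my own sketch; it preserves
genuine multi-value params like multi-select boxes, but obviously can't
help when the two copies differ, as in the 8 bit case):

# Sketch: collapse doubled parameters when both copies are identical.
foreach my $Key ($R->param) {
    my @Vals = $R->param($Key);
    @Vals = ($Vals[0]) if @Vals == 2 && $Vals[0] eq $Vals[1];
    $CGIState{$Key} = @Vals > 1 ? \@Vals : $Vals[0];
}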

So in general, my questions are:
1. Have people seen this before, and how do you generally deal with it?
2. How do you handle 8 bit data in general? How do you know which
charset it's coming in as?
3. Is there any documentation anywhere on why this is happening? Who is
sending the two versions? How to detect it?

Any help or pointers on dealing with these issues would be appreciated.

Thanks

Rob




Re: POST problems

2002-09-20 Thread Rob Mueller

I've seen similar weird things happening with some of our users. It's always
IE 5 or IE 5.5 users, it seems. It also seems to start 'randomly'. We'll get
emails saying "Everything was working great last week, now whenever I click
a button on your site nothing happens." As far as I can tell, the form is
submitted, but there are no fields in the submitted data. I haven't been
able to reproduce it reliably yet, or work out whether only some fields are
sent, or none at all. Has anyone else heard of something like this?

Rob

- Original Message -
From: Corey Durward [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, September 20, 2002 8:52 AM
Subject: POST problems


 Hi all.

 Having a curious problem with the POST command with all CGI scripts on a
 particular server (Apache/1.3.14 (Unix) (Red-Hat/Linux) DAV/1.0.2 PHP/4.0.6
 mod_perl/1.24). Basically, it doesn't work. Form inputs are treated as
 though the inputs were blank. It doesn't appear to be a permissions error,
 as no error reports are issued and nothing appears in the error log. GET
 still works.

 Everything was working fine up until a few weeks ago. The server admin
 claims nothing has been modified in that time (I'm skeptical) and is
 clueless as to the cause. I've scoured httpd.conf and haven't found
 anything obvious there that might cause this.

 Fishing for a clue. Any suggestions appreciated.

 Corey.


Re: Persistent Net::Telnet Objects

2002-05-29 Thread Rob Mueller (fastmail)

Our project needed persistent socket connections open as well. There is
supposed to be a standard mechanism for passing file descriptors between
unix processes, though its bugginess level depends on your OS. There is a
perl module for this called Socket::PassAccessRights. So what you can do is
create a daemon process that just hangs around holding socket connections
open, like a socket cache basically, passing them back and forth between
Apache processes based on some session ID or user ID or the like.

Your daemon ends up looking something like this (with lots more error
checking, of course):

my %sockmap;
while (1) {
  my $clientsock = $listen->accept();
  chomp(my $sessionid = <$clientsock>);
  my $cachesock = ($sockmap{$sessionid} ||= opennewsock());
  Socket::PassAccessRights::sendfd(fileno($clientsock), fileno($cachesock));
  $clientsock->close();
}
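
(opennewsock() above is whatever opens the real backend connection you want
cached; a trivial placeholder, with host and port made up:)

use IO::Socket::INET;

# Hypothetical helper: opens the real connection to be cached.
sub opennewsock {
    my $sock = IO::Socket::INET->new(
        PeerAddr => 'backend.example.com',   # made-up host
        PeerPort => 2323,                    # made-up port
    ) or die "connect failed: $!";
    return $sock;
}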

And in your mod_perl code you do something like:

  my $serversock = IO::Socket::INET->new(PeerAddr => 'localhost',
    PeerPort => SOCKETPOOLPORT);
  print $serversock $sessionid, "\n";
  my $Fd = Socket::PassAccessRights::recvfd(fileno($serversock));
  open(my $realsocket, "+<&=$Fd");
  fcntl($realsocket, F_SETFD, 0);
  my $ofh = select($realsocket); $| = 1; select($ofh);

If you do some experimenting, you'll get something that works; you'll also
find lots of cases that don't.

Rob

- Original Message -
From: French, Shawn [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, May 30, 2002 3:53 AM
Subject: Persistent Net::Telnet Objects


 Vitals:
 Apache/1.3.20 (Win32) mod_perl/1.25_01-dev mod_ssl/2.8.4 OpenSSL/0.9.6a on
 Windows 2000 with PHP 4.21

 I am working on a project that requires me to have two telnet objects per
 user session opened, and accessible throughout the user's session. I have
 looked at Apache::Session and many other solutions but my problem is that
to
 keep a Net::Telnet object, I need to keep open sockets and filehandles, so
I
 cannot serialize the object and store it in a database or file.

 Currently I have similar code working flawlessly:
 ###
 # startup.pl - called when apache starts (ie. PerlRequire
 d:/Apache/conf/startup.pl)
 ##
 use MySite::Session;

 ###
 # Session.pm
 ##
 @EXPORT = qw( %sessionHash );
 our %sessionHash;

 ###
 # init_session.pl - called IN MOD_PERL when a new session is requested
 ##
 use MySite::Session;
 $sessionHash{$session_id . "_telnetObj"} = Net::Telnet->new();

 ###
 # dostuff.pl - called IN MOD_PERL many time throughout the session
 ##
 use MySite::Session;
 my $telnetObj = $sessionHash{$session_id . "_telnetObj"};
 bless(\$telnetObj, "Net::Telnet");

 Although this is working right now, I don't know enough [ anything? :) ]
 about Apache or mod_perl to be sure that this will work in the future.
What
 I am really concerned about is that the telnetObj will only be accessible
 from scripts run by the same child process as that which created and saved
 it.

 Is there a better way to do this?

 Thanks,
 Shawn French






Apache::Reload question...

2002-05-03 Thread Rob Mueller (fastmail)



I've got a "reality check" question for people, to see that I'm not missing
something obvious with our Apache::Reload mod_perl setup.

We've recently installed Apache::Reload at our site in production and it's
working great. In what is probably not the best 'software engineering'
style, we've been known to upload several small patches in a single day,
and we used to have to do short restarts to integrate the new code. We now
use Apache::Reload instead. Rather than putting 'use Apache::Reload' in
each of our modules, I've created a touch file; after looking through the
Apache::Reload code, I noted that you can put a list of modules into it,
which will then be reloaded.
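
For example (module names invented for illustration), as I read the code the
touch file is just a list of modules, one per line:

My::App::Handler
My::App::Session

and touching it ('touch /tmp/reload_modules') is what schedules the reload.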

On top of this, we use mod_accel as a front end to our mod_perl backend.
This combination seems to work great as well, for anyone curious.

The question I had regards where to put the 'Apache::Reload' directive. The
documentation suggests something like:

PerlInitHandler Apache::Reload
PerlSetVar ReloadAll Off
PerlSetVar ReloadTouchFile /tmp/reload_modules

The problem I see on a production machine is that each child process will
see this on its next request and attempt to reload its modules. At that
point, you'll lose the shared memory the modules use between child
processes.

On top of this, the parent process will never get this, so it will never
reload modules in the parent. The next time a new child is forked, on the
first request it receives it will again attempt to reload the changed
modules. Is this correct? Or am I missing something?

The alternative I've used is this:

PerlRestartHandler Apache::Reload
PerlSetVar ReloadAll Off
PerlSetVar ReloadTouchFile /tmp/reload_modules

Then when I've uploaded any changes, I touch the change file and do an
'apachectl graceful' to restart the backend. I think this works nicely
because:

1) The mod_accel front end will buffer any long file uploads and any long
file downloads, so a connection from the frontend to the backend only lasts
as long as it takes to process the request and tunnel the data between
front and back. Thus the 'graceful' restart only ever takes a few seconds,
and no connections are ever lost, only blocked for a few seconds at most
(the length of the longest request to process).
2) Doing it in the restart handler means that the parent process reloads
the modules, and all the newly forked children have shared copies.

Can anyone tell me if I'm missing something 
here?

Rob



libapreq problem and mozilla 0.97

2002-02-05 Thread Rob Mueller (fastmail)

Just wondering if anyone has encountered this before and if it's been fixed
in libapreq for the upcoming release.

Basically, whenever I try to use Mozilla 0.97 with a file upload field on a
form and don't select any file in the field, libapreq seems to hang on the
$R->parse() call. Mozilla 0.98 seems to work fine, but 0.97 doesn't. While
it's easy enough to just say upgrade, it's still annoying that it hangs a
process for a while until our alarm goes off.
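
(The alarm is just the usual eval/alarm guard around the parse call;
roughly this, with the timeout value and log_error made up:)

my $ok = eval {
    local $SIG{ALRM} = sub { die "parse timeout\n" };
    alarm(60);                  # made-up timeout
    my $rc = $R->parse();       # the call that hangs under Mozilla 0.97
    alarm(0);
    defined $rc;
};
alarm(0);                       # make sure the alarm is cleared if we died
log_error("upload parse timed out") if !$ok;   # log_error is hypothetical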

A couple of things I've noticed, the Mozilla 0.97 file fields might be a bit
broken. The raw POST request data is:

... stuff deleted ...
-5965166491649760492719885386
Content-Disposition: form-data; name="FMC-UploadFile1"; filename=""
Content-Type: application/octet-stream

-5965166491649760492719885386
... more stuff deleted ...

While under Mozilla 0.98, which doesn't hang libapreq, the request data is:

... stuff deleted ...
-20448977631102520059783368690
Content-Disposition: form-data; name="FMC-UploadFile1"; filename=""
Content-Type: application/octet-stream


-20448977631102520059783368690
... more stuff deleted ...

Note the extra blank line; I think its absence is what causes the problem
under 0.97.

I did an strace under 0.97 and got:

read(4, "POST /mail/~354ad16bd30a20352/ H"..., 4096) = 2621
rt_sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}, 8) = 0
time(NULL)  = 1012943782
alarm(60)   = 60
alarm(0)= 60
rt_sigaction(SIGALRM, NULL, {0x80ee530, [], SA_INTERRUPT|0x400}, 8) = 0
dup2(15, 2) = 2
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
brk(0x9574000)  = 0x9574000
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGALRM, {0x81c7c1c, [], SA_RESTART|0x400}, {0x80ee530, [],
SA_INTERRUPT|0x400}, 8) = 0
alarm(60)   = 0
brk(0x9575000)  = 0x9575000
brk(0x9576000)  = 0x9576000
alarm(60)   = 60
read(4,

So, it seems to be hanging because it's trying to read more data when there
isn't any. If I do basically the same request under IE I get:

read(4, "POST /mail/~354ad16bd30a20352/ H"..., 4096) = 2536
rt_sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}, 8) = 0
time(NULL)  = 1012944362
alarm(60)   = 60
alarm(0)= 60
rt_sigaction(SIGALRM, NULL, {0x80ee530, [], SA_RESTART|0x400}, 8) = 0
dup2(15, 2) = 2
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGALRM, {0x81c7c1c, [], SA_RESTART|0x400}, {0x80ee530, [],
SA_RESTART|0x400}, 8) = 0
alarm(60)   = 0
alarm(60)   = 60

and it keeps going and works fine. Does anyone know what might be
happening, and how to fix it?

Rob





Odd mod_perl and LimitRequestBody problem

2002-02-03 Thread Rob Mueller (fastmail)

We just experienced an odd problem and were wondering if anyone has
encountered this before. We recently set the apache LimitRequestBody
parameter to 10000000 (10M) and all was working fine until a recent restart.
We started getting errors in the logs whenever there was a file upload field
in the form, even if no file was selected.

[Sun Feb  3 21:01:23 2002] [error] [client 127.0.0.1] [libapreq] could not
create/open temp file

No other changes had been made, and these errors started occurring
immediately after we did an apachectl stop/start cycle. We restarted the
server 3 times, and each time the same problem occurred. On the 4th restart,
everything started working fine again, even though no other changes had
been made.

Has anyone ever had a similar experience to this? It's just a bit
disturbing...

Apache: 1.3.22
mod_perl: 1.26
libapreq: 0.33

Rob





Re: Solved - Odd mod_perl and LimitRequestBody problem

2002-02-03 Thread Rob Mueller (fastmail)

My fault; I should be more careful when playing with environment vars.
libapreq uses tempnam(3), which consults the TMPDIR environment variable
when creating temporary file names. Two of us were working on the server;
one had it set, the other didn't. That's why the problem persisted when one
person restarted the server, but suddenly fixed itself when the other
person restarted it.
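
(For anyone wanting to avoid the same trap, the obvious fix is to pin the
temp dir in the config rather than rely on whoever restarts the server;
the path here is arbitrary:)

# httpd.conf: make the temp dir independent of the restarting user's shell
PerlSetEnv TMPDIR /var/tmp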

Rob

- Original Message -
From: Rob Mueller (fastmail) [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, February 04, 2002 4:19 PM
Subject: Odd mod_perl and LimitRequestBody problem


 We just experienced an odd problem and were wondering if anyone has
 encountered this before. We recently set the apache LimitRequestBody
 parameter to 10000000 (10M) and all was working fine until a recent
 restart. We started getting errors in the logs whenever there was a file
 upload field in the form, even if no file was selected.

 [Sun Feb  3 21:01:23 2002] [error] [client 127.0.0.1] [libapreq] could not
 create/open temp file

 No other changes had been made, and these errors started occurring
 immediately after we did an apachectl stop/start cycle. We restarted the
 server 3 times, and each time the same problem occurred. On the 4th
 restart, everything started working fine again, even though no other
 changes had been made.

 Has anyone ever had a similar experience to this? It's just a bit
 disturbing...

 Apache: 1.3.22
 mod_perl: 1.26
 libapreq: 0.33

 Rob







Re: slow regex [BENCHMARK]

2002-01-24 Thread Rob Mueller (fastmail)

I recently had a similar problem. A regex that worked fine in sample code
was a dog in the web-server code. It only happened with really long strings.
I tracked down the problem to this from the 'perlre' manpage.

   WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere
   in the program, it has to provide them for every pattern match.  This
   may substantially slow your program.  Perl uses the same mechanism to
   produce $1, $2, etc, so you also pay a price for each pattern that
   contains capturing parentheses.  (To avoid this cost while retaining
   the grouping behaviour, use the extended regular expression (?: ... )
   instead.)  But if you never use $&, $` or $', then patterns without
   capturing parentheses will not be penalized.  So avoid $&, $', and $`
   if you can, but if you can't (and some algorithms really appreciate
   them), once you've used them once, use them at will, because you've
   already paid the price.  As of 5.005, $& is not so costly as the
   other two.

Basically one of the modules the web-app was 'use'ing needed $', but my
test code didn't 'use' that module. The result was pretty dramatic in this
case: something that took approx 1 second in the test code was timing out
after 2 minutes in the web-server.

What I did in the end was something like this:

In the code somewhere, add this so it's run when a request hits:

open(F, '>', '/tmp/modulelist');
print F join("\n", values %INC), "\n";
close(F);

This creates a file which lists all the loaded modules. Then after sticking
a request through the browser, do something like:

grep \$\' `cat /tmp/modulelist`
grep \$\& `cat /tmp/modulelist`
grep \$\` `cat /tmp/modulelist`

to try and track down the offending module. You'll get quite a few false
hits (comments, etc), but you might find an offending module. The main ones
I found were:

Parse::RecDescent
Net::DNS

and a couple of others I can't remember now. I fixed Net::DNS myself and
sent a patch to the maintainer, but haven't heard anything back. If you
find this happens to be your problem as well, ask me for the patched
version. Parse::RecDescent makes heavy use of the above vars, so there's no
chance of fixing that in a hurry.
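
(As an aside, a common accidental way to get hit by this is a plain
'use English', which pulls in the match variables; if a module only wants
the readable names, the penalty-free form on reasonably recent perls is:

use English qw(-no_match_vars);   # long names without tying $&, $`, $'

which keeps $PID, $PROGRAM_NAME and friends but skips the costly ones.)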

Rob

- Original Message -
From: Paul Mineiro [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, January 24, 2002 11:01 AM
Subject: Re: slow regex [BENCHMARK]


 Paul Mineiro wrote:

 i've cleaned up the example to tighten the case:

 the mod perl code  snippet is:

 ---

   my @cg;

   open DIL, '>', "/tmp/seqdata";
   print DIL $seq;
   close DIL;

   warn "length seq = @{[length ($seq)]}";

   my $t = timeit (1, sub {
     while ($seq =~ /CG/g)
       {
         push @cg, pos ($seq);
       }
    });

   print STDERR timestr ($t), "\n";

 ---

 which yields
 length seq = 21 at
 /home/aerives/genegrokker-interface/mod_perl/genomic_img.pm line 634,
 <GEN1> line 102
 16 wallclock secs (15.56 usr +  0.01 sys = 15.57 CPU) @  0.06/s (n=1)

 and the perl script (command line) version is:

 ---

 #!/usr/bin/perl

 use Benchmark;
 use strict;

 open DIL, '<', "/tmp/seqdata";
 my $seq = <DIL>;
 close DIL;

 warn "length seq is @{[length $seq]}";

 my @cg;

 my $t = timeit (1, sub {
   while ($seq =~ /CG/g)
     {
       push @cg, pos ($seq);
     }
 });

 print STDERR timestr ($t), "\n";

 ---
 which yields:

 length seq is 21 at ./t.pl line 10.
  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)

 the data is pretty big, so i didn't attach it, but feel free to contact
 me directly for it.

 -- p

 hi.  i'm running mod_perl 1.26 + apache 1.3.14 + perl 5.6.1
 
 i have a loop in a mod_perl handler like so:
 
   my $stime = time ();
 
   while ($seq =~ /CG/og)
     {
       push @cg,  pos ($seq);
     }
 
   my $etime = time ();
 
   warn "time was: ", scalar localtime ($stime), " ",
 scalar localtime ($etime), " ", $etime - $stime;
 
 
 under mod_perl this takes 23 seconds.  running the perl by hand (via
 extracting this piece into a seperate perl script) on the same data takes
 less than 1 second.
 
 has anyone seen this kind of extreme slowdown before?
 
 -- p
 
 info:
 
 apache build options:
 
 CFLAGS=-g -g -O3 -funroll-loops \
 LDFLAGS=-L/home/aerives/lib -L/home/aerives/lib/mysql \
 LIBS=-L/home/aerives/genegrokker-interface/lib
 -L/home/aerives/genegrokker-interface/ext/lib -L/home/aerives/lib
 -L/home/aerives/lib/mysql \
 ./configure \
 --prefix=/home/aerives/genegrokker-interface/ext \
 --enable-rule=EAPI \
 --enable-module=most \
 --enable-shared=max \
 --with-layout=GNU \
 --disable-rule=EXPAT \
 $@
 
 mod_perl build options:
 
 configure_options=PERL_USELARGEFILES=0 USE_APXS=1
 WITH_APXS=$PLAYPEN_ROOT/ext/sbin/apxs EVERYTHING=1
 INC=$PLAYPEN_ROOT/ext/include -DEAPI
 
 perl -V:
 Summary of my perl5 (revision 5.0 version 6 

my $var at file scope and __DATA__ sections under mod_perl

2002-01-13 Thread Rob Mueller (fastmail)



I've had a little bit of a look, but can't find anything in the mod_perl
guide about this. Basically it seems to me that 'my' variables at the
package level don't retain their value under mod_perl.

For instance, consider the following mod_perl handler:

package My::Module;
my $var;

sub handler {
  warn($var || 'blah');
  $var = 'test';
}

Each time, the warn is for 'blah', because the value 'test' is never
retained in $var. Is this intended behaviour? Personally I don't actually
do this myself, but the module MIME::Types does, and it's causing it to
break badly.

Actually, it's a bit deeper than that. MIME::Types does this:

my %list;
sub new(@) { (bless {}, shift)->init( {@_} ) }

sub init($)
{   my ($self, $args) = @_;
    unless(keys %list)
    {   local $_;
        while(<MIME::Types::DATA>)
        {   s/\#.*//;
            next if m/^$/;
...

What I'm finding is that it ends up running the loop every time, because
(keys %list) == 0 always. Now in theory this should still work; it would
just be a performance annoyance. But it doesn't. If I change the code to:

    {   local $_;
        while(<MIME::Types::DATA>)
        {   s/\#.*//;
            warn($_);
            next if m/^$/;
...

I end up seeing in my logs...

[Mon Jan 14 13:47:01 2002] null: type: application/index.response
[Mon Jan 14 13:47:01 2002] null: type: application/index.vnd
[Mon Jan 14 13:47:01 2002] null: type: application/iotp
[Mon Jan 14 13:47:01 2002] null: type: application/ipp
[Mon Jan 14 13:47:01 2002] null: type: applicati at
/usr/local/lib/perl5/site_perl/5.6.1/MIME/Types.pm line 75, <DATA> line
30.

Weird, it's like the <MIME::Types::DATA> handle just mysteriously ran out
of data halfway through reading from it. Does anybody have any idea what's
going on here? What's the best fix for a module like this? Change all:

my $var;

to

use vars qw($var);

and submit a patch?

Rob



Re: Comparison of different caching schemes

2001-12-14 Thread Rob Mueller (fastmail)

 The thing you were missing is that on an OS with an aggressively caching
 filesystem (like Linux), frequently read files will end up cached in RAM
 anyway.  The kernel can usually do a better job of managing an efficient
 cache than your program can.

 For what it's worth, DeWitt Clinton accompanied his first release of
 File::Cache (the precursor to Cache::FileCache) with a benchmark showing
 this same thing.  That was the reason File::Cache was created.

While that's true, there are still some problems with a file cache. Namely,
to get reasonable performance you have to use the directory hashing
structure so that you don't end up with too many files (one for each key)
in one directory. Thus for every add to the cache you have to:
* stat each directory in the hash path and create it if it doesn't exist
* open and create the file, write to it, and close the file

A similar method is required for reading. All that still takes a bit of
time. This is where having some shared memory representation can be a real
help, since you don't have to traverse, open, read/write, and close a file
every time. Witness the performance of IPC::MM, which seems to be mostly
limited by the performance of the Storable module. I'm planning on doing
another test which just stores some data without the 'streaming', to see
which examples are really limited by Storable and which by their
implementation; this might be useful for some people.

 And ++ on Paul's comments about Devel::DProf and other profilers.

Ditto again. I've been using Apache::DProf recently and it's been great at
tracking down exactly where time is spent in my program. If you have any
performance problems, definitely use it first before making any assumptions.

One question though. I have a call tree that looks a bit like:

main
  -> calls f1
     -> calls f2
  -> calls f3
     -> calls f2

The two calls to f2 may take completely different times. Using 'dprofpp -I',
I can see what percentage of overall time is spent in 'f1 and children',
'f3 and children' and 'f2 and children'. But I can't see an easy way to tell
'time in f2 and children when f2 was called from f1' (or ditto for f3). Does
that make sense?

Rob





Re: Comparison of different caching schemes

2001-12-12 Thread Rob Mueller (fastmail)



Some more points.

I'd like to point out that I don't think the lack of actual concurrency
testing is a real problem, at least for most single CPU installations. If
most of the time in a request is spent doing other stuff (which is most
likely the case), then on average when a process goes to access the cache,
nothing else will be accessing it, so it won't have to block waiting to get
a lock. In that case, one process doing 10*N accesses is the same as N
processes doing 10 accesses, and the results are still meaningful.

Perrin Harkins pointed out IPC::MM, which I've added to the test code. It's
based on the MM library (http://www.engelschall.com/sw/mm/mm-1.1.3.tar.gz)
and includes hash and btree tied hash implementations. I've tried the tied
hash and it performs extremely well. It seems to be limited mostly by the
speed of the Storable module.

Package C0 - In process hash
Sets per sec = 181181
Gets per sec = 138242
Mixes per sec = 128501
Package C1 - Storable freeze/thaw
Sets per sec = 2856
Gets per sec = 7079
Mixes per sec = 3728
Package C2 - Cache::Mmap
Sets per sec = 810
Gets per sec = 2956
Mixes per sec = 1185
Package C3 - Cache::FileCache
Sets per sec = 392
Gets per sec = 813
Mixes per sec = 496
Package C4 - DBI with freeze/thaw
Sets per sec = 660
Gets per sec = 1923
Mixes per sec = 885
Package C5 - DBI (use updates with dup) with freeze/thaw
Sets per sec = 676
Gets per sec = 1866
Mixes per sec = 943
Package C6 - MLDBM::Sync::SDBM_File
Sets per sec = 340
Gets per sec = 1425
Mixes per sec = 510
Package C7 - Cache::SharedMemoryCache
Sets per sec = 31
Gets per sec = 21
Mixes per sec = 24
Package C8 - IPC::MM
Sets per sec = 2267
Gets per sec = 5435
Mixes per sec = 2769

Rob




Re: Comparison of different caching schemes

2001-12-12 Thread Rob Mueller (fastmail)

 In general the Cache::* modules were designed for clarity and ease of
 use in mind.  For example, the modules tend to require absolutely no
 set-up work on the end user's part and try to be as fail-safe as
 possible.  Thus there is run-time overhead involved.  That said, I'm
 certainly not against performance.  :) These benchmarks are going to
 be tremendously useful in identifying bottlenecks.  However, I won't
 be able to optimize for these particular benchmarks, as Cache::Cache
 is designed to do something different than straight gets and sets.

 Again, thank you, Rob.  This is great,

That's a good point. I probably should have listed the features each one
supports, to help with decisions. Cache::Cache does have the most options
with regard to limiting time/size in the cache, so that could be a big
factor in someone's choice.

 * Cache::Mmap (uses Storable)
- Can indirectly specify the maximum cache size, though purges are uneven
depending on how well data hashes into different buckets
- Has callback ability on a read/purge, so you can move any purged data to
a different data store if you want, and automatically retrieve it the next
time it's requested and not in the cache

 * Cache::FileCache (uses Storable)
 * Cache::SharedMemoryCache (uses Storable)
- Can specify the maximum cache size (Cache::SizeAwareFileCache) and/or
maximum time an object is allowed in the cache
- Follows the Cache::Cache interface system

 * DBI (I used InnoDB), use Storable, always do 'delete' then 'insert'
 * DBI, use Storable, do 'select' then 'insert' or 'update'
- Can't specify any limits directly
- Could add a 'size' and 'timestamp' column to each row and use a daemon to
iterate through and clean up based on time and size

 * MLDBM::Sync::SDBM_File (uses Storable)

 * IPC::MM
- Can't specify any limits directly
- Could create a secondary tied db/mm hash with a key -> [ size, timestamp ]
mapping and use a daemon to iterate through and clean up based on time and
size; a rough sketch of that idea follows
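
(Entirely hypothetical sketch: %cache and %meta stand for whatever tied
hashes you use, and the limits are made up:)

my $max_age = 3600;                  # made-up limit: one hour
while (1) {
    for my $key (keys %meta) {
        my ($size, $stamp) = @{ $meta{$key} };
        # $size would feed a size-based sweep; here we only check age
        if (time() - $stamp > $max_age) {
            delete $cache{$key};     # drop the cached value
            delete $meta{$key};      # and its bookkeeping entry
        }
    }
    sleep 60;                        # sweep once a minute
}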

Rob





Comparison of different caching schemes

2001-12-11 Thread Rob Mueller (fastmail)



Just thought people might be interested...

I sat down the other day and wrote a test script to try out various caching
implementations. The script is pretty basic at the moment; I just wanted to
get an idea of the performance of different methods.

The basic scenario is the common mod_perl situation:
* Multiple processes
* You need to share perl structures between processes
* You want to index the data structure on some key
* The same key is written and read multiple times

I tried out the following systems:
* Null reference case (just store in 'in process' hash)
* Storable reference case (just store in 'in process' hash after 'freeze')
* Cache::Mmap (uses Storable)
* Cache::FileCache (uses Storable)
* DBI (I used InnoDB), use Storable, always do 'delete' then 'insert'
* DBI, use Storable, do 'select' then 'insert' or 'update'
* MLDBM::Sync::SDBM_File (uses Storable)
* Cache::SharedMemoryCache (uses Storable)

For systems like Cache::* which can automatically delete items after some
amount of time or size, I left these options off or made the cache big
enough that this wouldn't happen.

I've included the script for people to try out if they like and add other
test cases. The core of each test loop looks roughly like the sketch below.
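
(A from-memory sketch only, not the actual script; this is the
Cache::FileCache case, with made-up key counts and value sizes:)

use Benchmark qw(countit);
use Cache::FileCache;

# Time raw set/get throughput against one backend for 5 CPU-seconds each.
my $cache = Cache::FileCache->new({ namespace => 'cachetest' });
my %data  = map { ("key$_" => { num => $_, str => 'x' x 100 }) } 1 .. 100;

my $t = countit(5, sub { $cache->set($_, $data{$_}) for keys %data });
printf "Sets per sec = %d\n", $t->iters * keys(%data) / 5;

$t = countit(5, sub { $cache->get($_) for keys %data });
printf "Gets per sec = %d\n", $t->iters * keys(%data) / 5;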

Now to the results, here they 
are.
Package C0 - In process hash
Sets per sec = 147116
Gets per sec = 81597
Mixes per sec = 124120
Package C1 - Storable freeze/thaw
Sets per sec = 2665
Gets per sec = 6653
Mixes per sec = 3880
Package C2 - Cache::Mmap
Sets per sec = 809
Gets per sec = 3235
Mixes per sec = 1261
Package C3 - Cache::FileCache
Sets per sec = 393
Gets per sec = 831
Mixes per sec = 401
Package C4 - DBI with freeze/thaw
Sets per sec = 651
Gets per sec = 1648
Mixes per sec = 816
Package C5 - DBI (use updates with dup) with freeze/thaw
Sets per sec = 657
Gets per sec = 1994
Mixes per sec = 944
Package C6 - MLDBM::Sync::SDBM_File
Sets per sec = 334
Gets per sec = 1279
Mixes per sec = 524
Package C7 - Cache::SharedMemoryCache
Sets per sec = 42
Gets per sec = 29
Mixes per sec = 32

Notes:
* System = Pentium III 866, Linux 2.4.16-0.6, Ext3 (no special filesystem
flags), MySQL (with InnoDB tables)
* The null reference hash is slower reading because it does a very basic
check that the retrieved hash has the same number of keys as the stored
hash
* Approximate performance order (best to worst) = Cache::Mmap, DBI,
MLDBM::Sync::SDBM_File, Cache::FileCache, Cache::SharedMemoryCache
* Remember what Knuth said: "Premature optimisation is the root of all
evil." This data won't help you if something else in your application is
the bottleneck...
* The code is available at: http://fastmail.fm/users/robm/perl/cacheperltest.pl

Have I missed something obvious?

Rob
(Grain of salt to be taken with this post)




Re: Comparison of different caching schemes

2001-12-11 Thread Rob Mueller (fastmail)



Just wanted to add an extra thought that I forgot to include in the
previous post.

One important aspect missing from my tests is actual concurrency testing.
In most real world programs, multiple applications will be reading
from/writing to the cache at the same time. Depending on the cache
synchronisation scheme, you'll get varying levels of performance
degradation (1 being worst, 3 being best):

1. Lock the entire cache for a request (Cache::SharedMemoryCache, MySQL
normal - table locks?)
2. Lock some part of the cache for a request (Cache::Mmap buckets, MLDBM
pages?)
3. Lock only the key/value for a request (Cache::FileCache, MySQL InnoDB -
row locks?)

Uggg, to do a complete test you really need to generate an entire modelling
system:
1. Number of concurrent processes
2. Average reads/writes to the cache per second
3. Ratio of reused/new entries
etc

Rob