Re: ApacheDBI question

2000-01-29 Thread Mark Cogan

At 05:40 PM 1/28/00 -0500, Deepak Gupta wrote:
How does connection pooling determine how many connections to keep open?

The reason I ask is that I am afraid my non-modperl scripts are getting
rejected by the db server b/c all (or most) connections are being
dedicated to Apache activity.

Apache::DBI keeps one connection open per process per unique connection 
string. If you have 175 modperl processes running, be prepared to cope with 
as many as 175 database connections.
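
Roughly, the idea is something like this (an illustrative sketch only, not
the actual Apache::DBI code; the real module also handles things like
pinging stale handles and cleanup):

    # Illustrative sketch: each httpd child has its own %Connected hash,
    # keyed on the full connect string, so a child holds at most one
    # handle per unique $dsn/$user/$password combination.
    use DBI ();

    my %Connected;

    sub cached_connect {
        my ($dsn, $user, $password) = @_;
        my $key = join "", $dsn, $user, $password;

        # Reuse the handle if we already have a live one for this key.
        return $Connected{$key}
            if $Connected{$key} && $Connected{$key}->ping;

        return $Connected{$key} = DBI->connect($dsn, $user, $password);
    }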

The source code for Apache::DBI is worth a look -- it's very short and easy 
to understand, and then you'll know all there is to know about how it works.
---
Mark Cogan[EMAIL PROTECTED] +1-520-881-8101 
ArtToday  www.arttoday.com



Re: splitting mod_perl and sql over machines

2000-01-29 Thread Jeffrey W. Baker

Marko van der Puil wrote:
 
 so httpd 1 has just queried the database and httpd 2 is just executing...
 It also has to query the database, so it has to wait for httpd 1 to finish. (not
 actually how it works, but close enough)
 Now httpd 1 has the results from the query and is preparing to read the template
 from disk.
 httpd 2 is now querying the database... Now httpd 1 has to wait for the httpd 2
 query to finish before it can fetch its template from disk. a.s.o. a.s.o. This,
 unfortunately, is (still) how PCs work. There's no such thing as parallel processing
 in PC architecture.
 This example is highly simplified. In practice it is a lot worse than I demonstrate
 here, because while waiting for the database query to finish, your application
 still gets its share of resources (CPU), so while the load on the machine is over
 1.00 it's actually doing nothing for half the time... :( This is true; take a
 university course in information technology if ya want to know...

It would be overly difficult for me to address every falsehood in that
paragraph, so I will summarize by saying that I've never seen more
pseudo-technical bullshit concentrated in one place before.

I will address two points:

There is a very high degree of parallelism in modern PC architecture. 
The I/O hardware is helpful here.  The machine can do many things while
a SCSI subsystem is processing a command, or the network hardware is
writing a buffer over the wire.

If a process is not runnable (that is, it is blocked waiting for I/O or
similar), it is not using significant CPU time.  The only CPU time that
will be required to maintain a blocked process is the time it takes for
the operating system's scheduler to look at the process, decide that it
is still not runnable, and move on to the next process in the list. 
This is hardly any time at all.  In your example, if you have two
processes and one of them is blocked on I/O and the other is CPU bound,
the blocked process is getting 0% CPU time, the runnable process is
getting 99.9% CPU time, and the kernel scheduler is using the remainder.
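
A quick illustrative script (a rough sketch of mine, not a proper benchmark)
shows the effect:

    #!/usr/bin/perl
    # One child blocks on a pipe read while the parent spins on the CPU
    # for the same wall-clock time; compare the CPU seconds each used.
    use strict;
    use warnings;

    pipe(READER, WRITER) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: block in sysread until the parent finally writes a byte.
        close WRITER;
        my $buf;
        sysread(READER, $buf, 1);
        exit 0;
    }

    close READER;

    # Parent: burn CPU for about five seconds of wall-clock time.
    my $stop = time + 5;
    1 while time < $stop;

    syswrite(WRITER, "x") or die "syswrite: $!";   # unblock the child
    waitpid($pid, 0);

    my ($pu, $ps, $cu, $cs) = times;
    printf "parent (busy loop): %.2fs CPU\n", $pu + $ps;   # close to 5
    printf "child (blocked):    %.2fs CPU\n", $cu + $cs;   # close to 0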

-jwb



Re: ANNOUNCE: Updated Hello World Web Application Benchmarks

2000-01-29 Thread Perrin Harkins

Joshua Chamas wrote:
 There is no way that people are going to benchmark
 10+ different environments themselves, so this merely offers
 a quick fix to get people going with their own comparisons.

I agree that having the code snippets for running hello world on
different tools collected in one place is handy.

 Do you have any idea how much time it takes to do these?

Yes, I've done quite a few of them.  I never said they were easy.

 In order to improve the benchmarks, like the Resin & Velocigen
 ones that you cited where we have a very small sample, we simply
 need more numbers from more people.

I think we would need more numbers from the exact same people, on the
same machines, with the same configuration, the same client, the same
network, the same Linux kernel... In other words, controlled conditions.

 Also, any disclaimer modifications might be good if you feel
 there can be more work done there.

Ideally, I would get rid of every page except the one which lists the
tests grouped by OS/machine.  Then I would put a big statement at the
top saying that comparisons across different people's tests are
meaningless.

- Perrin



Re: Novel technique for dynamic web page generation

2000-01-29 Thread Paul J. Lucas

On 28 Jan 2000, Randal L. Schwartz wrote:

 Have you looked at the new XS version of HTML::Parser?

Not previously, but I just did.

 It's a speedy little beasty.  I dare say probably faster than even
 expat-based XML::Parser because it doesn't do quite as much.

But still about five times slower than mine.  For a test,
I downloaded Yahoo!'s home page as a test HTML file and wrote
the following code:

- test code -
#! /usr/local/bin/perl

use Benchmark;
use HTML::Parser;
use HTML::Tree;

@t = timethese( 1000, {
   'Parser' => '$p = HTML::Parser->new(); $p->parse_file( "/tmp/test.html" );',
   'Tree'   => '$html = HTML::Tree->new( "/tmp/test.html" );',
} );
-

The results are:

- results -
Benchmark: timing 1000 iterations of Parser, Tree...
Parser: 37 secs (36.22 usr +  0.15 sys = 36.37 cpu)
  Tree:  7 secs ( 7.40 usr +  0.22 sys =  7.62 cpu)
---

One really can't compete against mmap(2), pointer arithmetic,
and dereferencing.

- Paul



Re: squid performance

2000-01-29 Thread Greg Stark


Leslie Mikesell [EMAIL PROTECTED] writes:

 I agree that it is correct to serve images from a lightweight server
 but I don't quite understand how these points relate.  A proxy should
 avoid the need to hit the backend server for static content if the
 cache copy is current unless the user hits the reload button and
 the browser sends the request with 'pragma: no-cache'.

I'll try to expand a bit on the details:

  1) Netscape/IE won't intermix slow dynamic requests with fast static requests
 on the same keep-alive connection
 
 I thought they just opened several connections in parallel without regard
 for the type of content.

Right, that's the problem. If the two types of content are coming from the
same proxy server (as far as NS/IE is concerned) then they will intermix the
requests and the slow page could hold up several images queued behind it. I
actually suspect IE5 is cleverer about this, but you still know more than it
does.

By putting them on different hostnames the browser will open a second set of
parallel connections to that server and keep the two types of requests
separate.

  2) static images won't be delayed when the proxy gets bogged down waiting on
 the backend dynamic server.

Picture the following situation: the dynamic server normally generates pages
in about 500ms, i.e. about 2/s per process; the mod_perl server runs 10
processes, so it can handle 20 requests per second. The mod_proxy runs 200
processes and handles static requests very quickly, so it can serve some huge
number of static requests, but it can still only pass 20 proxied requests per
second through to the backend.

Now something happens to your mod_perl server and it starts taking 2s to
generate pages. The proxy server continues to get up to 20 requests per second
for proxied pages, and for each request it tries to connect to the mod_perl
server. The mod_perl server can now only handle 5 requests per second, though,
so the proxy server processes quickly end up waiting in the backlog queue.

Now *all* the mod_proxy processes are in "R" state and handling proxied
requests. The result is that the static images -- which under normal
conditions are handled quickly -- become delayed until a proxy process is
available to handle the request. Eventually the backlog queue fills up and
the proxy server starts handing out errors.
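
To put rough numbers on it (a back-of-envelope sketch using the made-up
figures above): dynamic requests keep arriving at 20/s while the slowed
backend completes only 5/s, so roughly 15 more proxy processes get stuck
waiting every second, and all 200 of them are tied up in about 13 seconds.

    # Back-of-envelope only, using the example numbers from above.
    my $arrival   = 20;    # proxied (dynamic) requests per second
    my $service   = 5;     # backend completions/sec once pages take 2s each
    my $processes = 200;   # mod_proxy children

    my $stuck_per_sec = $arrival - $service;
    printf "all %d proxy processes tied up after ~%.0f seconds\n",
        $processes, $processes / $stuck_per_sec;    # ~13 seconds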

 This is a good idea because it is easy to move to a different machine
 if the load makes it necessary.  However, a simple approach is to
 use a non-mod_perl apache as a non-caching proxy front end for the
 dynamic content and let it deliver the static pages directly.  A
 short stack of RewriteRules can arrange this if you use the 
 [L] or [PT] flags on the matches you want the front end to serve
 and the [P] flag on the matches to proxy.

That's what I thought. I'm trying to help others avoid my mistake :)

Use a separate hostname for your pictures; it's a pain for the HTML authors, but
it's worth it in the long run.
-- 
greg



Re: splitting mod_perl and sql over machines

2000-01-29 Thread Leslie Mikesell

According to Jeffrey W. Baker:

 I will address two points:
 
 There is a very high degree of parallelism in modern PC architecture. 
 The I/O hardware is helpful here.  The machine can do many things while
 a SCSI subsystem is processing a command, or the network hardware is
 writing a buffer over the wire.

Yes, for performance it is going to boil down to contention for
disk and RAM and (rarely) CPU.  You just have to look at pricing
for your particular scale of machine to see whether it is cheaper
to stuff more in the same box or add another.  However, once you
have multiple web server boxes the backend database becomes a
single point of failure so I consider it a good idea to shield
it from direct internet access.

  Les Mikesell
   [EMAIL PROTECTED]



Re: squid performance

2000-01-29 Thread Leslie Mikesell

According to Greg Stark:

   1) Netscape/IE won't intermix slow dynamic requests with fast static requests
  on the same keep-alive connection
  
  I thought they just opened several connections in parallel without regard
  for the type of content.
 
 Right, that's the problem. If the two types of content are coming from the
 same proxy server (as far as NS/IE is concerned) then they will intermix the
 requests and the slow page could hold up several images queued behind it. I
 actually suspect IE5 is cleverer about this, but you still know more than it
 does.

They have a maximum number of connections they will open at once
but I don't think there is any concept of queueing involved. 

   2) static images won't be delayed when the proxy gets bogged down waiting on
  the backend dynamic server.
 
 Picture the following situation: The dynamic server normally generates pages
 in about 500ms or about 2/s; the mod_perl server runs 10 processes so it can
 handle 20 connections per second. The mod_proxy runs 200 processes and it
 handles static requests very quickly, so it can handle some huge number of
 static requests, but it can still only handle 20 proxied requests per second.
 
 Now something happens to your mod_perl server and it starts taking 2s to
 generate pages.

The 'something happens' is the part I don't understand.  On a unix
server, nothing one httpd process does should affect another
one's ability to serve up a static file quickly, mod_perl or
not.  (Well, almost anyway). 

 The proxy server continues to get up to 20 requests per second
 for proxied pages, for each request it tries to connect to the mod_perl
 server. The mod_perl server can now only handle 5 requests per second though.
 So the proxy server processes quickly end up waiting in the backlog queue. 

If you are using squid or a caching proxy, those static requests
would not be passed to the backend most of the time anyway. 

 Now *all* the mod_proxy processes are in "R" state and handling proxied
 requests. The result is that the static images -- which under normal
 conditions are handled quickly -- become delayed until a proxy process is
 available to handle the request. Eventually the backlog queue will fill up and
 the proxy server will hand out errors.

But only if it doesn't cache or know how to serve static content itself.

 Use a separate hostname for your pictures, it's a pain on the html authors but
 it's worth it in the long run.

That depends on what happens in the long run. If your domain name or
vhost changes, all of those non-relative links will have to be
fixed again.

  Les Mikesell
   [EMAIL PROTECTED]



Re: squid performance

2000-01-29 Thread Greg Stark

Leslie Mikesell [EMAIL PROTECTED] writes:

 The 'something happens' is the part I don't understand.  On a unix
 server, nothing one httpd process does should affect another
 one's ability to serve up a static file quickly, mod_perl or
 not.  (Well, almost anyway). 

Welcome to the real world, however, where "something" can and does happen.
Developers accidentally put untuned SQL code in a new page that takes too long
to run. Database backups slow down normal processing. Disks crash, slowing down
the RAID array (if you're lucky). Developers include dependencies on services
like mail directly in the web server instead of handling mail asynchronously,
and mail servers slow down for no reason at all. Etc.

  The proxy server continues to get up to 20 requests per second
  for proxied pages, for each request it tries to connect to the mod_perl
  server. The mod_perl server can now only handle 5 requests per second though.
  So the proxy server processes quickly end up waiting in the backlog queue. 
 
 If you are using squid or a caching proxy, those static requests
 would not be passed to the backend most of the time anyway. 

Please reread the analysis more carefully. I explained that. That is
precisely the scenario I'm describing faults in.

-- 
greg



Strange problems.

2000-01-29 Thread Billow

This message was sent from Geocrawler.com by "Billow" [EMAIL PROTECTED]
Be sure to reply to that address.

I am a new user of mod_perl.
I found a strange problem.
I defined some variables in the main script (using my
...).
And I want to use them directly in the subroutine.
But sometimes I can use the variables, and sometimes
they are null. (I use reload in my browser.)

The script is something like:
###
...
my (@a,@b) = ();
@a = ...
@b = ...

xxx();

sub xxx
{
 print "@a";
 print "@b";
}
###

Any hints?
If I use xxx(@a,@b),
and in function xxx I use my($a,$b) = @_;,
it's OK.

Are there any differences between mod_perl and CGI Perl?


Geocrawler.com - The Knowledge Archive



Re: ANNOUNCE: Updated Hello World Web Application Benchmarks

2000-01-29 Thread Joshua Chamas

Perrin Harkins wrote:
 
 I think we would need more numbers from the exact same people, on the
 same machines, with the same configuration, the same client, the same
 network, the same Linux kernel... In other words, controlled conditions.
 

I hear you, so how about a recommendation that people submit
no fewer than two benchmarks for listing eligibility: at least
static HTML, plus one other.  The static HTML result can then be
used as a rough control against other systems.
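
Something like this is what I have in mind (hypothetical numbers, purely to
illustrate the ratio idea):

    # Hypothetical rates, only to show using static HTML as a rough control:
    # compare each submission to the static HTML rate measured on the same box.
    my %static_rate = ( box_a => 500, box_b => 900 );  # hits/sec, static HTML
    my %hello_rate  = ( box_a => 120, box_b => 260 );  # hits/sec, hello world

    for my $box (sort keys %static_rate) {
        printf "%s: hello world runs at %.0f%% of static HTML speed\n",
            $box, 100 * $hello_rate{$box} / $static_rate{$box};
    }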

 Ideally, I would get rid of every page except the one which lists the
 tests grouped by OS/machine.  Then I would put a big statement at the
 top saying that comparisons across different people's tests are
 meaningless.
 

I see where you are going: you feel that the summarized results
are misleading, and to some extent they are, in that they are
not "controlled".  People's various hardware, OS, and
configuration come into play very strongly in how a benchmark
performs, and readers aren't wise enough to digest all the
info presented and what it all really means.

I think too that the OS/machine results at
http://www.chamas.com/bench/hello_bycode.html could be more accurate
for comparing results if they were also grouped by tester,
network connection type, and testing client, so that each grouping
would better reflect the relative speed differences between web
applications on the same platform.

I would argue that we should keep the code type grouping listed at
http://www.chamas.com/bench/hello_bycode.html because it gives
a good feel for how some operating systems & web servers are faster
than others, e.g., Solaris slower than Linux, WinNT good for static
HTML, Apache::ASP faster than IIS/ASP PerlScript, etc.

I should drop the normalized results at 
http://www.chamas.com/bench/hello_normalized.html as they are unfair, 
and could be easily read wrong.  You are not the first to complain 
about this.  The other pages sort by Rate/MHz anyway, so someone
can get a rough idea on those pages for what's faster overall.

Finally, I would very much like to keep the fastest-benchmark page
as the first page, disclaiming it to death if necessary.  The reason
is that I would like to encourage future submissions, with new &
faster hardware & OS configurations, and the best way to do that is
to have something of a benchmark competition happening on the first
page of the results.

It seems that HTTP 1.1 submissions represent a small subset of
skewed results; should these be dropped or presented separately?
I already exclude them from the "top 10" style list, since they
don't compare well to the HTTP 1.0 results, which are the majority.

I also need to clarify some results, or back them up somehow.
What should I do with results that seem skewed in general?
Not post them until there is secondary confirmation?

Thanks Perrin for your feedback.

-- Joshua
_
Joshua Chamas   Chamas Enterprises Inc.
NodeWorks  free web link monitoring   Huntington Beach, CA  USA 
http://www.nodeworks.com1-714-625-4051



Re: ANNOUNCE: Updated Hello World Web Application Benchmarks

2000-01-29 Thread Perrin Harkins

 I think too that the OS/machine results at
 http://www.chamas.com/bench/hello_bycode.html could be more accurate
 in comparing results if the results are also grouped by tester,
 network connection type, and testing client so each grouping would
 well reflect the relative speed differences web applications on the
 same platform.

Agreed.

 I would argue that we should keep the code type grouping listed at
 http://www.chamas.com/bench/hello_bycode.html because it gives
 a good feel for how some operating systems  web servers are faster
 than others, i.e., Solaris slower than Linux, WinNT good for static
 HTML, Apache::ASP faster than IIS/ASP PerlScript, etc.

See, I don't think you can even make statements like that based on these
benchmarks.  Where is the test on Solaris x86 and Linux done by the same
person under the same conditions?  I don't see one.  Where is the test of NT
and Linux on the same machine by the same person?  Even the Apache::ASP vs
PerlScript comparisons you did seem to be using different clients, network
setups, and versions of NT.

I'm not criticizing you for not being able to get lab-quality results, but I
think we have to be careful what conclusions we draw from these.

 Finally, I would very much like to keep the fastest benchmark page
 as the first page, disclaiming it to death if necessary, the reason
 being that I would like to encourage future submissions, with
 new  faster hardware  OS configurations, and the best way to do
 that is to have something of a benchmark competition happening on the
 first page of the results.

I can understand that; I just don't want mod_perl users to get a reputation
as the Mindcraft of web application benchmarks.

 It seems that HTTP 1.1 submissions represent a small subset of
 skewed results, should these be dropped or presented separately?

I'd say they're as meaningful as any of the others if you consider them
independently of the other contributions.

 I also need to clarify some results, or back them up somehow.
 What should I do with results that seem skewed in general?
 Not post them until there is secondary confirmation ?

Your call.  Again, to my mind each person's contribution can only be viewed
in its own private context, so one is no more skewed than any other.

- Perrin