Re: ApacheDBI question
At 05:40 PM 1/28/00 -0500, Deepak Gupta wrote:

> How does connection pooling determine how many connections to keep open? The reason I ask is that I am afraid my non-mod_perl scripts are getting rejected by the db server because all (or most) connections are being dedicated to Apache activity.

Apache::DBI keeps one connection open per process per unique connection string. If you have 175 mod_perl processes running, be prepared to cope with as many as 175 database connections.

The source code for Apache::DBI is worth a look -- it's very short and easy to understand, and then you'll know all there is to know about how it works.

---
Mark Cogan [EMAIL PROTECTED] +1-520-881-8101
ArtToday www.arttoday.com
Re: splitting mod_perl and sql over machines
Marko van der Puil wrote:

> so httpd 1 has just queried the database and httpd 2 is just executing... It also has to query the database, so it has to wait for httpd 1 to finish. (not actually how it works but close enough) Now httpd 1 has the results from the query and is preparing to read the template from disk. httpd 2 is now querying the database... Now httpd 1 has to wait for the httpd 2 query to finish before it can fetch its template from disk. And so on. This, unfortunately, is (still) how PCs work. There's no such thing as parallel processing in PC architecture. This example is highly simplified. In practice it is a lot worse than I demonstrate here, because while waiting for the database query to finish, your application still gets its share of resources (CPU), so while the load on the machine is over 1.00 it's actually doing nothing for half the time... :( This is true, take a university course in information technology if ya want to know...

It would be overly difficult for me to address every falsehood in that paragraph, so I will summarize by saying that I've never seen more pseudo-technical bullshit concentrated in one place before. I will address two points:

There is a very high degree of parallelism in modern PC architecture. The I/O hardware is helpful here. The machine can do many things while a SCSI subsystem is processing a command, or while the network hardware is writing a buffer over the wire.

If a process is not runnable (that is, it is blocked waiting for I/O or similar), it is not using significant CPU time. The only CPU time required to maintain a blocked process is the time it takes for the operating system's scheduler to look at the process, decide that it is still not runnable, and move on to the next process in the list. This is hardly any time at all.
In your example, if you have two processes and one of them is blocked on I/O and the other is CPU bound, the blocked process is getting 0% CPU time, the runnable process is getting 99.9% CPU time, and the kernel scheduler is using the remainder. -jwb
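The point about blocked processes is easy to demonstrate directly. A minimal sketch (Python, with a sleep standing in for an I/O wait; the function name and timings are illustrative):

```python
import time

def blocked_vs_busy(seconds=0.5):
    # A blocked process (here: sleeping, analogous to waiting on I/O)
    # accumulates essentially no CPU time -- only wall-clock time passes.
    cpu0 = time.process_time()
    time.sleep(seconds)                  # blocked: the scheduler skips us
    blocked_cpu = time.process_time() - cpu0

    # A runnable, CPU-bound process accumulates CPU time at nearly the
    # same rate as wall-clock time.
    cpu1 = time.process_time()
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:   # busy loop: runnable the whole time
        pass
    busy_cpu = time.process_time() - cpu1

    return blocked_cpu, busy_cpu
```

Running this shows the blocked phase consuming a tiny fraction of the CPU time the busy phase does, which is exactly Jeffrey's point about the blocked process getting ~0% CPU.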
Re: ANNOUNCE: Updated Hello World Web Application Benchmarks
Joshua Chamas wrote:

> There is no way that people are going to benchmark 10+ different environments themselves, so this merely offers a quick fix to get people going with their own comparisons.

I agree that having the code snippets for running hello world on different tools collected in one place is handy.

> Do you have any idea how much time it takes to do these?

Yes, I've done quite a few of them. I never said they were easy.

> In order to improve the benchmarks, like the Resin Velocigen ones that you cited where we have a very small sample, we simply need more numbers from more people.

I think we would need more numbers from the exact same people, on the same machines, with the same configuration, the same client, the same network, the same Linux kernel... In other words, controlled conditions.

> Also, any disclaimer modifications might be good if you feel there can be more work done there.

Ideally, I would get rid of every page except the one which lists the tests grouped by OS/machine. Then I would put a big statement at the top saying that comparisons across different people's tests are meaningless.

- Perrin
Re: Novel technique for dynamic web page generation
On 28 Jan 2000, Randal L. Schwartz wrote:

> Have you looked at the new XS version of HTML::Parser?

Not previously, but I just did. It's a speedy little beasty. I dare say probably faster than even the expat-based XML::Parser, because it doesn't do quite as much. But still an order of magnitude slower than mine. For a test, I downloaded Yahoo!'s home page as a test HTML file and wrote the following code:

- test code -
#!/usr/local/bin/perl

use Benchmark;
use HTML::Parser;
use HTML::Tree;

@t = timethese( 1000, {
    'Parser' => '$p = HTML::Parser->new(); $p->parse_file( "/tmp/test.html" );',
    'Tree'   => '$html = HTML::Tree->new( "/tmp/test.html" );',
} );
-

The results are:

- results -
Benchmark: timing 1000 iterations of Parser, Tree...
    Parser: 37 secs (36.22 usr 0.15 sys = 36.37 cpu)
      Tree:  7 secs ( 7.40 usr 0.22 sys =  7.62 cpu)
---

One really can't compete against mmap(2), pointer arithmetic, and dereferencing.

- Paul
Re: squid performance
Leslie Mikesell [EMAIL PROTECTED] writes:

> I agree that it is correct to serve images from a lightweight server but I don't quite understand how these points relate. A proxy should avoid the need to hit the backend server for static content if the cached copy is current, unless the user hits the reload button and the browser sends the request with 'Pragma: no-cache'.

I'll try to expand a bit on the details:

1) Netscape/IE won't intermix slow dynamic requests with fast static requests on the same keep-alive connection.

> I thought they just opened several connections in parallel without regard for the type of content.

Right, that's the problem. If the two types of content are coming from the same proxy server (as far as NS/IE is concerned) then they will intermix the requests, and the slow page could hold up several images queued behind it. I actually suspect IE5 is cleverer about this, but you still know more than it does. By putting them on different hostnames, the browser will open a second set of parallel connections to that server and keep the two types of requests separate.

2) Static images won't be delayed when the proxy gets bogged down waiting on the backend dynamic server. Picture the following situation: the dynamic server normally generates pages in about 500ms, i.e. about 2/s per process; the mod_perl server runs 10 processes, so it can handle 20 connections per second. The mod_proxy runs 200 processes and handles static requests very quickly, so it can serve some huge number of static requests, but it can still only pass along 20 proxied requests per second. Now something happens to your mod_perl server and it starts taking 2s to generate pages. The proxy server continues to get up to 20 requests per second for proxied pages, and for each request it tries to connect to the mod_perl server. But the mod_perl server can now only handle 5 requests per second, so the proxy server processes quickly end up waiting in the backlog queue.
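The arithmetic in that scenario is worth spelling out (the numbers below are the assumed values from the example, not measurements):

```python
# Throughput of a pool of blocking worker processes is processes / latency.
processes = 10           # mod_perl backend processes
normal_latency = 0.5     # seconds per dynamic page, normally
slow_latency = 2.0       # seconds per page once "something happens"
incoming_rate = 20       # proxied requests/second still arriving

normal_rate = processes / normal_latency        # requests/second, healthy
degraded_rate = processes / slow_latency        # requests/second, degraded
backlog_growth = incoming_rate - degraded_rate  # requests/second piling up

print(normal_rate, degraded_rate, backlog_growth)  # prints: 20.0 5.0 15.0
```

At 15 excess requests per second, the 200 proxy processes are all tied up in well under a minute, which is why the static images stall too.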
Now *all* the mod_proxy processes are in "R" state and handling proxied requests. The result is that the static images -- which under normal conditions are handled quickly -- become delayed until a proxy process is available to handle the request. Eventually the backlog queue will fill up and the proxy server will hand out errors.

> This is a good idea because it is easy to move to a different machine if the load makes it necessary. However, a simple approach is to use a non-mod_perl Apache as a non-caching proxy front end for the dynamic content and let it deliver the static pages directly. A short stack of RewriteRules can arrange this if you use the [L] or [PT] flags on the matches you want the front end to serve and the [P] flag on the matches to proxy.

That's what I thought. I'm trying to help others avoid my mistake :) Use a separate hostname for your pictures; it's a pain for the html authors, but it's worth it in the long run.

-- greg
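The RewriteRule stack described above might look roughly like this (a sketch only; the paths, extensions, and backend port are invented for illustration, not taken from the original message):

```apache
# Front-end (non-mod_perl) Apache: serve static files itself,
# proxy only the dynamic URLs through to the mod_perl backend.
RewriteEngine On

# Static content: stop rewriting ([L]) and let the front end serve it.
RewriteRule ^/images/  -  [L]
RewriteRule \.(gif|jpg|png|css|html)$  -  [L]

# Everything else is dynamic: proxy ([P]) to the backend server.
RewriteRule ^/(.*)$  http://localhost:8080/$1  [P]
```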
Re: splitting mod_perl and sql over machines
According to Jeffrey W. Baker:

> I will address two points: There is a very high degree of parallelism in modern PC architecture. The I/O hardware is helpful here. The machine can do many things while a SCSI subsystem is processing a command, or the network hardware is writing a buffer over the wire.

Yes, for performance it is going to boil down to contention for disk and RAM and (rarely) CPU. You just have to look at pricing for your particular scale of machine to see whether it is cheaper to stuff more in the same box or add another. However, once you have multiple web server boxes the backend database becomes a single point of failure, so I consider it a good idea to shield it from direct internet access.

Les Mikesell [EMAIL PROTECTED]
Re: squid performance
According to Greg Stark:

> 1) Netscape/IE won't intermix slow dynamic requests with fast static requests on the same keep-alive connection
>
> > I thought they just opened several connections in parallel without regard for the type of content.
>
> Right, that's the problem. If the two types of content are coming from the same proxy server (as far as NS/IE is concerned) then they will intermix the requests and the slow page could hold up several images queued behind it. I actually suspect IE5 is cleverer about this, but you still know more than it does.

They have a maximum number of connections they will open at once, but I don't think there is any concept of queueing involved.

> 2) static images won't be delayed when the proxy gets bogged down waiting on the backend dynamic server. Picture the following situation: The dynamic server normally generates pages in about 500ms or about 2/s; the mod_perl server runs 10 processes so it can handle 20 connections per second. The mod_proxy runs 200 processes and it handles static requests very quickly, so it can handle some huge number of static requests, but it can still only handle 20 proxied requests per second. Now something happens to your mod_perl server and it starts taking 2s to generate pages.

The 'something happens' is the part I don't understand. On a unix server, nothing one httpd process does should affect another one's ability to serve up a static file quickly, mod_perl or not. (Well, almost anyway.)

> The proxy server continues to get up to 20 requests per second for proxied pages; for each request it tries to connect to the mod_perl server. The mod_perl server can now only handle 5 requests per second, though. So the proxy server processes quickly end up waiting in the backlog queue.

If you are using squid or a caching proxy, those static requests would not be passed to the backend most of the time anyway.

> Now *all* the mod_proxy processes are in "R" state and handling proxied requests. The result is that the static images -- which under normal conditions are handled quickly -- become delayed until a proxy process is available to handle the request. Eventually the backlog queue will fill up and the proxy server will hand out errors.

But only if it doesn't cache or know how to serve static content itself.

> Use a separate hostname for your pictures, it's a pain on the html authors but it's worth it in the long run.

That depends on what happens in the long run. If your domain name or vhost changes, all of those non-relative links will have to be fixed again.

Les Mikesell [EMAIL PROTECTED]
Re: squid performance
Leslie Mikesell [EMAIL PROTECTED] writes:

> The 'something happens' is the part I don't understand. On a unix server, nothing one httpd process does should affect another one's ability to serve up a static file quickly, mod_perl or not. (Well, almost anyway).

Welcome to the real world, where "something" can and does happen. Developers accidentally put untuned SQL code in a new page that takes too long to run. Database backups slow down normal processing. Disks crash, slowing down the RAID array (if you're lucky). Developers include dependencies on services like mail directly in the web server instead of handling mail asynchronously, and mail servers slow down for no reason at all. Etc.

> > The proxy server continues to get up to 20 requests per second for proxied pages; for each request it tries to connect to the mod_perl server. The mod_perl server can now only handle 5 requests per second, though. So the proxy server processes quickly end up waiting in the backlog queue.
>
> If you are using squid or a caching proxy, those static requests would not be passed to the backend most of the time anyway.

Please reread the analysis more carefully. I explained that. That is precisely the scenario I'm describing faults in.

-- greg
Strange problems.
This message was sent from Geocrawler.com by "Billow" [EMAIL PROTECTED]. Be sure to reply to that address.

I am a new user of mod_perl. I found a strange problem. I defined some variables in the main script (with my ...), and I want to use them directly in a subroutine. But sometimes I can use the variables, and sometimes they are null. (I use reload in my browser.) The script is like:

###
my (@a, @b) = ();
@a = ...;
@b = ...;

sub function {
    print "@a";
    print "@b";
}
###

Any hints? If I instead call xxx(@a, @b) and, inside the function, use my ($a, $b) = @_;, it's OK. Are there any differences between mod_perl and CGI Perl here?

Geocrawler.com - The Knowledge Archive
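[This looks like the well-known Apache::Registry gotcha: the script body is compiled into a wrapper subroutine, and a named sub inside it becomes a closure over the lexicals of the *first* request only. A rough sketch of the effect (Python used purely as an analogue; `run_script` and `registry` are made-up names, not mod_perl APIs):]

```python
# Rough analogue of Apache::Registry behavior: the script body runs inside
# a wrapper function on every request, but a named inner sub is only
# compiled once, on the FIRST request, closing over that request's lexicals.
def run_script(registry, request_data):
    a = list(request_data)           # like "my @a = ..." in the script body
    if "function" not in registry:   # compiled only once, like a named sub
        registry["function"] = lambda: a
    return registry["function"]()    # later requests still see the first a

registry = {}
first = run_script(registry, [1, 2])   # returns [1, 2]
second = run_script(registry, [3, 4])  # still returns [1, 2], not [3, 4]
```

This matches the reported symptom: the values work on some requests (the ones served by a freshly started child) and look stale or empty on others. Passing the variables as arguments, as the poster found, avoids the closure entirely.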
Re: ANNOUNCE: Updated Hello World Web Application Benchmarks
Perrin Harkins wrote:

> I think we would need more numbers from the exact same people, on the same machines, with the same configuration, the same client, the same network, the same Linux kernel... In other words, controlled conditions.

I hear you, so how about a recommendation that people submit no fewer than 2 benchmarks for listing eligibility: at least static html, plus one other. The static html can be used as a rough control against other systems.

> Ideally, I would get rid of every page except the one which lists the tests grouped by OS/machine. Then I would put a big statement at the top saying that comparisons across different people's tests are meaningless.

I see where you are going: you feel that the summarized results are misleading, and to some extent they are, in that they are not "controlled," so people's various hardware, OS, and configuration come into play very strongly in how the benchmark performed, and readers aren't wise enough to digest all the info presented and what it all really means.

I think too that the OS/machine results at http://www.chamas.com/bench/hello_bycode.html could be more accurate in comparing results if they were also grouped by tester, network connection type, and testing client, so that each grouping would better reflect the relative speed differences between web applications on the same platform.

I would argue that we should keep the code type grouping listed at http://www.chamas.com/bench/hello_bycode.html because it gives a good feel for how some operating systems' web servers are faster than others, i.e., Solaris slower than Linux, WinNT good for static HTML, Apache::ASP faster than IIS/ASP PerlScript, etc.

I should drop the normalized results at http://www.chamas.com/bench/hello_normalized.html as they are unfair and could easily be read wrong. You are not the first to complain about this. The other pages sort by Rate/MHz anyway, so someone can get a rough idea on those pages of what's faster overall.
Finally, I would very much like to keep the fastest-benchmark page as the first page, disclaiming it to death if necessary, the reason being that I would like to encourage future submissions with new, faster hardware and OS configurations, and the best way to do that is to have something of a benchmark competition happening on the first page of the results.

It seems that HTTP 1.1 submissions represent a small subset of skewed results; should these be dropped or presented separately? I already exclude them from the "top 10" style list since they don't compare well to HTTP 1.0 results, which are the majority.

I also need to clarify some results, or back them up somehow. What should I do with results that seem skewed in general? Not post them until there is secondary confirmation?

Thanks Perrin for your feedback.

-- Joshua

Joshua Chamas
Chamas Enterprises Inc.
NodeWorks free web link monitoring
Huntington Beach, CA USA
http://www.nodeworks.com  1-714-625-4051
Re: ANNOUNCE: Updated Hello World Web Application Benchmarks
> I think too that the OS/machine results at http://www.chamas.com/bench/hello_bycode.html could be more accurate in comparing results if the results are also grouped by tester, network connection type, and testing client so each grouping would better reflect the relative speed differences between web applications on the same platform.

Agreed.

> I would argue that we should keep the code type grouping listed at http://www.chamas.com/bench/hello_bycode.html because it gives a good feel for how some operating systems' web servers are faster than others, i.e., Solaris slower than Linux, WinNT good for static HTML, Apache::ASP faster than IIS/ASP PerlScript, etc.

See, I don't think you can even make statements like that based on these benchmarks. Where is the test on Solaris x86 and Linux done by the same person under the same conditions? I don't see one. Where is the test of NT and Linux on the same machine by the same person? Even the Apache::ASP vs PerlScript comparisons you did seem to be using different clients, network setups, and versions of NT. I'm not criticizing you for not being able to get lab-quality results, but I think we have to be careful what conclusions we draw from these.

> Finally, I would very much like to keep the fastest benchmark page as the first page, disclaiming it to death if necessary, the reason being that I would like to encourage future submissions, with new faster hardware and OS configurations, and the best way to do that is to have something of a benchmark competition happening on the first page of the results.

I can understand that; I just don't want mod_perl users to get a reputation as the Mindcraft of web application benchmarks.

> It seems that HTTP 1.1 submissions represent a small subset of skewed results, should these be dropped or presented separately?

I'd say they're as meaningful as any of the others if you consider them independently of the other contributions.

> I also need to clarify some results, or back them up somehow. What should I do with results that seem skewed in general? Not post them until there is secondary confirmation?

Your call. Again, to my mind each person's contribution can only be viewed in its own private context, so one is no more skewed than any other.

- Perrin