Re: IP based instant throttle?
[EMAIL PROTECTED] (Randal L. Schwartz) wrote: I'm told that my CPU throttler was used at etoys.com for a similar purpose, and permitted them to keep from losing millions of dollars of revenue due to people spidering their catalog. That's correct, although it was actually a bunch of DoS attacks that we were using it against. We modified it to just count hits, and skip the CPU stuff. It worked well across a cluster, using NFS to share the files with the hit data in them. Since it's an access handler, it was easy to just turn it on for specific URLs where repeated access hurts. This avoids any issues with parallel fetches for images. We also used cookies (verified similarly to the ticket scheme in the Eagle Book) as the primary identifier and only fell back to IP if there was no valid cookie. This can help with the proxy (i.e. many users with one IP) problem, but you still have to make exceptions for things like AOL proxies that can blast you with legitimate traffic. If thousands of AOL users all click on an ad banner in the same 10 seconds, you don't want to ban them. - Perrin
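A rough sketch of the shape of such a hit-counting access handler (this is not the etoys code; the package name, directory, window, and limit are all invented, and a verified cookie value would stand in for the IP where one is available):

    # httpd.conf (for the specific URLs you want to protect):
    #   PerlAccessHandler My::HitThrottle
    package My::HitThrottle;
    use strict;
    use Apache::Constants qw(OK FORBIDDEN);

    my $dir    = '/var/tmp/throttle';  # shared directory (NFS works); invented path
    my $window = 10;                   # seconds
    my $limit  = 20;                   # hits allowed per window per client

    sub handler {
        my $r = shift;
        return OK unless $r->is_initial_req;    # skip subrequests, e.g. images
        (my $id = $r->connection->remote_ip) =~ s/\W/_/g;  # or a verified cookie value
        my $file = "$dir/$id";
        my ($count, $start) = (0, time);
        if (open my $in, '<', $file) {
            ($count, $start) = split ' ', scalar <$in>;
            close $in;
        }
        ($count, $start) = (0, time) if !$start or time - $start > $window;
        $count++;
        # No flock here; for a throttle an occasionally lost increment is fine.
        if (open my $out, '>', $file) {
            print $out "$count $start\n";
            close $out;
        }
        return $count > $limit ? FORBIDDEN : OK;
    }
    1;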
Re: Apache::Session / No-Cookie-Tracking
Sure - I believe in magic, depending on your definition of it. I KNOW there's a 4th method, because I've seen it work. There is an e-commerce web site which uses an outside cart programmed in CGI (Perl?). The original web site passes no identifying marks such as the session ID through the URL or through the form's submit button to add an item to the cart. I know, because I designed and created the web site. However, when the visitors hit the submit button, they are taken to another program/website containing their shopping basket filled with their items. I have figured out that it relies somewhat on the IP address, but not completely, because I have tested it behind the firewall and the other computer behind the firewall with me does not share the same basket. Once I am at that screen (viewing the contents of my cart on the program), there are other links which contain a session ID of sorts carried via the URL. The thing that is driving my head crazy is how they identify the user in the first place to create the links with the session ID. I accidentally caught them during testing or something and got a variable on the URL line. (I substituted the domain name - it's not really cart.com) http://www.cart.com/cgi-bin/cart.cgi?cartidnum=208.144.33.190T990806951R5848E cartidnum seems to be: $IP-Address + T + Unix-TimeStamp + R + Unknown number + E By the way, the session only seems to be active until the browser completely shuts down. Any ideas? Sure sounds like a cookie to me. What makes you think it isn't one? Or else they just don't care who you are until you hit the shopping cart, and then they keep your identity with URLs and hidden form fields. - Perrin
Re: mod_perl and 700k files...
on 5/12/01 5:46 PM, Morbus Iff at [EMAIL PROTECTED] wrote: I store a .stor file which is a storable dump of my XML tree. I check the mtime of that against the mtime of the .xml file. Whichever is newer I load that. Works fast and is very simple. I'll certainly check it out. The only trouble with that is that you will have a separate copy in every child taking up 700K or more. You can only avoid that if you restart the server or use some kind of shared memory approach. - Perrin
Re: looking for a function to move %fdat to the symbol table..
It's not hard to do, but it is potentially dangerous since you could overwrite globals like $/ and change the behavior of your program. In general, it's best to avoid cluttering the symbol table. - Perrin
Re: mod_perl and 700k files...
on 5/9/01 5:45 PM, Morbus Iff at [EMAIL PROTECTED] wrote: Keep in mind, if you load this data during startup (in the parent) it will be shared, but reloading it later will make a separate copy in each child, chewing up a large amount of memory. You might have better luck using dbm That is something I was hoping I wouldn't hear ;) ... Even reloading the file into the same variable in my startup.pl wouldn't cause the parent to share it with new children? No, that won't work. You could try one of the IPC:: modules like IPC::ShareLite or IPC::MM, but I think you'll still end up with a scalar that takes up more than 700K in each child. If you can't live with that, you might try breaking up the file more so that you can access it in smaller chunks. - Perrin
Re: mod_perl and 700k files...
on 5/9/01 5:14 PM, Morbus Iff at [EMAIL PROTECTED] wrote: That, unfortunately doesn't tell me what causes a USR2 signal to be sent to Apache. You can use the kill command to send a USR2 signal. Or when it's caused. When you send it. I only want to reload the file when said file has changed. Am I supposed to do some checking against the file -M time myself, and then send a USR2 signal myself? You might have better luck just having your app check -M against the file and reload as needed. If you don't want to take the hit on every request, you can just use a counter or a last checked time kept in a global, and check every 10 minutes or 100 requests or whatever. Keep in mind, if you load this data during startup (in the parent) it will be shared, but reloading it later will make a separate copy in each child, chewing up a large amount of memory. You might have better luck using dbm files or something similar that doesn't need to keep the whole thing in memory at once. - Perrin
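A sketch of the "check every so often" idea (paths and intervals are invented, and parse_file() stands in for whatever builds your data structure):

    use vars qw($data $data_mtime $last_check);
    my $file        = '/path/to/big.xml';   # hypothetical
    my $check_every = 600;                  # only stat() at most every 10 minutes

    sub get_data {
        my $now = time;
        if (!$data or $now - $last_check > $check_every) {
            my $mtime = (stat $file)[9];
            if (!$data or $mtime > $data_mtime) {
                $data       = parse_file($file);   # your loader goes here
                $data_mtime = $mtime;
            }
            $last_check = $now;
        }
        return $data;
    }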
Re: Throwing die in Apache::Registry
on 5/4/01 9:28 AM, Mark Maunder at [EMAIL PROTECTED] wrote: I have an Apache::Registry script that is using XML::Parser. The parser throws a 'die' call if it encounters a parse error (Why?). Because it's an exception and the parser can't continue. I was handling this by putting the code in an eval block, but this no longer works since all Registry scripts are already in one huge eval block. It should still work. An eval{} is scoped like any other block. Maybe you have a typo? Post your code and we'll look at it. - Perrin
Re: Exception modules
on 4/30/01 8:47 PM, brian moseley at [EMAIL PROTECTED] wrote: On Mon, 30 Apr 2001, Jeffrey W. Baker wrote: type of exception. Right now I cannot in fact think of any program I have written that branches on the type of exception. Java encourages this with multiple catch blocks. In CP Web Mail, the underlying libraries throw typed exceptions so that the application layer can display the correct error notification to the user. for instance, if the library throws CP::InvalidMailboxNameException, Web Mail can display 'the mailbox name you suggested contains an illegal character. it must correspond to the format thus-and-such. try again.', whereas if the library throws CP::Exception (the generic exception), Web Mail will handle it as a service problem and display that godawful WM page. I've tried that, but last time I went with more general classes of exceptions containing unique error IDs (defined in a constants module) to indicate the exact type. Not as Java-like, but it did save me from creating dozens of classes with no unique properties except their names. I've also tried making separate hierarchies of exceptions for user errors (illegal input) vs. system errors (can't connect to database). In those cases, you usually do switch based on exception class, because the user errors need to be handled differently. I suppose it's a matter of debate whether or not bad user input should be handled with exceptions at all, but since I like to keep the controller code simple and let the data model objects do the input checking, you have to bubble it back up somehow. I'm still kind of unsatisfied with how difficult some of the user exception handling turned out to be, and I'll probably try something different next time. - Perrin
Re: modperl/ASP and MVC design pattern
On Fri, 20 Apr 2001, Francesco Pasqualini wrote: But are there in the mod_perl architecture some guidelines and/or frameworks that encourage the MVC design pattern? I think that Apache::ASP could be (for example) the right tool, adding the "forward" feature. The forward feature looks like an ordinary include to me. Is there any real difference between that and the Apache::ASP version? $Response->Include("filename.inc", @args); In addition to Apache::PageKit, you might want to check out the documentation for Template Toolkit, especially http://www.template-toolkit.org/docs/default/index.html. - Perrin
Re: Fast DB access
"Chutzpah" is an interesting way of putting it. I've been thinking of them as "slimeballs in the busy of conning webkids into thinking they have a real RDBM product". (It isn't a moot point, because it's the same people working on it: human character issues are actually relevant when making technical decisions.) Why does discussion of databases - possibly the most boring subject on the planet - always degenerate to name-calling? MySQL is an excellent solution for a wide range of problems, as are dbm files and flat files. The developers give the code away for free, and do not hide the fact that it doesn't support transactions. There's no need for this kind of vitriol. - Perrin
Re: from the quick hacks department... x-bit controls mod_cgi
Can you briefly explain why it leaks memory? I haven't tried it, but I'm guessing it's creating a new anonymous sub on every request. I have been playing with Apache::Leak and Devel::Leak trying to figure out what is happening when Perl code leaks memory, but I haven't got my head around it yet... Most people don't get much useful information out of those modules. The things people think of as leaks are often not really leaks, so they don't show up with these (see below). Also, a more general question to the list. How reasonable is it to assume that most of the more standard modules on CPAN don't leak memory when used in a mod_perl environment? Totally unreasonable. Most module authors have not attempted to look for process growth over long periods of use. They may have tried to get rid of any circular references, but that's usually about it. Let's be clear about terminology: a real memory leak is a situation where a program discards some memory and fails to free it or reuse it. Perl has some of these, a few of which are documented here: http://language.perl.com/faq/v2/Q4.19.html Usually though, growth in size is not from a leak; it's just perl using more memory. There are some things to be careful of that are listed in the guide (passing large strings by value, slurping whole files into a single scalar, etc.). Note that lexical variables do not relinquish memory when they go out of scope, unless you manually undef them. Some growth will happen when the child processes use variables that were in copy-on-write memory from the parent process. How can you tell what's going on? If you hit your module 100 times, and then you hit it another 100 and it continues to grow, you may have an actual leak. If it stabilizes after the first 100, you just have normal growth. Don't expect to see growth on every hit; perl allocates memory in chunks and only grabs another chunk when it needs one. You can read some interesting stuff from Matt about finding memory leaks here: http://groups.yahoo.com/group/modperl/message/27908 http://groups.yahoo.com/group/modperl/message/27943 - Perrin
Re: Fast DB access
b) Flat file: Create a Linux directory structure with the same hierarchy as the attributes, i.e., the directory structure has publishers/sizes/types/ip numbers. ip numbers is the file name which contains a list of ads. Objective is to pick the right file, open this file and create a hash with the contents of the file. You might get better performance by using a combined key, hashing it, and splitting into directories after the first 2 characters in the key. This would mean 2 directories to traverse for each lookup, rather than 4. I believe the File::Cache module works this way, so you could steal code from there. However, dbm is a good choice for this. You may find SDBM_File faster than DB_File if your records are small enough for it (I think the limit is 2K per record). - Perrin
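The combined-key idea, sketched out (the attribute values and cache path are made up; File::Cache does something similar internally):

    use Digest::MD5 qw(md5_hex);

    # Hypothetical attribute values; in real life they come from the request.
    my ($publisher, $size, $type, $ip) = ('acme', '468x60', 'banner', '10.0.0.1');

    my $key    = join ':', $publisher, $size, $type, $ip;
    my $digest = md5_hex($key);
    my $file   = join '/', '/var/cache/ads', substr($digest, 0, 2), $digest;
    # e.g. /var/cache/ads/3f/3fa1c0... : two directory levels to traverse
    # instead of four, and the top level never holds more than 256 entries.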
Re: mac_check in eagle book
On 16 Apr 2001, Chip Turner wrote: The modperl book mentions it double hashes to prevent a malicious user from concatenating data onto the values being checked. I don't know if they are referring to this weakness, but I suspect they are. Sadly, the book doesn't seem to offer a reference for the claim as to the specific md5 vulnerability. (Hey Doug, wanna shed some light on that somewhat cryptic passage? :) I'm sure I recall seeing a book on cryptography mentioned in the footnotes somewhere... The crypto section in Mastering Algorithms with Perl is a pretty good overview. - Perrin
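For what it's worth, the usual shape of the double-hashing trick looks like this (a sketch in the spirit of the Eagle book's ticket scheme, not a quote from it; the secret and data values are invented):

    use Digest::MD5 qw(md5_hex);

    my $secret = 'server-side secret';          # assumption: never sent to the client
    my $data   = 'user=bob&expires=990806951';  # hypothetical values being protected

    # md5_hex($secret . $data) alone is open to extension attacks: someone who
    # knows that digest can append to $data and compute a valid new digest
    # without ever learning $secret.  Hashing twice closes that hole.
    my $mac = md5_hex($secret . md5_hex($secret . $data));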
Re: Dynamic httpd.conf file using mod_perl...
What I'm trying to do is have apache build the httpd.conf file dynamically when it starts from a MySQL database. It might be easier and more bulletproof to build the conf file off-line with a simple perl script and a templating tool. We did this with Template Toolkit and it worked well. - Perrin
Re: Dynamic httpd.conf file using mod_perl...
It might be easier and more bulletproof to build the conf file off-line with a simple perl script and a templating tool. We did this with Template Toolkit and it worked well. - Perrin That would be fine and dandy, but it's not exactly what I'm going after. Currently if I want to make a change to all of our clients I have to go through and edit every config file (I have a .conf file for each domain and then use an Include in the httpd.conf). Using the mod_perl way I can change it once in the httpd.conf file, restart apache, and the change will take place for all the domains that are affected by the <Perl></Perl> code. Know what I mean? Sure, and it looks like you got your question answered. The two approaches are pretty similar in terms of the results, but the off-line approach does require either using a custom startup script or doing two steps (build conf and then restart server). On the other hand, the off-line approach will allow you to start your server even when the database is down. You might want to build your dynamic conf file approach with a cache for the last-accessed database info, so that it has something to fall back to if the db goes down. - Perrin
Re: negative LocationMatch syntax?
Matt Sergeant wrote: Is there a way I could use LocationMatch to specify a not condition? as in <LocationMatch !~ "/(thisfile|thatDir|whatever).*"> SSLVerifyClient require </LocationMatch> That would let me list the exceptions, and everything else would be restricted by default... It's really frustrating, but this is *not* possible... Maybe with a <Perl> section?
Re: Cutting down on the DEBUG bloat...
As part of my ongoing effort to streamline my mod_perl apps, I've come to discover the joy of constant subroutines and perl's ability to inline or eliminate code at compile time. I have a solution that works, but would be interested in seeing if others had better syntactic sugar.. You could use Filter::CPP with #ifdefs for this. #ifdef DEBUGGING print STDERR $some_thing; #endif - Perrin
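A pure-Perl alternative, if you'd rather not depend on a source filter (a sketch; DEBUGGING and the variable contents are stand-ins):

    use constant DEBUGGING => 0;

    my $some_thing = "expensive debugging output\n";   # stand-in value

    # When DEBUGGING is a compile-time constant false, perl strips both the
    # statement modifier and the block below during compilation, so there is
    # no runtime cost at all.
    print STDERR $some_thing if DEBUGGING;

    if (DEBUGGING) {
        print STDERR "more expensive diagnostics here\n";
    }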
Re: Storable (lock_store lock_retrieve) question
I am currently using Storable's lock_store and lock_retrieve to maintain a persistent data structure. I use a session_id as a key and each data structure has a last modified time that I use to expire it. I was under the impression that these two functions would be safe for concurrent access, but I seem to be getting 'bad hashes' returned after there is an attempt at concurrent access to the storable file. (You're not using NFS, right?) What are the specifics of your bad hashes? Are they actually corrupted, or do they just contain data that's different from what you expected? The lock_retrieve function only locks the file while it is being read, so there's nothing to stop a different process from running in and updating the file while you are manipulating the data in memory. Then you store it and overwrite whatever updates other processes may have done. If you want to avoid this, you'll need to do your own locking, or just use Apache::Session. - Perrin
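A sketch of the do-it-yourself locking, if you go that route (file name and key are placeholders; the point is that one lock covers the whole read-modify-write cycle, which lock_retrieve/lock_store can't give you on their own):

    use Fcntl qw(:flock);
    use Storable qw(retrieve nstore);

    my $file       = '/var/tmp/sessions.stor';   # invented path
    my $session_id = 'abc123';                   # invented key

    open my $lock, '>', "$file.lock" or die "can't open $file.lock: $!";
    flock($lock, LOCK_EX) or die "can't lock: $!";

    my $sessions = -e $file ? retrieve($file) : {};
    $sessions->{$session_id}{last_modified} = time;   # ...your updates here...
    nstore($sessions, $file);

    close $lock;   # the lock is released only after the new image is on disk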
Re: mirroring data across a server cluster
I'm trying to address 2 issues: A. Avoiding a single point of failure associated with a having a central repository for the data, such as a NFS share or a single database server. B. Avoiding the overhead from using heavyweight tools like database replication. So I've been thinking about how to pull that off, and I think I've figured out how, as long as I don't need every machine to have exactly the same version of the data structure at all times. There are many approaches to this problem, and the one that's appropriate depends on what you're using the data for, how large your cluster is, and how out-of-sync the nodes can be. What it comes down to is implementing 2 classes: one implements a daemon running on each server in the cluster, responsible for handling requests to update the data across the network and one a class usable inside mod_perl to handle local updates and inform other servers of updates. That will get cumbersome if you have a large number of nodes all trying to tell each other about updates. On a recent project, we wanted to share some cached data which didn't have to be well synchronized. We did it by writing a daemon like you're suggesting that sits on top of a fast BerkeleyDB database, but we used multicast for sending out updates. With a large cluster, you'll quickly tie up all your resources if every update has to be sent separately to every other server. If all you're after is redundancy for your short-term user data, you could do something like the "TCP-ring" sessions described here: http://www.caucho.com/products/resin/java_tut/tcp-sessions.xtp. There's nothing especially tricky about this: just write all of your updates through to a backup database on another server, and embed enough information in a cookie to find the backup if the main one for a session fails. I believe I wouldn't be the only person finding something like this terrifically useful. Furthermore, I see that Cache::Cache could be the underlying basis for those classes. Most of the deep network programming is already there in Net::Daemon. You might want to look at http://www.spread.org/ or Recall: http://www.fault-tolerant.org/recall/. Incidentally, database replication may not sound like such a bad idea after you examine some of the alternatives. It's really just one group's solution to the problem you're posing. There are replication tools for MySQL which are supposedly fairly easy to run. - Perrin
Re: ASP / Apache
On Thu, 29 Mar 2001, Victor Michael Blancas wrote: I'm planning to implement a DBI session management integrated with Apache::ASP, much like how Apache::Session works. Might as well just use Apache::Session, if it already does what you need. Is this better for clustered web servers with a single database server or do I just nfs mount a shared directory and put the global directory there? In general, a good database will give you better performance and scalability than NFS. NFS gets clunky when you need locking and synchronization. However, a relatively small site should do fine on NFS. Since NFS-shared sessions already work with Apache::ASP, you could try it, benchmark it for your expected traffic, and then decide. - Perrin
Re: Getting a Cache::SharedMemoryCache started
On Tue, 27 Mar 2001, DeWitt Clinton wrote: Which reminds me of something. These cache objects are not currently thread safe. When should I start expecting multi-threaded apache/mod_perl to become mainstream enough to warrant an overhaul of the code? I imagine that nearly all Perl libraries are not thread safe, of course. But code that will be used in mod_perl environments needs to be, right? You can read all about it here: http://www.apache.org/~dougm/modperl_2.0.html The gist is that you probably don't need to change any Perl code, but some XS modules on CPAN may have to change. - Perrin
Re: [ANNOUNCE] mod_perl module you may be interested in
I have done a search on CPAN for "resume" and "cv" and did not come up with anything like what I am doing. http://www.zeuscat.com/andrew/work/resume/formats.shtml - Perrin
Re: dbm locking info in the guide
Ok, what about calling sync before accessing the database? (read and write) Will it force the process to sync its data with the disk, or will it cause the corruption of the file on the disk, as the process might have stale data? Well, that's what we don't know. As David Harris pointed out, if it does do the right thing and re-read from disk, it's probably not much better than re-opening the database. I suppose it would avoid some Perl object creation though, so it would be at least a little faster. - Perrin
Re: dbm locking info in the guide
On Tue, 20 Mar 2001, Stas Bekman wrote: Is anyone aware of a safe way to do multi-process read/write access through a dbm module other than BerkeleyDB.pm without tie-ing and untie-ing every time? I thought that was the only safe thing to do because of buffering issues, but this seems to be implying that careful use of sync calls or something similar would do the trick. Maybe this is just left over from before the problem with the old technique described in the DB_File docs was discovered? Any comments? Well, I wrote this based on my experience. I've used the code that does locking coupled with sync() and it worked fine. You mean with DB_File? There's a big warning in the current version saying not to do that, because there is some initial buffering that happens when opening a database. - Perrin
Re: Problem with Tie::DBI and DBI in modperl
On Tue, 20 Mar 2001, John Mulkerin wrote: There is no error message returned, it just goes back to the httpd 403 error screen. What about in the error log? Have you read the DBI docs on how to get your error message to print? You should either have RaiseError on or be checking return codes from every DBI call. If you haven't turned PrintError off, DBI will issue a warning that goes to your log when it fails to connect. - Perrin
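Something like this, for example (the DSN and credentials are placeholders):

    use DBI;

    my ($user, $password) = ('someuser', 'somepass');   # placeholders

    # With RaiseError on, a failed connect (or any failed call) dies with the
    # real error, which lands in the error_log instead of a bare 403/500 page.
    my $dbh = DBI->connect('dbi:mysql:mydb', $user, $password,
                           { RaiseError => 1 });

    # Without RaiseError, you must check every call yourself:
    # my $dbh = DBI->connect(...) or die "connect failed: $DBI::errstr";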
Re: understanding memory via ps -ely | grep http
On Tue, 20 Mar 2001, Tim Gardner wrote: I understand that the RSS is the resident size in KB and the SZ column is the size of the process, but what should I be seeing in the way of reduced memory? The 13MB/18MB is not much different from when I don't preload anything. Should I be seeing something else? You have to look at SHARE. Subtract SHARE from RSS. - Perrin
Re: dbm locking info in the guide
On Tue, 20 Mar 2001, Joshua Chamas wrote: I know the tie/untie MLDBM::Sync strategy with DB_File is slow, but what size data are you caching? I'm not. Well, actually I am, but I use BerkeleyDB which handles its own locking. I just noticed this in the Guide and figured that either it was out of date or I missed something interesting. It may be that you can use MLDBM::Sync with SDBM_File, with records 100 bytes would be good, or MLDBM::Sync with MLDBM::Sync::SDBM_File which is faster through around 5000-1 byte records with Compress::Zlib installed. Generally, the tie/untie with a SDBM_File is pretty fast. I'll update the Guide to mention your module in the dbm section. - Perrin
Re: dbm locking info in the guide
On Wed, 21 Mar 2001, Stas Bekman wrote: You mean with DB_File? There's a big warning in the current version saying not to do that, because there is some initial buffering that happens when opening a database. The warning says not to lock on dbm fd but an external file! I think you'll still have problems with this technique, unless you tie/untie every time. I'm looking at the perldoc for DB_File version 1.76, at the section titled "Locking: the trouble with fd". At the very least, you'd have to call sync after acquiring a write lock but before writing anything. - Perrin
Re: dbm locking info in the guide
Stas Bekman wrote: So basically what you are saying is that sync() is broken and shouldn't be used at all. Something fishy is going on. The purpose of sync() is to flush the modifications to the disk. Saving changes to disk isn't the problem. The issue is that some of the database gets cached in memory when you open the database (even if you don't actually read anything from it), so changes made in other processes will not be seen. To get around this, you would have to somehow reload the cached data from disk just after getting a write lock but before making any changes. Unless you are talking about a process that wants to read after some other process had changed the database, and there is a hazard that the former process has the data cached and will not know that dbm has been modified. Exactly. Keeping the database open is fine as long as you have a read-only app. For read/write, you have to tie/untie every time. Or use BerkeleyDB. - Perrin
Re: [ANNOUNCE] MLDBM::Sync v.07
On Mon, 19 Mar 2001, Joshua Chamas wrote: A recent API addition allows for a secondary cache layer with Tie::Cache to be automatically used When one process writes a change to the dbm, will the others all see it, even if they use this? - Perrin
Re: [OT] ApacheCon BOF
On Mon, 19 Mar 2001, Charles J. Brabec wrote: The Perl advocate's version: mod_perl: Let's see you try to do this with Python. I know you're only joking, but let's not fall into that trap of confusing arrogance with advocacy. This is my chief complaint about Pythoners: they're always insulting Perl! I consider it poor form. Incidentally, there are some nice mod_perl clones for Python, based partially on Doug's work. - Perrin
dbm locking info in the guide
While working on adding info on Berkeley DB to the Guide, I came across this statement: "If you need to access a dbm file in your mod_perl code in the read only mode the operation would be much faster if you keep the dbm file open (tied) all the time and therefore ready to be used. This will work with dynamic (read/write) database accesses as well, but you need to use locking and data flushing to avoid data corruption." Is anyone aware of a safe way to do multi-process read/write access through a dbm module other than BerkeleyDB.pm without tie-ing and untie-ing every time? I thought that was the only safe thing to do because of buffering issues, but this seems to be implying that careful use of sync calls or something similar would do the trick. Maybe this is just left over from before the problem with the old technique described in the DB_File docs was discovered? Any comments? - Perrin
Re: Problem with Tie::DBI and DBI in modperl
On Mon, 19 Mar 2001, John Mulkerin wrote: I'm trying to use the plain vanilla TicketTool.pm from O'Reilly's mod_perl book, Apache Modules with Perl and C. It uses Tie::DBI to create a hash of the mysql connection. When I run just the authentication subroutine with Perl -d "authenticate.pm" it runs fine. When I run it as part of the web server, it fails in the connect statement in the Tie::DBI routine. What is the exact error message? Have you tried searching the mailing list archive for that message? Are you using PHP in your server? - Perrin
RE: enable normal SSI for output of mod_perl script
On Sat, 17 Mar 2001, Surat Singh Bhati wrote: Once I generate someoutput or page using my handler, I want to pass it to apache to process the #exec. Apache::SSI does not support #exec and "PerlSSI disabled in DSO build" I am using the DSO mod_perl. Any solution? Apache::SSI does support #exec. It says so right there in the man page. But why #exec? Is there some reason you have to do it that way? It's going to be slow and will crush your machine with forking. If the thing you're exec'ing is a Perl script, you can run it under Apache::Registry and call it with an include. There are lots of discussions on how to do this in the mailing list archives. - Perrin
Re: cgi_to_mod_perl manpage suggestion
On Wed, 14 Mar 2001, Issac Goldstand wrote: I still think that the above line is confusing: It is because mod_perl is not sending headers by itself, but rather your script must provide the headers (to be returned by mod_perl). However, when you just say "mod_perl will send headers" it is misleading; it seems to indicate that mod_perl will send "Content-Type: text/html\r\n\r\n" all by itself, and that conversely, to disable that PerlSendHeaders should be Off. Would it help if it said "PerlSendHeader On makes mod_perl act just like CGI with regard to headers"? - Perrin
Re: cgi_to_mod_perl manpage suggestion
On Tue, 13 Mar 2001, Issac Goldstand wrote: The only problem was the PerlSendHeaders option. The first fifty or so times that I read the manpages, I understood that PerlSendHeader On means that mod_perl will SEND HEADERS, and that off meant supply your own... Somehow I figured out (eventually) that this was not so, switched all of my deliberately placed PerlSendHeader Off statements to a single On statement, and all of my scripts once again work. Um, you're getting me confused now, but PerlSendHeader On means that mod_perl WILL send headers. I think the main problem was the line : By default, mod_perl does not send any headers by itself, however, you may wish to change this: PerlSendHeader On That seems to say that if you want mod_perl to handle headers for you, change it to On. That's correct. - Perrin
Re: cgi_to_mod_perl manpage suggestion
On Tue, 13 Mar 2001, Andrew Ho wrote: PHUm, you're getting me confused now, but PerlSendHeader On means that PHmod_perl WILL send headers. I recognize this confusion. Most recovering CGI programmers think that "PerlSendHeader On" means that you no longer have to do this in your CGI: print "Content-type: text/html\n\n"; When in fact you still do. The manpage makes it sound like you don't. Perhaps a note to that effect would be helpful. I certainly want newbies to understand the docs, but the man page does say very explicitly "Just as with mod_cgi, PerlSendHeader will not send the MIME type and a terminating double newline. Your script must send that itself..." - Perrin
Re: [OT] Re: mod_perl shared memory with MM
I'm very intrigued by your thinking on locking. I had never considered the transaction-based approach to caching you are referring to. I'll take this up privately with you, because we've strayed far off the mod_perl topic, although I find it fascinating. One more suggestion before you take this off the list: it's nice to have both. There are uses for explicit locking (I remember Randal saying he wished File::Cache had some locking support), but most people will be happy with atomic updates, and that's usually faster. Gunther's eXtropia stuff supports various locking options, and you can read some of the reasoning behind it in the docs at http://new.extropia.com/development/webware2/webware2.html. (See chapters 13 and 18.) - why don't you use 'real' constants for $SUCCESS and the like? (use constant) Two reasons, mostly historical, and not necessarily good ones. One, I benchmarked some code once that required high performance, and the use of constants was just slightly slower. Ick. Two, I like the syntax $hash{$CONSTANT}. If I remember correctly, $hash{CONSTANT} didn't work. This may have changed in newer versions of Perl. No, the use of constants as hash keys or in interpolated strings still doesn't work. I tried the constant module in my last project, and I found it to be more trouble than it was worth. It's annoying to have to write things like $hash{+CONSTANT} or "string @{[CONSTANT]}". Do you know if Storable is definitely faster? It is, and it's now part of the standard distribution. http://www.astray.com/pipermail/foo/2000-August/000169.html - Perrin
Re: [OT] Re: mod_perl shared memory with MM
Can I ask why you are not using IPC::ShareLite (as it's pure C and apparently much faster than IPC::Shareable - I've never benchmarked it as I've always used IPC::ShareLite). Full circle back to the original topic... IPC::MM is implemented in C and offers an actual hash interface backed by a BTree in shared memory. IPC::ShareLite only works for individual scalars. It wouldn't surprise me if a file system approach was faster than either of these on Linux, because of the aggressive caching. - Perrin
Re: mod_perl shared memory with MM
On Sat, 10 Mar 2001, Christian Jaeger wrote: For all of you trying to share session information efficently my IPC::FsSharevars module might be the right thing. I wrote it after having considered all the other solutions. It uses the file system directly (no BDB/etc. overhead) and provides sophisticated locking (even different variables from the same session can be written at the same time). Sounds very interesting. Does it use a multi-file approach like File::Cache? Have you actually benchmarked it against BerkeleyDB? It's hard to beat BDB because it uses a shared memory buffer, but theoretically the file system buffer could do it since that's managed by the kernel. - Perrin
Re: [ANNOUNCE] Cache-Cache-0.03
"Daniel Little (Metrex)" wrote: Along the same lines, how about making SizeAwareMemoryCache as well so that you can specify just how much data you want stored in the cache. Sounds like Joshua Chamas' Tie::Cache module. It provides a size-limited LRU cache. - Perrin
Re: mod_perl shared memory with MM
Christian Jaeger wrote: Yes, it uses a separate file for each variable. This way also locking is solved, each variable has its own file lock. You should take a look at DeWitt Clinton's Cache::FileCache module, announced on this list. It might make sense to merge your work into that module, which is the next generation of the popular File::Cache module. It's a bit difficult to write a real-world benchmark. It certainly is. Benchmarking all of the options is something that I've always wanted to do and never find enough time for. I've tried to use DB_File before but it was very slow when doing a sync after every write as is recommended in various documentation to make it multiprocess safe. What do you mean with BerkeleyDB, something different than DB_File? BerkeleyDB.pm is an interface to later versions of the Berkeley DB library. It has a shared memory cache, and does not require syncing or opening and closing of files on every access. It has built-in locking, which can be configured to work at a page level, allowing multiple simultaneous writers. Currently I don't use Mmap (are there no cross platform issues using that?), that might speed it up a bit more. That would be a nice option. Take a look at Cache::Mmap before you start. - Perrin
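If you want to see what that looks like, here's a minimal BerkeleyDB.pm sketch with a shared environment (the paths are invented; DB_INIT_CDB gives simple multiple-reader/single-writer locking, while the full transactional subsystem is what gets you page-level locks):

    use BerkeleyDB;

    my $env = BerkeleyDB::Env->new(
        -Home  => '/var/tmp/bdb-cache',
        -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
    ) or die "can't create env: $BerkeleyDB::Error";

    # No tie/untie per request and no sync after every write: the shared
    # memory pool and the library's own locking keep the processes consistent.
    tie my %cache, 'BerkeleyDB::Hash',
        -Filename => 'cache.db',
        -Env      => $env,
        -Flags    => DB_CREATE
        or die "can't tie: $BerkeleyDB::Error";

    $cache{'some_key'} = 'some value';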
Berkeley DB at ApacheCon
At the upcoming ApacheCon in April, Bill Hilf and I will be presenting a talk called "Building a Large-Scale E-Commerce Site with Apache and mod_perl." One of the things we'll be covering is our use of Berkeley DB, including some problems we encountered with it and our recommendations on how to use it. If you're interested in using Berkeley DB, this might save you some trouble. (Stas, after the conference you might want to grab the info from our article in the conference handouts and add it to the Berkeley DB section of the Guide. Or I could do it and send it to you.) - Perrin
Re: mod_perl shared memory with MM
I have some preliminary benchmark code -- only good for relative benchmarking, but it is a start. I'd be happy to post the results here if people are interested. Please do. - Perrin
Re: Apache::SpeedLimit
On Tue, 6 Mar 2001, Daniel wrote: Hi, I'm having a bit of trouble using Apache::SpeedLimit ... After running for about 5 minutes the handler dies with the following in the logfile: [Tue Mar 6 17:32:07 2001] [error] [Tue Mar 6 17:32:07 2001] null: Munged shared memory segment (size exceeded?) I would suggest that you ditch Apache::SpeedLimit, since it uses IPC::Shareable which is known to have some performance issues and is giving you grief about memory size. Do a quick search in the list archives for Randal's Stonehenge::Throttle code. It's fairly easy to adapt it to use cookies and to count hits within a time window rather than CPU. It uses files, and doesn't have to lock them. Quite fast under Linux, and even works over NFS if you need a clustered solution. - Perrin
Re: Passing control from one apache module to another
I am writing an apache perl module which logs HTTP environment variables. This is fine for static content (html, images) but is a problem for dynamic content such as php. Why doesn't LogFormat work for you? - Perrin
Re: Duplicate entries in @INC
But when I print all the values of @INC in mod_perl through the browser, I see duplicate entries for my directory. But under CGI, I don't see any. What might be the reason? I can think of two possibilities. First, you might be adding /usr/local/apache/lib/perl (or wherever your Apache lives + /lib/perl) to @INC, which mod_perl does automatically. Second, your startup.pl may be running twice because Apache runs the config file twice on startup. Stas posted a message a few days ago about seeing files pulled in from PerlModule/PerlRequire commands running twice on startup, even though they probably shouldn't. See http://forum.swarthmore.edu/epigone/modperl/crachoupro/Pine.LNX.4.30.0102231 [EMAIL PROTECTED] for more info and a workaround. - Perrin
Re: Apache thrashing my swap...
On Wed, 28 Feb 2001, Jason Terry wrote: My problem is that recently I have had some users that are getting impatient and hitting the reload/refresh button OFTEN. In some instances this causes one single person to have over 40 httpd children service JUST them. This causes my server to start thrashing swap... First, put something in place so that your server will never go into swap. I prefer a combination of MaxClients and Apache::SizeLimit. Also, if you haven't moved your images to another server and/or put a proxy server in place, do that. Does anybody on this list know of a way to limit the number of connections apache will allow per IP address (before any reverse lookups would be nice)? If you set a unique cookie, you could limit based on that and fall back to IP address if you don't find a cookie. That will help with the proxy issue. You could adapt one of the existing modules for this purpose, or maybe grab Randal's Stonehenge::Throttle code from the list archives. Be careful. Blocking users is always a dangerous thing to do and may be more trouble than it's worth. You could check the list archives for discussions of how to handle long-running tasks for ideas on interface changes that might solve your problem. - Perrin
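A sketch of the MaxClients plus Apache::SizeLimit combination (the numbers are arbitrary; pick MaxClients so that MaxClients times your largest process still fits in RAM):

    # httpd.conf (assumptions: tune for your own memory budget)
    #   MaxClients 40
    #   PerlFixupHandler Apache::SizeLimit

    # startup.pl
    use Apache::SizeLimit;
    $Apache::SizeLimit::MAX_PROCESS_SIZE       = 30000;  # KB
    $Apache::SizeLimit::CHECK_EVERY_N_REQUESTS = 5;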
Re: mod_perl shared memory with MM
Adi Fairbank wrote: I am trying to squeeze more performance out of my persistent session cache. In my application, the Storable image size of my sessions can grow upwards of 100-200K. It can take on the order of 200ms for Storable to deserialize and serialize this on my (lousy) hardware. I'm looking at RSE's MM and the Perl module IPC::MM as a persistent session cache. Right now IPC::MM doesn't support multi-dimensional Perl data structures, nor blessed references, so I will have to extend it to support these. Is there a way you can do that without using Storable? If not, maybe you should look at partitioning your data more, so that only the parts you really need for a given request are loaded and saved. I'm pleased to see people using IPC::MM, since I bugged Arthur to put it on CPAN. However, if it doesn't work for you there are other options such as BerkeleyDB (not DB_File) which should provide a similar level of performance. - Perrin
Re: [OT] (apache question) Working around MaxClients?
I have a high traffic website (looks like 200 GB output per month, something around 10-20 hits per day) hosted on a commercial service. The service does not limit my bandwidth usage, but they limit the number of concurrent Apache processes that I can have to 41. This causes the server to delay accepting new connections during peak times. That seems pretty arbitrary. They use that instead of some kind of memory or CPU cap? My account is a "virtual server"; what this means is that I have access to the Apache httpd.conf files and can restart the Apache daemon, but do not have the privilege to bind a program to port 80 (so I can't put thttpd on port 80). That rules out some obvious solutions like lingerd and squid (which I think uses a select loop). Sounds like they've made it so there's nothing you can do except try to serve your content faster. You could look at Apache::Compress. - Perrin
Re: cron for mod_perl?
On Thu, 15 Feb 2001, Stas Bekman wrote: I might be barking at the wrong tree, but why cron? Why don't you use at(1). And there's a CPAN module for it: Schedule::At. It claims to be cross-platform, and I believe NT has a version of at(1). - Perrin
Re: cron for mod_perl?
On Thu, 15 Feb 2001, Matt Sergeant wrote: It's just a convenience thing. I've wanted to be able to do this too, for example to have emails go off at a particular interval. So yes, it can be done as cron + URI, but I'm just jealous of AOLServer's ability to do it all in one. This is especially important for a shrink-wrapped type application, where asking people to install a crontab entry is just another pain in the installation process (note that cron might be different on different OS's, and so might the shell be, so this is a real problem for some people - whereas if it were in Apache we'd know the platform). Maybe we should add process scheduling into Apache, and a file system, and a window manager, and... Okay, I'm being silly, and there are times when duplication is necessary, but cron is such a well-established way of solving this problem that anything else sounds strange. The original post didn't say that the goal was to modify the scheduled jobs dynamically from mod_perl, and that does add a new wrinkle. I still think a good Perl interface to cron would be more obvious and more reliable. - Perrin
Re: cron for mod_perl?
Huh? Why would you call it if there's nothing to do? Are you thinking you'll write a cron-ish task/timing spec for your Perl app and just use the cron triggers as a constant clock? Yes, exactly. My plan is to have a table with the tasks in my database, and check expired tasks in a cleanup handler. I'll have to lock the table, so that only one process does that. I'll also query the database only every so often, not at every request cleanup. The more hits I get, the more accurate the "cron" will be, but I think I will use cron to trigger a request to have a minimum level of accuracy (could be every half-hour, just to make sure we're never late by more than that, or whatever is appropriate). Anything better? Well, by frequently hitting your web server and having it look in the database to decide to do nothing, you're putting a lot of unnecessary stress on your server. What's wrong with making individual cron jobs that call individual URLs (think of it as RPC) to cause actions on your web server exactly when you want them done? Doesn't that sound a whole lot simpler? - Perrin
Re: cron for mod_perl?
On Wed, 14 Feb 2001, Pierre Phaneuf wrote: I guess two persons "simpler" aren't always the same: I find it easier laying out a table and querying it than hacking something to fiddle with my crontab safely. As far as I know, crontab -e is perfectly safe. - Perrin
Re: cron for mod_perl?
On Tue, 13 Feb 2001, Pierre Phaneuf wrote: Well, if I call the "check for things to do" URI every minute, then I'll be just fine. Many times, I'll just check and find nothing to do Huh? Why would you call it if there's nothing to do? Are you thinking you'll write a cron-ish task/timing spec for your Perl app and just use the cron triggers as a constant clock? - Perrin
Re: PerlRequire
On Mon, 12 Feb 2001, Aaron Schlesinger wrote: I have a line in my httpd.conf: PerlRequire /path/to/startup.pl In startup.pl I have this line: use lib '/path/to/module'; This is not being added to my @INC like it should. Any thoughts? How do you know it isn't being added? Try putting this in your httpd.conf, right after the PerlRequire: <Perl> print join (':', @INC); </Perl> - Perrin
Re: [Templates] Re: ANNOUNCE: OpenInteract Web Application Server
On Thu, 8 Feb 2001, L.M.Orchard wrote: Now, if only I could get back to un-mothballing Iaijutsu/Iaido and do Zope the right way under perl... :) When I first looked at OI, I was thinking that it has a lot of the plumbing (O/R mapping, security model, application model) covered and you could probably write something like Iaido as an object publisher for it. I know Iaido has quite a few whizbang features that aren't currently in OI, but I think they could be fit into it, to the benefit of all. - Perrin
Re: ANNOUNCE: OpenInteract Web Application Server
On Thu, 8 Feb 2001, Stephane Bortzmeyer wrote: On Tuesday 6 February 2001, at 21 h 57, the keyboard of Chris Winters [EMAIL PROTECTED] wrote: I'm jazzed to announce the public release of OpenInteract, an extensible web application framework using mod_perl and the Template Toolkit as its core technologies. Anyone compared it to Zope http://www.zope.org/? I'm hesitating. Zope has a built-in concept of folders that allows you to use it as a sort of lame content management thing out of the box, i.e. you can edit pages and site structure through a web browser. (And there are other protocols like FTP that are supposed to work, although I haven't tried them.) OpenInteract doesn't seem to have an equivalent. Zope provides its own file-based database and indexer, while OpenInteract expects you to use an external database of some kind. OpenInteract has pretty solid-looking documentation. The Zope docs are a disaster, although a forthcoming book may improve that situation. Some of Zope's most interesting ideas - like Z Classes, a way to define object types at runtime through a web interface - seem cumbersome to work with or have odd restrictions. OpenInteract has no equivalent that I could see. In short, Zope wants to be more, but currently is difficult to figure out. That could be just my Perl experience, but I understood more of OpenInteract in half an hour than I did with Zope after several tries over the last few years. - Perrin
Re: [Templates] Re: object not being destroyed in a TemplateToolkit-based handler
On Thu, 8 Feb 2001, Vivek Khera wrote: Ok... Upgrade to "Apache/1.3.17 (Unix) mod_perl/1.25_01-dev" fixed the object destroy issue. Yay! Old versions were Apache 1.3.14 and mod_perl 1.24_02-dev. Well, that is odd since I'm running 1.3.12 and 1.24_01, but you never know what evils might be fixed by a clean install. - Perrin
Re: object not being destroyed in a TemplateToolkit-based handler
On Wed, 7 Feb 2001, Vivek Khera wrote: Ok... here's a mini-plugin that exhibits this behavior, and a command line script for it. This all looks like it should work, and the plugin object should get destroyed. Until someone finds the source of the problem, you could work around it by keeping your session reference in $r->pnotes instead of the actual plugin object. Then the session will get destroyed at the end of the request even if the plugin object doesn't. - Perrin
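The workaround is just a couple of lines (a sketch; the key name is arbitrary):

    # Assuming %session is the hash you tied with Apache::Session in the plugin:
    Apache->request->pnotes(session => \%session);

    # Later, anywhere in the same request:
    my $session = Apache->request->pnotes('session');

    # pnotes is cleared when the request finishes, so the reference goes away
    # and the tied hash is untied (and written back) even if the plugin object
    # itself is never destroyed.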
Re: object not being destroyed in a TemplateToolkit-based handler
On Wed, 7 Feb 2001, Vivek Khera wrote: Did you (or anyone else) reproduce the non-destroy of my mini-plugin? I didn't actually run it; just poked through the code. I'd like to at least know if I'm doing something wrong in mod_perl. I find it disconcerting to have my plugin objects sitting around unused and unreaped, aka, memory leakage. To find out if this is a mod_perl problem or not, try making your command line script call $tt->process twice in a row. If the object gets destroyed twice, this is mod_perl-related. Otherwise, it's a TT problem and officially [OT] for the mod_perl list. - Perrin
Re: object not being destroyed in a TemplateToolkit-based handler
On Wed, 7 Feb 2001, Vivek Khera wrote: Did you (or anyone else) reproduce the non-destroy of my mini-plugin? I'd like to at least know if I'm doing something wrong in mod_perl. I find it disconcerting to have my plugin objects sitting around unused and unreaped, aka, memory leakage. Okay, I just tried the code you posted under mod_perl and it worked fine. I changed a couple of lines having to do with locations and package names, and I commented out the PRE_PROCESS/POST_PROCESS stuff. The plugin object reported being destroyed. - Perrin
Re: object not being destroyed in a TemplateToolkit-based handler
On Tue, 6 Feb 2001, Vivek Khera wrote: However, at the end of the template processing, the object is not destroyed; that is, the DESTROY() method is never called, and therefore the tied hash never gets untied and Apache::Session::MySQL doesn't get a chance to write the data back to the store. Hmmm... If I'm reading the code correctly, what's supposed to happen is that the stash (where your plugin instance lives) gets localized when you call $tt->process() and de-localized at the end, which should result in anything you added to it (with your USE s = session) getting DESTROY-ed. Maybe there's a bug in this process somewhere? Are you keeping a copy of the stash anywhere else? Are you doing something in your plugin's load() method that involves a reference to the session object? That could cause this kind of problem. Maybe you could post a bit more of your plugin code. To make sure your stash is getting cleared, try putting some other object into it (by passing it to process) and see if it gets destroyed. If it does, then there's something about the way your plugin is written that's causing the problem. If not, TT has a problem. - Perrin
Re: object not being destroyed in a TemplateToolkit-based handler
On Tue, 6 Feb 2001, darren chamberlain wrote: Vivek Khera ([EMAIL PROTECTED]) said something to this effect on 02/06/2001: However, at the end of the template processing, the object is not destroyed; that is, the DESTROY() method is never called, and therefore the tied hash never gets untied and Apache::Session::MySQL doesn't get a chance to write the data back to the store. You aren't clear about the scope of $tt; it sounds like a package level global, if it's being destroyed when the children exit. How about creating a PerlCleanupHandler to explicit destroy $tt? No, don't do that. You need $tt to stick around, so that you can get the large speed increase from using the in-memory cache. - Perrin
Re: ANNOUNCE: OpenInteract Web Application Server
Chris Winters wrote: I'm jazzed to announce the public release of OpenInteract, an extensible web application framework using mod_perl and the Template Toolkit as its core technologies. Hi Chris, I've been reading the docs for the last couple of days and it looks very interesting. It's great to see a well-documented open source project. I have a couple of specific questions, which I guess are really about SPOPS more than OpenInteract. First, how much control do I have over what gets loaded when in objects with dependencies? For example, if I have an object with relationships to other objects, some of which are expensive to fetch from the database, can I defer certain parts from loading until I actually use them? Second, how hard is it to override the default load/save stuff in a SPOPS object in order to do some fancy SQL? I've had situations before with O/R mappers where I want to use some more complex SQL for efficiency reasons (optimizer hints, etc.) or to load a set of objects at once (like a tree structure). Is that possible with SPOPS? Finally, if I'm using a SQL database, what support is provided for evolving the data model? Do I have to change my schema and write conversion scripts every time I change an object attribute, or does SPOPS try to use some sort of "generic" schema? And just out of curiosity, are you familiar with any of the similar projects that others have worked on, like Iaido (formerly Iaijutsu) or Jellybean? - Perrin
Re: Runaways
Robert Landrum wrote: I have some very large httpd processes (35 MB) running our application software. Every so often, one of the processes will grow infinitely large, consuming all available system resources. After 300 seconds the process dies (as specified in the config file), and the system usually returns to normal. Is there any way to determine what is eating up all the memory? I need to pinpoint this to a particular module. I've tried coredumping during the incident, but gdb has yet to tell me anything useful. First, BSD::Resource can save you from these. It will do hard limits on memory and CPU consumption. Second, you may be able to register a handler for a signal that will generate a stack trace. Look at Devel::StackTrace (I think) for how to do it. - Perrin
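For the BSD::Resource part, a sketch (the limits are arbitrary, which RLIMIT_* constants exist depends on your OS, and Apache::Resource wraps the same calls if you'd rather set this from httpd.conf):

    use BSD::Resource;

    # Soft and hard limits: 300 CPU seconds, 200 MB of data segment.  A runaway
    # child then gets killed by the kernel instead of eating the whole machine.
    setrlimit(RLIMIT_CPU,  300, 300)
        or warn "couldn't set CPU limit: $!";
    setrlimit(RLIMIT_DATA, 200 * 1024 * 1024, 200 * 1024 * 1024)
        or warn "couldn't set memory limit: $!";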
Re: Passing data among handlers
Drew Taylor wrote: I have a slightly different twist on this question. We run Registry scripts on our site for debugging purposes. I would love to have a module for saving variables/data structures on a per-request basis (like the current Apache notes), but internally using pnotes under mod_perl, and some other mechanism (package vars like I'm using now?) under everything else. We do that. It's pretty simple. Just make get and set subs (or methods) that check for exists $ENV{'MOD_PERL'} and use pnotes or a global hash depending on which they are running under. - Perrin
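A sketch of what those get/set subs can look like (the package and key names are invented):

    package My::Notes;
    use strict;
    use vars qw(%NOTES);

    sub set {
        my ($key, $value) = @_;
        if ($ENV{MOD_PERL}) {
            Apache->request->pnotes($key, $value);
        } else {
            $NOTES{$key} = $value;   # plain CGI: one process per request anyway
        }
    }

    sub get {
        my $key = shift;
        return $ENV{MOD_PERL} ? Apache->request->pnotes($key) : $NOTES{$key};
    }

    1;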
Re: Rate limiting in Apache
[EMAIL PROTECTED] wrote: I'm interested in doing rate-limiting with Apache. Basically, I want to give Apache a target bitrate to aim at. When writing to one user, it writes as close to bitrate as the user/network can suck it down. When writing to two users (two connections), it writes to each connection at as close to bitrate/2 as possible... and so on. I've heard this can be controlled by mod_perl. Can anyone point me to some examples? You might want to use mod_throttle for this. There are some similar things in mod_perl if you poke around CPAN for them. I can vouch for the one Randal wrote, but it's not exactly what you're asking for. - Perrin
Re: [OT] Design Patterns in mod_perl apps?
Gunther Birznieks wrote: GoF did not introduce Model-View-Controller architecture. But it is discussed in Wiley's "A System of Patterns: Pattern-Oriented Software Architecture". MVC is frequently used in mod_perl apps. For example, see Apache::PageKit. - Perrin
Re: Runaways
Dave Rolsky wrote: On Mon, 5 Feb 2001, Perrin Harkins wrote: First, BSD::Resource can save you from these. It will do hard limits on memory and CPU consumption. Second, you may be able to register a handler for a signal that will generate a stack trace. Look at Devel::StackTrace (I think) for how to do it. Nope, that's not it. I wrote that one and it doesn't talk about that at all. I meant "for how to generate a stacktrace". Using it with a signal handler was demonstrated on this list about two weeks ago, but I can't recall who did it. It was someone trying to track down a segfault. - Perrin
Re: Apache::SizeLimit MAX UNSHARED patch
Joshua Chamas wrote: Hey, Per Perrin Harkins' advice, and my client's consent, I hacked up Apache::SizeLimit to support MAX_PROCESS_UNSHARED_SIZE config, where instead of limiting by the apparent process size, one limits by the amount of unshared memory being used. I actually did submit our patch for this a couple of weeks ago. I sent it to the guy whose name was in the module, and he told me he sent it along to the proper authorities. I'll send it to you if you'd like, and then you could see if there are any extras in it that would help your patch. I did patch all the documentation in our version to include the new option, so that might be a useful addition to yours. - Perrin
Re: Text::Template and passing in variables; going bug nuts!
Where I'm getting hosed is that %config and %session have data I need visible to the Text::Template objects themselves. I've RTFM'ed until my eyes are pink, and I see no options short of copying variables wholesale into another Package, but then I still can't get at them under "use strict", can I? Text::Template has a flexible interface. Put everything into a specific package, or use the HASH option to fill_in(). Use references, not copies. Your sample code has some stuff in it that looks a little scary to me. Do you really want to make %config, %dispatcher, $page, $frame, $command, %session, and %templates into class variables? Maybe instance variables would make more sense. (These are closure variables as well, so they'll keep their values from one request to the next.) And building methods that change class variables instead of actually returning something is kind of obfuscated. But maybe I just don't understand what you're doing from these small snippets. - Perrin
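For instance, the HASH option with references, using the %config, %session, and $page you already have (the template file name is made up):

    use Text::Template;

    my $template = Text::Template->new(TYPE => 'FILE', SOURCE => 'page.tmpl')
        or die "couldn't create template: $Text::Template::ERROR";

    # Pass references, not copies: a value of \%config shows up inside the
    # template as %config (and $config{key}), with no symbol-table fiddling
    # and no fight with "use strict".
    my $text = $template->fill_in(HASH => {
        config  => \%config,
        session => \%session,
        page    => $page,
    });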
RE: pseudo-hashes? was: Data structure question
On Tue, 23 Jan 2001, John Hughes wrote: I had already reached the same conclusion after I saw that everyone would have to remember to say "my Dog $spot;" every time or the whole thing falls apart. Falls apart? How? If you forget the "Dog" part somewhere, it's slower than a normal hash. If you want something reasonably close, you could do what a lot of the Template Toolkit code does and use arrays with constants for key names. Here's an example: Yes but then you get neither compile time (my Dog $spot) nor run time (my $spot) error checking. As Matt pointed out, you get compile time errors if you use an undefined constant as a key. You can also do this sort of thing with hashes, like this: use strict; my $bar = 'bar'; $foo{$bar}; If you type $foo{$barf} instead, you'll get an error. How are you going to debug the times you use a constant defined for one structure to index another? Different classes would be in different packages. Oh, do it all through accessor functions. That'll be nice and fast, won't it? Well, I thought we were talking about data structures to use for objects. A few months back, when making design decisions for a big project, I benchmarked pseudo-hashes on 5.00503. They weren't significantly faster than hashes, and only 15% smaller. I figured they were only worth the trouble if we were going to be making thousands of small objects, which is a bad idea in the first place. So, we opted for programmer efficiency and code readability and wrote hashes when we meant hashes. Of course, since this stuff is OO code, we could always go back and change the internal implementation to pseudo-hashes if it looked like it would help. If pseudo-hashes work for you, go ahead and use them. If it ain't broke... - Perrin
Re: pseudo-hashes? was: Data structure question
On Mon, 22 Jan 2001 [EMAIL PROTECTED] wrote: (section 4.3, pp 126-135) I hadn't heard about pseudo-hashes. I now desire a data structure with non-numeric keys, definable iteration order, no autovivification, and happy syntax. (And, of course, fast-n-small :-) Having Conway's blessing is nice. Pseudo-hashes do not have Conway's blessing. We hired him to do a tutorial for our engineers a few months back, and he railed about how disappointing pseudo-hashes turned out to be and why no one should ever use them. I had already reached the same conclusion after I saw that everyone would have to remember to say "my Dog $spot;" every time or the whole thing falls apart. If you want something reasonably close, you could do what a lot of the Template Toolkit code does and use arrays with constants for key names. Here's an example:

    package Dog;
    use constant NAME => 1;
    use constant ID   => 2;

    sub new {
        my $self = [];
        $self->[ NAME ] = 'spot';
        $self->[ ID ]   = 7;
        return bless $self;
    }

Or something like that, and make accessors for the member data. I think there are CPAN modules which can automate this for you if you wish. - Perrin
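A rough sketch of what the accessors mentioned above might look like; the method names are made up, and a generator module from CPAN could produce the same thing:

    # Read/write accessors over the constant-indexed array.
    sub name {
        my $self = shift;
        $self->[ NAME ] = shift if @_;
        return $self->[ NAME ];
    }

    sub id {
        my $self = shift;
        $self->[ ID ] = shift if @_;
        return $self->[ ID ];
    }

Callers then say $dog->name or $dog->name('rex') and never touch the array indices directly, so the internal representation can change later without breaking them.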
Re: dir_config at startup: I know what doesn't work, so what does?
"Christopher L. Everett" wrote: So what I'd like to know is: is there any way of picking up configuration info from the httpd-perl.conf at server startup? If you don't need to have different configurations for each virtual host or directory, you could just use globals. Perl $MyConfig::DBI_DSN = 'foobar'; /Perl - Perrin
Re: Apache::Session::DB_File and open sessions
Todd Finney wrote: The one-sentence version of my question is: Is there a problem with tying a session twice during two different HeaderParserHandlers, as long as you're doing the standard cleanup stuff (untie | make_modified) in each? It seems like the answer should be no unless there's some kind of bug, but I don't understand why you're doing it this way. Why don't you just put a reference to the %session hash in pnotes and use it in the second handler, instead of putting the ID in and re-creating it? That should be considerably more efficient. - Perrin
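A minimal sketch of the pnotes approach suggested above; the handler layout and the 'session' key name are invented for illustration, and $r is the Apache request object passed to each handler:

    # In the first handler, after tying %session:
    $r->pnotes(session => \%session);

    # In a later handler during the same request:
    my $session = $r->pnotes('session');    # same tied hash, no second tie
    $session->{last_seen} = time;

pnotes entries are cleared automatically at the end of the request, so the reference does not leak from one request to the next.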
Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
Sam Horrocks wrote: say they take two slices, and interpreters 1 and 2 get pre-empted and go back into the queue. So then requests 5/6 in the queue have to use other interpreters, and you expand the number of interpreters in use. But still, you'll wind up using the smallest number of interpreters required for the given load and timeslice. As soon as those 1st and 2nd perl interpreters finish their run, they go back at the beginning of the queue, and the 7th/8th or later requests can then use them, etc. Now you have a pool of maybe four interpreters, all being used on an MRU basis. But it won't expand beyond that set unless your load goes up or your program's CPU time requirements increase beyond another timeslice. MRU will ensure that whatever the number of interpreters in use, it is the lowest possible, given the load, the CPU-time required by the program and the size of the timeslice. You know, I had a brief look through some of the SpeedyCGI code yesterday, and I think the MRU process selection might be a bit of a red herring. I think the real reason Speedy won the memory test is the way it spawns processes. If I understand what's going on in Apache's source, once every second it has a look at the scoreboard and says "fewer than MinSpareServers are idle, so I'll start more" or "more than MaxSpareServers are idle, so I'll kill one". It only kills one per second. It starts by spawning one, but the number spawned goes up exponentially each time it sees there are still not enough idle servers, until it hits 32 per second. It's easy to see how this could result in spawning too many in response to sudden load, and then taking a long time to clear out the unnecessary ones. In contrast, Speedy checks on every request to see if there are enough backends running. If there aren't, it spawns more until there are as many backends as queued requests. That means it never overshoots the mark. Going back to your example up above, if Apache actually controlled the number of processes tightly enough to prevent building up idle servers, it wouldn't really matter much how processes were selected. If after the 1st and 2nd interpreters finish their run they went to the end of the queue instead of the beginning of it, that simply means they will sit idle until called for instead of some other two processes sitting idle until called for. If the systems were both efficient enough about spawning to only create as many interpreters as needed, none of them would be sitting idle and memory usage would always be as low as possible. I don't know if I'm explaining this very well, but the gist of my theory is that at any given time both systems will require an equal number of in-use interpreters to do an equal amount of work, and the differentiator between the two is Apache's relatively poor estimate of how many processes should be available at any given time. I think this theory matches up nicely with the results of Sam's tests: when MaxClients prevents Apache from spawning too many processes, both systems have similar performance characteristics. There are some knobs to twiddle in Apache's source if anyone is interested in playing with it. You can change the frequency of the checks and the maximum number of servers spawned per check. I don't have much motivation to do this investigation myself, since I've already tuned our MaxClients and process size constraints to prevent problems with our application. - Perrin
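For reference, these are the httpd.conf directives that drive the spawning behavior described above; the numbers are illustrative only, not a recommendation:

    # Prefork process management (Apache 1.3 style)
    StartServers       10    # children spawned immediately at startup
    MinSpareServers     5    # below this, the parent spawns more (1, 2, 4, ... up to 32/sec)
    MaxSpareServers    10    # above this, the parent kills one idle child per second
    MaxClients         30    # hard ceiling on concurrent children

Tightening the gap between MinSpareServers and MaxSpareServers makes Apache track the load more closely, at the cost of more frequent fork/exit churn.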
Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
On Fri, 19 Jan 2001, Sam Horrocks wrote: You know, I had a brief look through some of the SpeedyCGI code yesterday, and I think the MRU process selection might be a bit of a red herring. I think the real reason Speedy won the memory test is the way it spawns processes. Please take a look at that code again. There's no smoke and mirrors, no red-herrings. I didn't mean that MRU isn't really happening, just that it isn't the reason why Speedy is running fewer interpreters. Also, I don't look at the benchmarks as "winning" - I am not trying to start a mod_perl vs speedy battle here. Okay, but let's not be so polite about things that we don't acknowledge when someone is onto a better way of doing things. Stealing good ideas from other projects is a time-honored open source tradition. Speedy does not check on every request to see if there are enough backends running. In most cases, the only thing the frontend does is grab an idle backend from the lifo. Only if there are none available does it start to worry about how many are running, etc. Sorry, I got a lot of the details about what Speedy is doing wrong. However, it still sounds like it has a more efficient approach than Apache in terms of managing process spawning. You're correct that speedy does try not to overshoot, but mainly because there's no point in overshooting - it just wastes swap space. But that's not the heart of the mechanism. There truly is a LIFO involved. Please read that code again, or run some tests. Speedy could overshoot by far, and the worst that would happen is that you would get a lot of idle backends sitting in virtual memory, which the kernel would page out, and then at some point they'll time out and die. When you spawn a new process it starts out in real memory, doesn't it? Spawning too many could use up all the physical RAM and send a box into swap, at least until it managed to page out the idle processes. That's what I think happened to mod_perl in this test. If you start lots of those on a script that says 'print "$$\n"', then run the frontend on the same script, you will still see the same pid over and over. This is the LIFO in action, reusing the same process over and over. Right, but I don't think that explains why fewer processes are running. Suppose you start 10 processes, and then send in one request at a time, and that request takes one time slice to complete. If MRU works perfectly, you'll get process 1 over and over again handling the requests. LRU will use process 1, then 2, then 3, etc. But both of them have 9 processes idle and one in use at any given time. The 9 idle ones should either be killed off, or ideally never have been spawned in the first place. I think Speedy does a better job of preventing unnecessary process spawning. One alternative theory is that keeping the same process busy instead of rotating through all 10 means that the OS can page out the other 9 and thus use less physical RAM. Anyway, I feel like we've been putting you on the spot, and I don't want you to feel obligated to respond personally to all the messages on this thread. I'm only still talking about it because it's interesting and I've learned a couple of things about Linux and Apache from it. If I get the chance this weekend, I'll try some tests of my own. - Perrin
Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
On Wed, 17 Jan 2001, Sam Horrocks wrote: If in both the MRU/LRU case there were exactly 10 interpreters busy at all times, then you're right it wouldn't matter. But don't confuse the issues - 10 concurrent requests do *not* necessarily require 10 concurrent interpreters. MRU has an effect on the way a stream of 10 concurrent requests are handled, and MRU results in those same requests being handled by fewer interpreters. On a side note, what I'm curious about is how Apache decides that child processes are unused and can be killed off. The spawning of new processes is pretty aggressive on a busy server, but if the server reaches a steady state and some processes aren't needed they should be killed off. Maybe no one has bothered to make that part very efficient since in normal circumstances most users would prefer to have extra processes waiting around than not have enough to handle a surge and have to spawn a whole bunch. - Perrin
Re: killing of greater than MaxSpareServers
On Wed, 17 Jan 2001, ___cliff rayman___ wrote: here is an excerpt from httpd.h: Good reading. Thanks. It looks as if Apache should find the right number of servers for a steady load over time, but it could jump up too high for a bit when the load spike first comes in, pushing into swap if MaxClients is not configured correctly. That may be what Sam was seeing. - Perrin
Re: mod_perm and Java servlets
I've heard mod_perm costs a lot more than it's worth. There was an open-source clone called mod_home_perm but it wasn't very successful. Some people say you should skip it altogether and just use mod_hat. On Thu, 18 Jan 2001, Terry Newnham wrote: My boss has asked me to set up a web server on Solaris 8 with mod_perl and (if possible) Java servlet capabilities as well. Has anybody done this? Any issues? None that I know of, except that you really don't want the additional memory overhead of mod_perl in a process that isn't using mod_perl. You might save some memory by having a separate server that runs just mod_perl, and having your jserv (or whatever) server send requests for mod_perl apps to it using mod_proxy. See the mod_perl Guide for more info on using a proxy with mod_perl. - Perrin
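A hedged sketch of the proxy setup described above; the host name, port, and /perl/ path are placeholders, not part of the original advice:

    # In the front-end (non-mod_perl) server's httpd.conf:
    ProxyPass        /perl/ http://localhost:8081/perl/
    ProxyPassReverse /perl/ http://localhost:8081/perl/

    # The mod_perl-only server then listens on port 8081 and handles just
    # the /perl/ URLs, so its larger processes never serve images or
    # static pages.

The mod_perl Guide covers the trade-offs of this two-server arrangement in more detail.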
Re: killing of greater than MaxSpareServers
On Wed, 17 Jan 2001, ___cliff rayman___ wrote: i and others have written on the list before, that pushing apache children into swap causes a rapid downward spiral in performance. I don't think that MaxClients is the right way to limit the # of children. i think MaxSpareCoreMemory would make more sense. You could set this to 1K if your server was designated for Apache only, or set it to a higher value if it were a multipurpose machine. I've thought about that too. The trick is, Apache would need to know things about your application to do that right. It would need to know how big your processes were likely to be and how big they could get before they die. Otherwise, it has no way of knowing whether or not there's enough room for another process. A combination of Apache::SizeLimit and a dynamically changing MaxClients could possibly accomplish this, but you wouldn't want to run it too close to the edge since you don't want to have to axe a process that's in the middle of doing something just because it got a little too big (i.e. no hard limits on per-process memory usage). You can't change MaxClients while the server is running, can you? - Perrin
Re: Upgrading mod_perl on production machine (again)
The RPM/tarball option worries me a bit, since if I do forget a file, then I'll be down for a while, plus I don't have another machine of the same type where I can create the tarball. There's no substitute for testing. If it's really important to have a very short down time, you need a similar machine where you can test a new package. Short of that, the symlink suggestions people have made are probably the best you can do. - Perrin
Re: With high request rate, server stops responding with load zero
On Tue, 16 Jan 2001, Honza Pazdziora wrote: The machines are alright memory-wise, and they seem to be a bit slow on CPU, but what bothers me is the deadlock situation they get into. No more slow crunching, they just stop accepting connections. I've only seen that happen when something was hanging them up, like running out of memory or waiting for a database resource. Are you using NFS by any chance? Is there a way to allow a lot of children to be spawned but limit the number of children that serve requests? I don't think you want that. If the server is busy, Apache will spawn more as soon as it can. Of course PerlRunOnce is a huge liability. Getting rid of that would surely help a lot. - Perrin
Re: Apache::DBI type functionality but per-request
On Mon, 15 Jan 2001, Vivek Khera wrote: I tend to write my apps in a modular fashion, so that each module connects to the database and fetches its data and then disconnects. Often, a program will require data from several modules resulting in a lot of wasted connect/disconnect ops. Apache::DBI does solve this issue, but I don't want nor need to keep lingering connections for a lightly-used application. (The DB is not used for the majority of hits to the site.) I use a singleton for the database connection:

    sub get_dbh {
        my $dbh = Apache->request()->pnotes('dbh');
        if (!$dbh) {
            $dbh = _new_database_connection();
            Apache->request()->pnotes('dbh', $dbh);
        }
        return $dbh;
    }

Using pnotes is better than using a global, because mod_perl will make sure it gets cleaned up after the request even if something goes wrong. You can use Apache::DBI with this or not, without changing the code. My guess is that my handler will clean up the cache before Apache::DBI's handler gets a chance to auto-rollback. You might get some errors about undefined methods and such from that. - Perrin
Re: setting lib for mod_perl installation
On Mon, 15 Jan 2001, Dave Armstrong wrote: I just moved from dedicated to virtual hosting sigh, and was wondering how to configure mod_perl to install the modules to a private lib, outside of @INC. http://perl.apache.org/guide/install.html#Installing_Perl_Modules_into_a_D - Perrin
Re: Specific limiting examples (was RE: Apache::SizeLimit for unshared RAM ???)
On Mon, 15 Jan 2001, Ask Bjoern Hansen wrote: I tend to set the number to some number N of requests. If each httpd child only needs to be forked every N requests, that's pretty insignificant and it can save you from some blowups. The reason I like using SizeLimit instead of a number of requests is that it won't kill off processes when there isn't a problem. It also catches situations where you occasionally do something that raises the size of a process significantly by killing those off sooner. - Perrin
Re: How to recognize server shutdown?
but it's a bummer that the parent doesn't run END blocks. Will it run cleanup handlers? Cleanup handlers are run by child processes. What does that have to do with the parent? Or am I missing something? I meant "is there a way to run a cleanup handler in the parent after its work is done?", but I don't see one. Dave says the END block trick worked for him, so maybe it only fails under certain circumstances. - Perrin
Re: Specific limiting examples (was RE: Apache::SizeLimit for unshared RAM ???)
On Thu, 11 Jan 2001, Rob Bloodgood wrote: Second of all, with the literally thousands of pages of docs necessary to understand in order to be really mod_perl proficient Most of the documentation is really reference-oriented. All the important concepts in mod_perl performance tuning fit in a few pages of the guide. I mean, 1GB is a lot of ram. It's all relative. If you have significant traffic on your site, 1GB RAM might not be nearly enough. And finally, I was hoping to prod somebody into posting snippets of CODE and httpd.conf that describe SPECIFIC steps/checks/modules/configs designed to put a reasonable cap on resources so that we can serve millions of hits w/o needing a restart. I think you're making this much harder than it needs to be. It's this simple:

    MaxClients 30
    PerlFixupHandler Apache::SizeLimit
    <Perl>
        use Apache::SizeLimit;
        # sizes are in KB
        $Apache::SizeLimit::MAX_PROCESS_SIZE       = 30000;
        $Apache::SizeLimit::CHECK_EVERY_N_REQUESTS = 5;
    </Perl>

If you're paranoid, you can throw BSD::Resource in the mix to catch things like infinite loops in your code. None of this will make your code faster or your server bigger. It will just prevent it from going into swap. Having too much traffic can still hose your site in lots of other ways that have nothing to do with swap and everything to do with the design of your application and the hardware you run it on, but there's nothing mod_perl-specific about those issues. - Perrin
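A hedged sketch of the BSD::Resource safety net mentioned above; the limits shown are arbitrary placeholders, and the code would go in startup.pl or a <Perl> section:

    use BSD::Resource;

    # Kill a child that burns more than 60 CPU-seconds (soft limit),
    # with a hard ceiling of 120, so a runaway loop dies instead of
    # monopolizing the box.
    setrlimit(RLIMIT_CPU, 60, 120)
        or warn "could not set CPU limit: $!";

    # Optionally cap the address space as well (values in bytes).
    setrlimit(RLIMIT_AS, 64 * 1024 * 1024, 80 * 1024 * 1024)
        or warn "could not set memory limit: $!";

This is a last-resort guard, not a replacement for Apache::SizeLimit; when a hard limit fires, the process dies in the middle of whatever request it was serving.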
Re: Specific limiting examples (was RE: Apache::SizeLimit for unshared RAM ???)
On Thu, 11 Jan 2001 [EMAIL PROTECTED] wrote: I think you're making this much harder than it needs to be. It's this simple:

    MaxClients 30
    PerlFixupHandler Apache::SizeLimit
    <Perl>
        use Apache::SizeLimit;
        # sizes are in KB
        $Apache::SizeLimit::MAX_PROCESS_SIZE       = 30000;
        $Apache::SizeLimit::CHECK_EVERY_N_REQUESTS = 5;
    </Perl>

This is just like telling an ISP that they can only have 60ish dial-in lines for modems because that could theoretically fill their T1, even though they would probably hardly even hit 50% if they only had 60 modems for a T1. The idea that any process going over 30 megs should be killed is probably safe. The solution, though, is only really valid if our normal process is 29 megs. Otherwise we are limiting each system to something lower than it can produce. It's a compromise. Running a few fewer processes than you could is better than risking swapping, because your service is basically gone when you hit swap. (Note that this is different from your ISP example, because the ISP could still provide some service, albeit with reduced bandwidth, after maxing out its T1.) The numbers I put in here were random, but in a real system you would adjust this according to your expected process size. Even a carefully coded system will leak over time, and I think killing off children after 1MB of growth would probably be too quick on the draw. Child processes have a significant startup cost, and there's a sweet spot between how big you let the processes get and how many you run, which you have to find by testing with a load tool. It's different for different applications. It would be nice if Apache could dynamically decide how many processes to run at any given moment based on available memory and how busy the server is, but in practice the best thing I've found is to tune it for the worst case scenario. If you can survive that, the rest is cake. Then you can get on to really hard things, like scaling your database. - Perrin
Re: How to recognize server shutdown?
On Thu, 11 Jan 2001, Doug MacEachern wrote: of course, there is such a "trick" http://forum.swarthmore.edu/epigone/modperl/thandflunjimp/[EMAIL PROTECTED] Documentation patch attached. - Perrin

    1039a1040,1046
    > 
    > Cleanup functions registered in the parent process (before forking)
    > will run once when the server is shut down:
    > 
    >   #PerlRequire startup.pl
    >   warn "parent pid is $$\n";
    >   Apache->server->register_cleanup(sub { warn "server cleanup in $$\n" });
Re: How to recognize server shutdown?
On Wed, 10 Jan 2001, Dave Rolsky wrote: Is there any way to distinguish between a child being shutdown (say maxrequests has been exceeded) versus all of Apache going down (kill signal sent to the original process or something). Register an END block in your startup.pl, and have it check its PID to see if it's the parent. - Perrin
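A minimal sketch of the END block idea above, assuming startup.pl is loaded via PerlRequire so it runs in the parent before forking:

    # startup.pl
    my $parent_pid = $$;    # remembered by the END block via closure

    END {
        if ($$ == $parent_pid) {
            warn "whole server is shutting down (parent $$)\n";
        }
        else {
            warn "child $$ exiting (MaxRequestsPerChild, SizeLimit, etc.)\n";
        }
    }

As noted elsewhere in this thread, the parent does not always run END blocks, so the register_cleanup() approach may be the more reliable way to do parent shutdown work.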
Re: Too many connections with DBI
On Wed, 10 Jan 2001, Scott Alexander wrote: It really peaked at 14:38:41 and then in the error_log:

    Ouch!  malloc failed in malloc_block()
    DBI->connect failed: Too many connections at /systems/humakpro/lib/library.pm line 213
    [Wed Jan 10 14:38:41 2001] [error] Can't call method "prepare" without a package or object reference at /syst$

It looks like your real problem is that you ran out of memory. Have a look at the information in the guide about MaxClients and Apache::SizeLimit. You may not have a DBI problem at all. - Perrin
Re: How to recognize server shutdown?
On Thu, 11 Jan 2001, Stas Bekman wrote: the parent process doesn't run the END block. Randal's solution is probably better, but it's a bummer that the parent doesn't run END blocks. Will it run cleanup handlers? - Perrin
Re: Apache::SizeLimit for unshared RAM ???
What I would like to see though is instead of killing the child based on VmRSS on Linux, which seems to be the apparent size of the process in virtual memory RAM, I would like to kill it based on the amount of unshared RAM, which is ultimately what we care about. We added that in, but haven't contributed a patch back because our hack only works on Linux. It's actually pretty simple, since the data is already there on Linux and you don't need to do any special tricks with remembering the child init size. If you think it would help, I'll try to get an okay to release a patch for it. This is definitely a better way to do it than by setting max size or min shared size. We had a dramatic improvement in process lifespan after changing it. - Perrin
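The patch itself isn't shown here, but a rough, Linux-only sketch of where the unshared figure can come from: /proc/self/statm reports size, resident, and shared memory in pages, so unshared is roughly resident minus shared. The field layout and 4KB page size are assumptions about a typical Linux box of that era:

    # Linux-only: approximate unshared memory of the current process, in KB.
    sub unshared_kb {
        open(STATM, "< /proc/self/statm") or return undef;
        my ($size, $rss, $shared) = split ' ', scalar <STATM>;
        close STATM;
        my $page_kb = 4;    # assumes 4KB pages (true on x86)
        return ($rss - $shared) * $page_kb;
    }

A size-limit handler would compare this number against a configured maximum and terminate the child when it is exceeded, the same way Apache::SizeLimit already does with total process size.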
Re: Apache::SizeLimit for unshared RAM ???
On Tue, 9 Jan 2001, Joshua Chamas wrote: Perrin Harkins wrote: We added that in, but haven't contributed a patch back because our hack only works on Linux. It's actually pretty simple, since the data is already there on Linux and you don't need to do any special tricks with remembering the child init size. If you think it would help, I'll try to get an okay to release a patch for it. This is definitely a better way to do it than by setting max size or min shared size. We had a dramatic improvement in process lifespan after changing it. I would like to see this, but how is it better than the min shared size of Apache::GTopLimit? It's like this: What you want to control is the maximum REAL memory that each process will take. That's not max size or min shared, it's max unshared. If you try to control this using the traditional max size and min shared settings, processes often get killed too soon because it's hard to predict how much of the max size will be shared in any given child. Doing it this way also means you never have to adjust the settings when you add in or remove modules. The thing you care about - how much actual RAM is used per process - is constant. On the other hand, it seems nice to NOT HAVE to install libgtop for this feature, as Apache::SizeLimit is just a raw perl module. That's the main drawback to GTopLimit. - Perrin
Re: dynamic cache allocation
On Tue, 9 Jan 2001, Elman Vagif Abdullaev wrote: Does anyone know if there is a module that enables dynamic cache allocation for apache web server on the proxy? "Dynamic cache allocation" could mean anything. Can you be more specific? - Perrin
RE: Apache::SizeLimit for unshared RAM ???
On Tue, 9 Jan 2001, Rob Bloodgood wrote: I have a machine w/ 512MB of ram. unload the webserver, see that I have, say, 450MB free. So I would like to tell apache that it is allowed to use at most 425MB. I was thinking about that at some point too. The catch is, different applications have different startup costs per child. If, for example, each child ends up caching a bunch of stuff in RAM, compiling some templates, etc. you may get better performance by running a lower MaxClients and letting each child use more unshared RAM, so that they will live longer. On the other hand, some apps have very low ramp up per child, and don't cache much of anything except the RAM allocated for lexical variables. Those might scale better by running more clients and keeping them smaller. You kind of have to try it to know. The only drawback of per-process limiting is that your server could be performing better when fewer than MaxClients processes are running. It will be killing off child processes when it isn't really necessary because you're miles from MaxClients. Not that big of a deal, but unfortunate. because then all of your hard work before goes RIGHT out the window, and I'm talking about a 10-15 MB difference between JUST FINE and DEATH SPIRAL, because we've now just crossed that horrible, horrible threshold of (say it quietly now) swapping! shudder That won't happen if you use a size limit and MaxClients. The worst that can happen is processes will be killed too quickly, which will drive the load up. Yes, that would be bad, but probably not as bad as swapping. - Perrin
RE: Apache::SizeLimit for unshared RAM ???
On Tue, 9 Jan 2001, Rob Bloodgood wrote: OK, so my next question about per-process size limits is this: Is it a hard limit??? As in, what if I alloc 10MB/per and every now then my one of my processes spikes to a (not unreasonable) 11MB? Will it be nuked in mid process? Or just instructed to die at the end of the current request? It's not a hard limit, and I actually only have it check on every other request. We do use hard limits with BSD::Resource to set maximums on CPU and RAM, in case something goes totally out of control. That's just a safety though. - Perrin
RE: Apache::SizeLimit for unshared RAM ???
On Tue, 9 Jan 2001, Rob Bloodgood wrote: It's not a hard limit, and I actually only have it check on every other request. We do use hard limits with BSD::Resource to set maximums on CPU and RAM, in case something goes totally out of control. That's just a safety though. chokes JUST a safety, huh? :-) Why is that surprising? We had a dev server get into a tight loop once and use up all the CPU. We fixed that problem, but wanted to be sure that a similar problem couldn't take down a production server. since I never saw a worthwhile resolution to the thread "the edge of chaos," The problem of how to get a web server to still provide some service when it's overwhelmed by traffic is pretty universal. It's not exactly a mod_perl problem. Ultimately you can't fit 10 pounds of traffic in a 5 pound web server, so you have to improve performance or deny service to some users. In a VERY busy mod_perl environment (and I'm taking 12.1M hits/mo right now), which has the potential to melt VERY badly if something hiccups (like, the DB gets locked into a transaction that holds up all MaxClients httpd processes, and YES it's happened more than once in the last couple of weeks), What specific modules/checks/balances would you install into your webserver to prevent such a melt from killing a box? The things I already mentioned prevent the box from running out of memory. Your web service can still become unresponsive if it depends on a shared resource and that resource becomes unavailable (database down, etc.). You can put timers on your calls to those resources so that mod_perl will continue if they're hung, but it's still useless if you've got to have the database. If there's a particularly flaky resource that is only used in part of your application, you could segregate that on its own mod_perl server so that it won't bring anything else down with it, but the usefulness of this approach depends a lot on the situation. - Perrin
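A hedged sketch of the timer idea mentioned above; the five-second figure and the database call are placeholders, and note that alarm() cannot always interrupt a call that is blocked inside C code (some DBD drivers ignore it):

    my $rows;
    eval {
        local $SIG{ALRM} = sub { die "resource timeout\n" };
        alarm(5);                                   # give the resource 5 seconds
        $rows = $dbh->selectall_arrayref($sql);     # the flaky shared resource
        alarm(0);
    };
    alarm(0);    # belt and braces: never leave a stray alarm pending

    if ($@ && $@ eq "resource timeout\n") {
        # Log it and return a "please try again later" page instead of
        # letting every httpd child pile up behind the hung resource.
    }

If the resource is essential to every page, a timeout only changes how the failure looks to users; segregating the flaky piece onto its own server is the stronger fix.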