Re: Document Caching
Cahill, Earl wrote: I am finishing up a sort of alpha version of Data::Fallback (my own name) which should work very well for caching just about anything locally on a box. We are planning on using it to cache dynamically generated html templates and images. You would ask a local perl daemon (using Net::Server) for the info and it would look first in the cache. If it isn't in the cache, it falls back according to where you told it to look (for now conf file or DBI, but later Storable, dbm, HTTP hit, whatever), and caches how you tell it to, based on TTL if you like.

Hmmm... isn't that sort of backwards? It sounds like you're considering the problem as building a cache that can be taught how to fetch data, but to me it seems more natural to build components for fetching data and teach them how to cache. The semantics for describing how something can be cached are much simpler than those for describing how something can be fetched. I would think it makes more sense to do something along the lines of the Memoize module, i.e. make it easy to add caching to your existing data-fetching modules (hopefully using a standard interface like Cache::Cache). - Perrin
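A minimal sketch of the approach suggested above: add caching to an existing fetch function, Memoize-style, rather than teaching a cache how to fetch. In real code the backend would likely be Cache::Cache (e.g. Cache::FileCache); a plain in-memory hash with a TTL stands in for it here, and the template fetcher is a made-up example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %cache;   # key => [ value, expire_time ]

# Wrap any fetching subroutine with a TTL cache, Memoize-style.
sub memoize_with_ttl {
    my ($fetcher, $ttl) = @_;
    return sub {
        my $key  = join "\0", @_;
        my $slot = $cache{$key};
        if ($slot && $slot->[1] > time) {
            return $slot->[0];              # cache hit
        }
        my $value = $fetcher->(@_);         # cache miss: fall back to fetcher
        $cache{$key} = [ $value, time + $ttl ];
        return $value;
    };
}

# Hypothetical slow fetcher standing in for a DBI or HTTP lookup.
my $calls = 0;
my $fetch_template = sub { $calls++; return "template for $_[0]" };

my $cached_fetch = memoize_with_ttl($fetch_template, 60);
print $cached_fetch->('home'), "\n";   # first call hits the fetcher
print $cached_fetch->('home'), "\n";   # second call comes from the cache
```

The point is that the caching semantics (key, TTL) live in one small wrapper, while the fetching code stays wherever it already was.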
Re: Fwd: Re: Problem installing Apache::Request
Corey Holzer wrote: 1. Re-downloaded the source tarball for the version of Apache that I am running on my Linux RH 7.2 box. 2. Untarred the source tarball for Apache. 3. Executed ./configure --with-apache-includes=the /src/includes directory under the source dir for Apache.

Between 2 and 3 you might have to actually build Apache, or at least run its configuration script. It probably needs to set up OS-dependent files. - Perrin
Re: Multiple Location directives question
Geoffrey Young wrote: John Siracusa wrote: I have something like:

<Location /foo>
    SetHandler perl-script
    PerlHandler My::Foo
</Location>

<Location />
    SetHandler perl-script
    PerlHandler My::Bar
    AuthName Bar
    AuthType Basic
    PerlAuthenHandler My::Auth::Bar
    PerlAuthzHandler My::Authz::Bar
    require valid-user
</Location>

What I want is for My::Foo to handle all URLs that start with /foo, without any authentication of any kind. Then I want the remaining URLs to be handled by My::Bar using its authentication handlers. Seems like it should work to me; this may be one of those cases where / is handled as a special case.

Or maybe it's because there is no actual document there? Maybe installing a transhandler would help:

PerlTransHandler Apache::OK

But this is just a stab in the dark really. - Perrin
Re: Breaks in mod_perl, works in Perl
Mark Hazen wrote: I'm sorry I didn't explain an important component. Since I am dealing with a few hundred requests per minute (this is what got me onto mod_perl to begin with), using DBI's ability to write to a file would vastly overwhelm my system.

Won't capturing that much data in RAM instantly send your system into swap? Anyway, you can probably get this to work if you can ask DBI to send to a filehandle and then use your magic IO::Capture on that filehandle. You just can't use STDERR because it's already magic.

By the way, at one point we used this DBI trace stuff at eToys. It was fairly light on a fast file system like ext2fs. The trick to making it really light is to fix it so that only one child process per machine has tracing turned on, which you can do with a little fussing with a pid file and a ChildInitHandler and ChildExitHandler. If you just need to see some trace output, you can use this technique. On the other hand, your debugging may require seeing trace from every active process, in which case this won't help. - Perrin
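A sketch of the "only one child traces" trick described above: a child claims the right to trace by atomically creating a pid file, so whichever process creates it first turns tracing on and all the others leave it off. In mod_perl you would call claim_trace() from a ChildInitHandler and release_trace() from a ChildExitHandler; the file name and log path here are made up, and the actual DBI->trace() call is shown only as a comment.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

my $TRACE_PID_FILE = "/tmp/dbi_trace.pid";   # hypothetical location

sub claim_trace {
    # O_EXCL makes file creation atomic: exactly one process can win.
    sysopen(my $fh, $TRACE_PID_FILE, O_WRONLY | O_CREAT | O_EXCL)
        or return 0;                          # someone else is tracing
    print $fh $$;
    close $fh;
    # The winner would now enable tracing, e.g.:
    #   DBI->trace(2, "/tmp/dbi_trace.$$.log");
    return 1;
}

sub release_trace {
    # Called from a ChildExitHandler so another child can take over.
    unlink $TRACE_PID_FILE;
}
```

Because only one child per machine writes trace output, the I/O cost stays light even under a few hundred requests per minute.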
Re: here is a good modperl question on perlmonk
Medi Montaseri wrote: Caller can also buy some content management software like Interwoven's TeamSite product that provides a virtual workarea, for about $300,000.

It's so easy and effective to run mod_perl on developers' personal machines, I think there's no excuse not to do it. At eToys we also set up a special server for HTML template coders to work on. It had a virtual host for each coder, and each of them used their own docroot which they synched with a shared CVS repository using WinCVS. They accessed files over a Samba share, so it was seamless for them. This was pretty effective, and provided almost exactly the same thing that Interwoven sells. Interwoven does add some workflow tools, but most people I've talked to don't seem to use them. Maybe if they did get used that would provide more value. - Perrin
Re: ANNOUNCE: Apache::Watchdog::RunAway v0.3
Cahill, Earl wrote: Any chance of being able to define a runaway script based on percent of CPU or percent of memory used, as well as time in seconds? This would be great for us. Every so often we get a script that just starts choking on memory, and gets every process on the box swapping, which kills our load.

You should use Apache::SizeLimit and Apache::Resource to handle this. - Perrin
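An untested sketch of what that setup looks like, so processes that start choking on memory get killed before they drag the box into swap. The exact limits here are placeholders to tune for your own boxes.

```apache
# Hard resource limits via Apache::Resource (soft:hard data size, in MB):
PerlSetEnv PERL_RLIMIT_DATA 64:96
PerlChildInitHandler Apache::Resource

# Size checks via Apache::SizeLimit; in startup.pl you would set:
#   use Apache::SizeLimit;
#   $Apache::SizeLimit::MAX_PROCESS_SIZE = 65536;   # KB
# and then install the check after each request:
PerlFixupHandler Apache::SizeLimit
```

Apache::SizeLimit lets an oversized child finish its current request and then exit cleanly, which is gentler than a hard rlimit kill.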
Re: Blank pages
John E. Leon Guerrero wrote: in my case, we had a number of scripts that would change STDOUT in some fashion (usually so they could set $|) but then die due to some error before resetting STDOUT back.

Interesting. One safety measure to prevent this would be to install a cleanup handler that resets STDOUT. This is a similar concept to the rollback that Apache::DBI issues in a cleanup handler. - Perrin
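A sketch of that safety measure: a cleanup sub that restores STDOUT's autoflush state even when the script dies partway through. In mod_perl you would register it with $r->register_cleanup(\&reset_stdout); here it is shown as a plain sub so the idea works outside Apache.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub reset_stdout {
    my $old = select(STDOUT);   # make STDOUT the selected handle
    $| = 0;                     # turn autoflush back off
    select($old);               # restore whatever was selected before
}

# A script turns autoflush on and then dies before resetting it...
$| = 1;
# ...but the cleanup handler puts things right for the next request:
reset_stdout();
```

Like the Apache::DBI rollback, this runs whether or not the request succeeded, so one buggy script can't poison the persistent interpreter for the next one.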
Re: How to do connection pooling
A.C.Sekhar wrote: How can I maintain the connections in perl? Which connections? Connections to a database? A web browser? Something else? - Perrin
Re: Making perl handlers handle non-Perl
Andy Lester wrote: I want my MyFilter to process EVERYTHING that Apache spits out, whether with mod_perl, mod_php or just reading a .html file from the filesystem, especially the mod_php stuff.

Assuming you mean you want to look at the generated content from non-mod_perl handlers and do something with it, Apache doesn't work that way. Apache 2.0 does, but that won't help you right now. You might try using a proxy server setup to do this instead. - Perrin
Re: Making perl handlers handle non-Perl
Andy Lester wrote: So, my HTML::Lint checking is only going to work on output from the mod_perl chain. If you aren't terribly concerned about performance, there are several Apache::Proxy modules which should be easy to modify to put your lint checking in. Do a search for proxy on CPAN to see what's out there. - Perrin
Re: Making perl handlers handle non-Perl
Nico Erfurth wrote: your handler could tie the output-handle (is this possible?) and run a subrequest. Nope, not possible. You can only do that for mod_perl requests. - Perrin
Re: Calling an Apache::ASP page from an Apache::Registry script
Andrew Ho wrote: I've been investigating other template systems to try to find similar functionality in an existing package for a non-Tellme related project and haven't been able to find any embedded-Perl solutions that can be called from a .pl and still have the benefits of template caching. Apache::ASP doesn't seem like the best fit for this, if you really don't want to use pages as controllers. You can use Text::Template (just keep a cache of the templates in a global hash) or Template Toolkit for this (yes, it does allow in-line Perl). It also may be possible to do this with Mason. - Perrin
Re: how to pass data in internal redirects?
F. Xavier Noria wrote: For example, in the hangman game in O'Reilly's book, a controller would load a session from the cookie, process the user's guess, modify the state, and redirect the request internally to the view.

It would probably be orders of magnitude faster to just call a template engine from your controller to render the view. - Perrin
Re: Can't retrieve form params using post methods, $r-notes and Apache::Request
Mat wrote: Hi all, I have the following configuration:

<Location /my>
    SetHandler perl-script
    PerlAccessHandler MyCheck
    PerlHandler MyHomePage
</Location>

The PerlAccessHandler checks if the user cookie is valid and sets a $r->notes() entry to pass the user id to the MyHomePage handler, which does the work. In the MyHomePage handler I'm using Apache::Request (my $apr = new Apache::Request($r);) to get the parameters of the form from which I'm calling the handler. If I set the form method to POST, I can't get any parameters and the form is reloaded, but if I set the method to GET then everything is fine. If I use the following configuration

<Location /my>
    SetHandler perl-script
    PerlHandler MyHomePage
</Location>

and use either POST or GET for the form, everything is fine. If I put in a PerlAccessHandler not using the $r->notes() method then everything is fine. So my problem must come from $r->notes(). Apparently it messes up the POST parameters.

That seems pretty unlikely. Are you sure you're not trying to read the POST content twice? Maybe you're using a module that reads it? - Perrin
Re: [OT-ish] Session refresh philosophy
When I used CGI::SecureState it gave the client a non-versioning (more on that later) key and stored the state information in the filesystem.

Okay, I only looked at it briefly and thought it stored the data on the client. Your module is actually more like CGI::EncryptForm, I think, but yours may make things a bit more transparent. Maybe you should polish it up for CPAN.

I'm well aware of the page-state vs. browser-state problem. I was recently bitten by it again when some consultants built a web app for my company that puts the search results in a session keyed on a cookie. As soon as the user opens two windows, it's absolute mayhem. - Perrin
Re: Session refresh philosophy
As I understand it, the session data is state which is committed to the database on each request (possibly). It would seem to me that instead of denormalizing the state into a separate session table, you should just store it in a normal table. The typical breakdown I use for this is to put simple state information that connects this browser to long-term data in the session, and everything else in normal database tables. So, I put the user's ID (if this session belongs to an identified user), a flag telling whether or not this user has given a secure login so far in this session, and not much else in the session. Actually, even this stuff could be put into a normalized sessions table rather than serialized to a blob with Storable. It just means more work if you ever change what's stored in the session. - Perrin
Re: [OT-ish] Session refresh philosophy
And that is what I am doing for a small project I'm working on now. In my case, I'm not sure about the capabilities of the remote server, and I know for sure that I don't have a database available, so session information is saved via hidden form fields. It's primitive, but was actually a bit of a challenge to make sure an (unused) hidden field and a visible form element don't appear in the same form. Not my first choice, but it definitely works.

Incidentally, this is mostly the same thing as what Jeffrey Baker mentioned a few days ago about storing state entirely inside a cookie with a message digest. The only difference is that by sticking it in a form element you're attaching it to a specific page. - Perrin
Re: [OT-ish] Session refresh philosophy
I built and use a module that encodes a session hash into a number of hidden fields with a security MD5 sum. Sounds a lot like CGI::SecureState. Have you ever looked at it? - Perrin
Re: Image Magick Alternatives?
So, is there an alternative - a module that will take an image (gif/jpeg) and generate a thumbnail from it?

The GD module seems like a good candidate. There are also the Gimp modules. - Perrin
Re: Mistaken identity problem with cookie
I have a mysterious mistaken identity problem that I have not been able to solve. There are two common sources of this problem. One is an ID generation system that is not unique enough. Another is a bug in your code with globals (see the section of the Guide about debugging with httpd -X). You could be having problems with a proxy on their end, but most proxies are smart about this stuff. - Perrin
Re: [BUG] Memory Corruption (was: RE: [Q] SIGSEGV After fork())
The only other way I can think of to solve this is to send my module list to this audience. Please find it, attached, with home-grown modules deleted. Have you tried debugging the old-fashioned way, i.e. remove things until it works? That's your best bet. I suspect you will find that you have some module doing something with XS or sockets or filehandles that can't deal with being forked. - Perrin
Re: Mistaken identity problem with cookie
2. I don't think it's a global variable issue. Basically, I just grab the cookie with $r->header_in('Cookie') and decrypt it.

It's what you do after that that matters.

Besides, if it's global then the mistaken IDs should be from anywhere, randomly.

True, but random may not always look random.

There is this nagging fact that the parties involved are from the same ISPs, i.e. user A1 and A2 are from foo.com, user B1 and B2 are from bar.com, etc.

You aren't using IP or domain as part of your ID generation, are you? That would be bad. - Perrin
Re: Cookie as session store
When the cookie is recovered, I simply decode, uncompress, thaw, check the digest, and thaw the inner object. It's really a good idea to do this even when the cookie is nothing but a session ID. A standard module for this like the one Jay mentioned would definitely be nice. My strategy for document generation is to build a DOM tree and then create the output by serializing the DOM to XML or HTML. So, it is natural in this application to just set everything up before sending the response. Since I usually structure my applications to do all the work and then pass some data to a template, they also follow this order. The main problem I see with sending some HTML before the work is complete is that if something goes wrong later on you have no way to send a nice error page out. I sometimes see people having this problem on sites I visit: I get an OK response and some HTML and then a second set of headers with an error code and it looks like garbage in a browser. - Perrin
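A sketch of the cookie pipeline described in this thread: freeze the session data, base64-encode it, and append a keyed digest so a tampered cookie can be rejected. The secret is a made-up placeholder, and the compression step (Compress::Zlib) is left out to keep the example self-contained with core modules only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use MIME::Base64 qw(encode_base64 decode_base64);
use Digest::MD5 qw(md5_hex);

my $SECRET = 'server-side secret';   # hypothetical; never sent to the client

sub bake_cookie {
    my ($session) = @_;
    my $blob = encode_base64(nfreeze($session), '');   # one line, no newlines
    return $blob . ':' . md5_hex($SECRET . $blob);     # blob:digest
}

sub eat_cookie {
    my ($cookie) = @_;
    my ($blob, $digest) = split /:/, $cookie, 2;
    return undef unless md5_hex($SECRET . $blob) eq $digest;   # tampered
    return thaw(decode_base64($blob));
}
```

The same digest check works when the cookie holds nothing but a session ID: you can verify the ID is genuine without touching the backend store.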
Re: Cookie as session store
I dunno... that sounds like a LOT of overhead for just a session ID that's gonna result in server lookups too...

It's really not. It adds a negligible amount of time to the request. As Jeffrey pointed out, the functions he's using are all in C and very fast.

Why verify session IDs? To make it hard to hijack sessions. This way it isn't enough to just guess someone else's session ID: you also have to know how to generate the proper digest for it. This is also useful to prevent people from screwing up your stats with bogus IDs. Many people log the session ID for use in calculating people's path through the site and similar things. Often this is done for pages that don't actually retrieve the session data from the backend store. Being able to verify that you have a valid session without hitting your data store can be very useful. - Perrin
Re: mod_perl + UNIVERSAL
I think the problem here is that mod_perl sets the assbackward flag when setting headers via send_cgi_header() (which CGI.pm does). Is this only an issue when using CGI.pm or PerlSendHeader then? I seem to recall having no trouble doing this from a normal handler. - Perrin
Re: mod_perl + UNIVERSAL
However both applications make use of the UNIVERSAL package to create universally accessible methods (to return the current database handle, for example) within the application.

Better to put those into a package of your own and call them with fully-qualified names, or import them as Tatsuhiko demonstrated.

The thing is I am getting some weird behaviour where one application seems to be getting code from the other. In theory this isn't possible with the separated namespaces. I suspect my UNIVERSAL use is the problem.

There is just one Perl interpreter per process, and thus one namespace and one UNIVERSAL package. If you try to create two different versions of the sub UNIVERSAL::foo(), it won't work: there can be only one. This is true for any package name, actually. If you need separate subs, name them differently or put them in separate packages. - Perrin
Re: mod_perl + UNIVERSAL
A list of things I've noticed:

* If you have two *different* modules which have the same name, then either one or the other is loaded in memory, never both. This is dead annoying. I think Perl standard modules + CPAN modules should be shared; other modules which are specific to a given script should not.

This is how Perl works. You are not allowed to have two different modules with the same name loaded in the same interpreter. If you can't deal with that, maybe you should consider using an environment like Mason or Embperl which allow a page-based approach closer to PHP, rather than using Perl's package namespace. It is also true that mod_perl 2 will have additional support for doing fancy things with virtual hosts, like having separate pools of interpreters (and thus separate namespaces) for each virtual host. See http://perl.apache.org/~dougm/modperl_2.0.html for more.

I am not the only developer on the planet. For instance there is a CPAN module called HTML::Tree. But there is also another module on the web called HTML_Tree, which installs itself as HTML::Tree.

One person's mistake hardly justifies a massive change in the way namespaces work in Perl. Anyway, that was fixed, thanks to Terrence Brannon. His HTML::Seamstress module replaces the former HTML_Tree.

* Global variables should be reinitialized on each request. Or at least if we want them to be persistent, we do not want them to be shared with different scripts on different virtual hosts!

Global variables are variables without scope. They are not cleaned up, by definition. If you want variables that go out of scope, use lexicals. If you have legacy code that depends on mod_cgi behavior to work, use Apache::PerlRun, which clears globals on each request.

* The Perl garbage collector should be smarter (okay, that may not be a mod_perl issue). C geeks out there, ain't it possible to compile a version of Perl with a better GC?
Doug has talked about doing something with this in mod_perl 2 to help clean up memory taken for lexicals, but it's not definite. And yes, this is really a Perl issue, not a mod_perl one. - Perrin
Re: mod_perl + UNIVERSAL
If the UNIVERSAL namespace is shared, I would have thought one or the other (the last one?) would get the print_error sub and the other would lose out, but at some point they seem to coexist just fine, whilst at some other point they behave as expected and one gets the other's. Any theories?

You have a bunch of different processes running. Some of them hit the forums first, and some hit the portal first. Last one wins. - Perrin
Re: Weird mod_perl CGI.pm interaction (Bug?)
Keep in mind I tried several versions of CGI.pm. As for where the problem is (and yes, I did hack CGI.pm and fixed it, but felt it was unnecessary to hack CGI.pm since it wasn't at fault and I didn't want to break other working apps): the problem is in the read_from_client() call, where CGI.pm issues a read() from the STDIN file handle. The problem is that when it's called the second time, the handle reference is missing.

I don't think this is the same problem. Mike is actually modifying the request (by making a subrequest) before CGI.pm clears its globals (in a cleanup handler) and wanting CGI.pm to notice. It isn't that he's getting nothing from CGI.pm; it's that he's getting the same thing both times. At least that's how I interpreted it. - Perrin
Re: Speed of downloading problem.
Here is the part of the httpd.conf that I believe you wanted to see. Hmmm... I don't see anything wrong with this. It seems like the problem is simply that Apache 1.3.x is not as fast as IIS at sending static files on NT. Not too surprising. I've been told that Apache 2 is significantly better about this, but that won't help you right now. If this is a big problem for your application, my advice would be to either use a proxying system so that IIS (or something else) can send the static files and mod_perl can handle the dynamic stuff, or look at mod_perl alternatives for Win32 like PerlEx and FastCGI. - Perrin
Re: Speed of downloading problem.
I have Apache/mod_perl installed on an NT box, and I am allowing customers to do downloads of high-resolution assets. My problem is the speed of downloads is about 1/3 slower than the same box running IIS.

Can you post your httpd.conf? Or at least the parts of it about threads and processes? It is possible that Apache is just not that fast on NT. NT support is experimental in the 1.3 series.

One thought here was to go to 2.0.

You can't run mod_perl 1.x on Apache 2.0.

Another thing you could try is having multiple servers. One could handle static requests and proxy the dynamic ones to mod_perl. I don't know if IIS knows how to do this or not, but there's probably something available for NT that does it. - Perrin
Re: AuthSession Manager [was] Apache::AuthCookie not set cookie really
Application's main goals: 1. Simple install. I don't want to use cron jobs for cleanup - I think it can be a problem for some users.

Most of the existing session stuff is written to leave cleanup to you. If you don't want to use cron jobs, you can do it in a cleanup handler, possibly exec'ing a separate script to avoid keeping the heavy mod_perl process around.

I need to authorize the user and don't want to query on every request whether you are an admin, which departments you belong to, etc.

Unless you're willing to put real information in the cookie (not just an ID), you have to do some kind of lookup on the server side for every request if they need session information. It may not be to a database, though. If you know that each user will stay on a single server, you can use a disk-based technique like Cache::FileCache or Apache::Session::File.

Apache::AuthCookie doesn't want to set a cookie on redirect (see above).

There's a lot of stuff in the archives about cookies and redirects. Maybe that will help. You're not the first person to have problems with this.

I don't think that it is good to use the Oracle database for maintaining state or secrets for tickets. It can be slower than querying an indexed table even on every request for the password and the departments where the user works.

It's generally fast enough, since it's a single row retrieved by ID. MySQL is very fast at this kind of thing, though. - Perrin
Re: Cache::FileCache issues
[Mon Jan 28 14:52:35 2002] [error] mkdir : No such file or directory at /opt/gnu/depot/perl-5.6.1/lib/site_perl/5.6.1/Cache/FileBackend.pm line 220

Looks to me like your system has no mkdir command, or it isn't in the path, or it doesn't support an option that's needed (-p maybe?). Maybe Cache::FileBackend should use File::Path::mkpath for portability. - Perrin
Re: performance coding project? (was: Re: When to cache)
It all depends on what kind of application you have. If your code is CPU-bound, these seemingly insignificant optimizations can have a very significant influence on the overall service performance.

Do such beasts really exist? I mean, I guess they must, but I've never seen a mod_perl application that was CPU-bound. They always seem to be constrained by database speed and memory.

On the other hand, how often do you get a chance to profile your code and see how to improve its speed in the real world? Managers never plan for a debugging period, let alone optimization periods.

If you plan a good architecture that avoids the truly slow stuff (disk/network access) as much as possible, your application is usually fast enough without spending much time on optimization (except maybe some database tuning). At my last couple of jobs we actually did have load testing and optimization as part of the development plan, but that's because we knew we'd be getting pretty high levels of traffic. Most people don't need to tune very much if they have a good architecture, and it's enough for them to fix problems as they become visible.

Back to your idea: you're obviously interested in the low-level optimization stuff, so of course you should go ahead with it. I don't think it needs to be a separate project, but improvements to the performance section of the guide are always a good idea. I know that I have taken all of the DBI performance tips to heart and found them very useful. I'm more interested in writing about higher-level performance issues (efficient shared data, config tuning, caching), so I'll continue to work on those things. I'm submitting a proposal for a talk on data sharing techniques at this year's Perl Conference, so hopefully I can contribute that to the guide after I finish it. - Perrin
Re: performance coding project? (was: Re: When to cache)
The point is that I want to develop a coding style which tries hard to do early premature optimizations. We've talked about this kind of thing before. My opinion is still the same as it was: low-level speed optimization before you have a working system is a waste of your time. It's much better to build your system, profile it, and fix the bottlenecks. The most effective changes are almost never simple coding changes like the one you showed, but rather large things like using qmail-inject instead of SMTP, caching a slow database query or method call, or changing your architecture to reduce the number of network accesses or inter-process communications. The exception to this rule is that I do advocate thinking about memory usage from the beginning. There are no good tools for profiling memory used by Perl, so you can't easily find the offenders later on. Being careful about passing references, slurping files, etc. pays off in better scalability later. - Perrin
Re: UI Regression Testing
There are many web testers out there. To put it bluntly, they don't let you write maintainable test suites. The key to maintainability is being able to define your own domain specific language. Have you tried webchat? You can find webchatpp on CPAN.
Re: UI Regression Testing
Gunther Birznieks writes: the database to perform a test suite, this can get time consuming and entails a lot of infrastructural overhead.

We haven't found this to be the case. All our database operations are programmed. We install the database software with an RPM, run a program to build the database, and program all schema upgrades. We've had 194 schema upgrades in about two years.

But what about the actual data? In order to test my $product->name() method, I need to know what the product name is in the database. That's the hard part: writing the big test data script to run every time you want to run a test (and probably losing whatever data you had in that database at the time). This has been by far the biggest obstacle for me in testing, and from Gunther's post it sounds like I'm not alone. If you have any ideas about how to make this less painful, I'd be eager to hear them. - Perrin
Re: When to cache
I'm interested to know what the opinions are of those on this list with regards to caching objects during database write operations. I've encountered different views and I'm not really sure what the best approach is.

I described some of my views on this in the article on the eToys design, which is archived at perl.com.

Take a typical caching scenario: Data/objects are locally stored upon loading from a database to improve performance for subsequent requests. But when those objects change, what's the best method for refreshing the cache? There are two possible approaches (maybe more?): 1) The old cache entry is overwritten with the new. 2) The old cache entry is expired, thus forcing a database hit (and subsequent cache load) on the next request. The first approach would tend to yield better performance. However there's no guarantee the data will ever be read. The cache could end up with a large amount of data that's never referenced. The second approach would probably allow for a smaller cache by ensuring that data is only cached on reads.

There are actually thousands of variations on caching. In this case you seem to be asking about one specific aspect: what to cache. Another important question is how to ensure cache consistency. The approach you choose depends on frequency of updates, single server vs. cluster, etc.

There's a simple answer for what to cache: as much as you can, until you hit some kind of limit or performance is good enough. Sooner or later you will hit the point where the tradeoff in storage or in time spent ensuring cache consistency will force you to limit your cache. People usually use something like a dbm or Cache::Cache to implement mod_perl caches, since then you get to share the cache between processes. Storing the cache on disk means your storage is nearly unlimited, so we'll ignore that aspect for now.
There's a lot of academic research about deciding what to cache in web proxy servers based on a limited amount of space, which you can look at if you have space limitations. Lots of stuff on LRU, LFU, and other popular cache expiration algorithms. The limit you are more likely to hit is that it will start to take too long to populate the cache with everything.

Here's an example from eToys: We used to generate most of the site as static files by grinding through all the products in the database and running the data through a templating system. This is a form of caching, and it gave great performance. One day we had to add a large number of products that more than doubled the size of our database. The time to generate all of them became prohibitive, in that our content editors wanted updates to happen within a certain number of hours but it was taking longer than that number of hours to generate all the static files.

To fix this, we moved to not generating anything until it was requested. We would fetch the data the first time it was asked for, and then cache it for future requests. (I think this corresponds to your option 2.) Of course then you have to decide on a cache consistency approach for keeping that data fresh. We used a simple TTL approach because it was fast and easy to implement (good enough).

This is just scratching the surface of caching. If you want to learn more, I would suggest some introductory reading. You can find lots of general ideas about caching by searching Google for things like "cache consistency". There are also a couple of good articles on the subject that I've read recently.
Randal has an article that shows an implementation of what I usually call lazy reloading: http://www.linux-mag.com/2001-01/perl_01.html There's one about cache consistency on O'Reilly's onjava.com, but all the examples are in Java: http://www.onjava.com/pub/a/onjava/2002/01/09/dataexp1.html Also, in reference to Rob Nagler's post, it's obviously better to be in a position where you don't need to cache to improve performance. Caching adds a lot of complexity and causes problems that are hard to explain to non-technical people. However, for many of us caching is a necessity for decent performance. - Perrin
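A minimal sketch of the TTL ("lazy reloading") approach described above: nothing is generated until it is requested, and a cached copy is reused until its time-to-live runs out. A plain hash stands in for the shared store (dbm, Cache::Cache) a real mod_perl setup would use, and the TTL value is an arbitrary example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %cache;       # key => { value => ..., stored_at => epoch seconds }
my $TTL = 300;   # how stale the business requirements allow, in seconds

sub fetch_page {
    my ($key, $generate) = @_;
    my $entry = $cache{$key};
    if ($entry && time - $entry->{stored_at} < $TTL) {
        return $entry->{value};                  # fresh enough: cache hit
    }
    my $value = $generate->($key);               # miss or stale: regenerate
    $cache{$key} = { value => $value, stored_at => time };
    return $value;
}
```

Consistency here is purely time-based: a stale entry is never pushed out by a write; it simply expires and is rebuilt on the next request, which is what made this approach fast and easy to implement.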
Re: When to cache
Perrin Harkins writes: To fix this, we moved to not generating anything until it was requested. We would fetch the data the first time it was asked for, and then cache it for future requests. (I think this corresponds to your option 2.) Of course then you have to decide on a cache consistency approach for keeping that data fresh. We used a simple TTL approach because it was fast and easy to implement (good enough).

I'd be curious to know the cache hit stats.

In this case, there was a high locality of access, so we got about a 99% hit rate. Obviously not every cache will be this successful.

BTW, this case seems to be an example of immutable data, which is definitely worth caching if performance dictates.

It wasn't immutable, but it was data that we could allow to be out of sync for a certain amount of time that was dictated by the business requirements. When you dig into it, most sites have a lot of data that can be out of sync for some period.

I agree with the latter clause, but take issue with the former. Typical sites get a few hits a second at peak times. If a site isn't returning typical pages in under a second using mod_perl, it probably has some type of basic problem imo.

Some sites have complex requirements. eToys may have been an anomaly because of the amount of traffic, but the thing that forced us to cache was database performance. Tuning the perl stuff was not very hard, and it was all pretty fast to begin with. Tuning the database access hit a wall when our DBAs had gone over the queries, indexes had been adjusted, and some things were still slow. The nature of the site design (lots of related data on a single page) required many database calls and some of them were fairly heavy SQL. Some people would say to denormalize the database at that point, but that's really just another form of caching.

Use a profiler on the actual code.

Agreed.

Add performance stats in your code.
For example, we encapsulate all DBI accesses and accumulate the time spent in DBI on any request. No need to do that yourself. Just use DBIx::Profile to find the hairy queries. Adding a cache is piling more code onto a solution. It sometimes is like adding lots of salt to bad cooking. You do it when you have to, but you end up paying for it later. It may seem like the wrong direction to add code in order to make things go faster, but you have to consider the relative speeds: Perl code is really fast, databases are often slower than we want them to be. Ironically, I am quoted in Philip Greenspun's book on web publishing saying just what you are saying: that databases should be fast enough without middle-tier caching. Sadly, sometimes they just aren't. - Perrin
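The per-request DBI timing described above can be sketched roughly like this; the module and function names are hypothetical, not the actual eToys code:

```perl
# Hypothetical sketch: accumulate time spent in DBI for each request
# by routing queries through one wrapper function.
package My::DB;
use strict;
use Time::HiRes qw(gettimeofday tv_interval);
use vars qw($dbi_time);

sub timed_selectall {
    my ($dbh, $sql, @bind) = @_;
    my $start = [gettimeofday];
    my $rows  = $dbh->selectall_arrayref($sql, undef, @bind);
    $dbi_time += tv_interval($start);   # log this total at end of request
    return $rows;
}

1;
```

A log handler (or the end of the content handler) can then report `$My::DB::dbi_time` and reset it for the next request.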
Re: PerlRun Gotchas?
A site I run uses a fair variety of different programs, the most common of which are run through Apache::Registry. To cut the memory overhead, however, less commonly used programs are run through Apache::PerlRun. I would not expect PerlRun to use less memory than Registry. Both the Registry and PerlRun programs use a common module which defines a few subroutines and a selection of exported variables. These variables are in the module as globals (ie: no my declaration), but with a use vars to get them through strict. Does the module have a package name? Are you exporting the variables from it? Seeing some code would help. 200 OK The server encountered an internal error or misconfiguration... ...More information about this error may be available in the server error log. That just means the error happened after the initial header was sent. The error log indicates every time that this is due to a global set in my module that remains undef for the program that tries to call it (and an open that dies on failure requires the global). Again, some code would help. I suspect you are getting bitten by namespace collisions: http://perl.apache.org/guide/porting.html#Name_collisions_with_Modules_and - Perrin
Re: slow regex [BENCHMARK]
Your system has to be swapping horribly. I bet that the ulimit for whoever apache is running as has the memory segment set super low. That's a possibility. I was also thinking that maybe mod_perl was built against a different version of Perl, possibly one that has a problem with this particular regex which was fixed in a later version. - Perrin
Re: Cross-site Scripting prevention with Apache::TaintRequest
What techniques do you use to ensure that your application is not vulnerable? Usually I write applications so that they do some processing, package up a chunk of data, and hand it to a template. With this structure, all you need to do is HTML-escape the data structure before handing it off, or use a templating tool that defaults to HTML-escaping all printed variables. If you're doing this, nothing the user sends in will pose a CSS threat. - Perrin
Re: Cross-site Scripting prevention with Apache::TaintRequest
Yes and no. XSS attacks are possible on old browsers, when the charset is not set (something which is often the case with modperl apps) and when the HTML-escaping bit does not match what certain browsers accept as markup. Of course I set the charset, but I didn't know that might not be enough. Does anyone know if Apache::Util::escape_html() and HTML::Entities::encode() are safe? - Perrin
Re: handling eval in ePerl
print STDERR blah blah blah is going to the browser but I am not really worried about it too much unless it is something I should worry about - anyone care to comment on that? Printing error messages to the public is a potential security risk, so you have to decide how paranoid you want to be. You could change this behavior by modifying the tied STDERR in Parse::ePerl or maybe in Apache::ePerl where it prints the message to the browser. Of course I still don't understand why die was being trapped out there. It is being trapped to allow for error reporting and to avoid leaving the program in an unknown state when a problem occurs. If you want it to still work for this but not pick up your eval() calls, you can replace the __DIE__ handler there with something fancier like this:

local $SIG{__DIE__} = sub {
    my $in_eval = 0;
    for ( my $stack = 1; my $sub = (CORE::caller($stack))[3]; $stack++ ) {
        $in_eval = 1 if $sub =~ /^\(eval\)/;
    }
    $error .= $_[0] unless $in_eval;
};

This is a slight variation of some Michael Schwern code that Stas posted a little while ago. - Perrin
Re: Forking another process in Apache?
I have a requirement to spin off a SQL loader process after a web page (a form which is qualified and accepted) has been submitted. Does it make sense, or more importantly, is it dangerous to apply a fork at the end of a module such as this: You're probably better off using a cleanup handler to do this after disconnecting from the client. See the guide for more details: http://perl.apache.org/guide/performance.html#Forking_and_Executing_Subprocess - Perrin
Re: Cgi permission Questions
Here is the problem: create.pl is owned by test and group test and has file permissions 755. When the create.pl script is run it becomes owner apache and group apache and has to create new files and directories on the machine. All of the new files and directories then become owner apache and group apache. I need them to stay as owner test and group test. There is some information on SuExec in the guide: http://thingy.kcilink.com/modperlguide/install/Is_it_possible_to_run_mod_perl_e.html One possible solution for this with mod_perl is to run a separate server that just handles this script, and start that server as the proper user. - Perrin
Re: slow regex [BENCHMARK]
under mod_perl this takes 23 seconds. running the perl by hand (via extracting this piece into a separate perl script) on the same data takes less than 1 second. Are you sure that the string you're regex'ing is the same in both cases? Why are you using the /o modifier? CG isn't a variable, is it? - Perrin
Re: How to handle die
Umm, it didn't really answer my original query, but I guess since no one has answered it - either I didn't present it correctly or no one has an answer to it. Or you posted it late on Saturday night on a weekend when most US workers have Monday off and may be travelling. Not everyone is on the same schedule as you, so give it a little time. I probably won't jump into the "I want a newbie mailing list" fray for this though ;). I don't think it would make a difference. It would be pretty much the same people on either list. There are a couple of people on this list who use ePerl. You might want to repost with ePerl in your subject. However, most of us no longer use it. ePerl is getting old at this point and has a pretty small feature set compared to the more actively maintained alternatives. Here's an attempt to answer your questions:

1. Is die supposed to be handled by ePerl/EmbPerl/Mason ... or did ePerl end up overriding something? In that case I would rather have it restored to the default. First, ePerl has nothing to do with Embperl or Mason. It is a totally separate program. The Apache::ePerl code is very simple, and I suggest you read it at some point. It attempts to eval() your code, and does the behavior you saw if it fails (which is what happens when your script does a die()). I don't think you can change that without changing the code, but that's pretty easy to do.

2. How do I implement a solution throughout the site without having to do goofy stuff in every HTML page or module? Solution to what? To having die() trapped? Changing the Apache::ePerl code will be a site-wide change, so I'd suggest you do it there.

3. Why would anyone do that in the first place? Why catch exceptions? Usually to allow the program to try something else, clean up resources, or print a useful error message. - Perrin
Re: handling eval in ePerl
Umm, I didn't mean to offend anyone in my previous posting - I did say I probably hadn't presented my situation properly. No problem, I just meant don't give up so quickly. Of course you noticed I wrote ePerl/EmbPerl/Mason?? I clubbed them together since I assume among other things you can embed perl code in HTML using any of them. You can, but they don't share any code with ePerl. My problem is that die works fine as such but it conks out if done inside an eval. Okay, I missed the part about eval() before. Take a look at this code from Parse::ePerl::Evaluate():

local $SIG{'__DIE__'} = sub { $error .= $_[0]; };

That's going to kill your exception handling code. You need to change that if you want to be able to use eval() in your code. Matt has an explanation of this in the exceptions part of the mod_perl Guide. It feels like being told to change gcc's code if my C code is not working :) - yeah, both of them are written in C. Apache::ePerl is written in perl. It calls Parse::ePerl to do the dirty work, and some of that is written in C, but not the part that's causing you problems. - Perrin
Re: Apache::Session getting DESTROYed in wrong order
I register a clean up handler to explicitly untie the session variable. I have found that it's safer to put things in pnotes than to use globals and cleanup handlers. We used a lot of cleanup handlers at eToys to clear globals holding various request-specific things, and we started getting unpredictable segfaults. When I moved them to pnotes instead the segfaults went away. I think it may have had something to do with cleanup handlers running in an unpredictable order and some of them trying to use things that were already cleaned up, so it was probably my fault, but pnotes just seems a bit more foolproof. - Perrin
Re: Apache::Session getting DESTROYed in wrong order
In a Mason context, which is where I'm using it, I do this in my top-level autohandler (ignore the main:: subroutines, they're just for pedagogy):

<%init>
# 'local' so it's available to lower-level components
local *session;
my $dbh = ::get_dbh;
my $session_id = ::get_cookie('_session_id');
tie %session, 'Apache::Session::MySQL', $session_id,
    { Handle => $dbh, LockHandle => $dbh };
...
</%init>

Geez, that's awfully confusing to look at (local and typeglobs is not a newbie-friendly combo). Isn't there a simpler way? What about putting it in pnotes? - Perrin
Re: weird problem. Lost of the POST data
Ummm yes... you know, I'm using the Template Toolkit. Try using the Perl stash instead of the XS stash, and see if your problem goes away. It seems as if the httpd child executes the processing of the template so fast that CGI.pm has no time to get the POST data. I don't think so. It seems to me like your processes are getting corrupted by some C code or an improperly shared file handle or socket. I doubt this has anything to do with CGI.pm, since so many people use it and don't report having the same problem. If you want to experiment, you could replace CGI.pm and see if it improves things for you. There are plenty of other modules you can use to parse POST data. - Perrin
Re: weird problem. Lost of the POST data
Well all my modules are written in Perl. When you say some C code you mean the C code in DBI, or CGI or Template, don't you? Yes. That's why I suggest trying Template with the Perl stash instead of the XS one. - Perrin
Re: kylix: rad!
GUI builders usually don't work for anything but the most trivial websites that could be written in anything and do fine. consider struts, a popular java mvc framework. it defines simple interfaces for things like actions and forms. does struts (and mvc in general) work for non trivial websites? Struts is a framework, not a GUI builder. I'm all for frameworks, and we have stuff on CPAN that duplicates all the significant parts of Struts. a struts-oriented rad tool could easily scan WEB-INF dirs to find action and form classes and represent them in the gui. the main purpose of the tool would be to assemble and configure those classes in order to generate a struts-config.xml file. it could also incorporate ide functionality. Such a tool does exist for Struts, but all it does is generate/edit the config file. To me, this doesn't seem very labor-saving (typing in a Swing app vs. typing in my text editor), but it might generate more interest among certain groups. Adding some mod_perl oriented stuff to whatever the leading Apache GUI is these days could be a good start. People seem to come to mod_perl because they need more performance or more control than they can get from CGI. I'm not sure I want to try and draw in users who can't program at all. why do you think this tool would appeal to people who can't program at all? Because your post made it sound like you were talking about drag-and-drop wizard-driven GUI builders with pre-written components (which is what Kylix is trying to be, if I understand it correctly). There is a need for tools to generate instant database editing apps, and some projects to build those tools exist now. Beyond that, I think most users know enough Perl to write actual code in a good editor. There are already commercial Perl IDEs (aimed at CGI mostly) that have some code generation support and a set of pre-built components. Maybe looking at those would help to gauge developer demand for this kind of thing. - Perrin
Re: Request Limiter
It's configurable so after exceeding a threshold the client gets content from the shared memory cache, and if a second threshold is exceeded (ok this guy is getting REALLY irritating) then they get the 'come back later' message. They will only get cached content if they exceed x number of requests within y number of seconds. Nice idea. I usually prefer to just send an ACCESS DENIED if someone is behaving badly, but a cached page might be better for some situations. How do you determine individual users? IP can be a problem with large proxies. At eToys we used the session cookie if available (we could verify that it was not faked by using a message digest) and would fall back to the IP if there was no cookie. Any ideas on how to write a version of this that one CAN simply drop into an existing application would be most welcome. It's hard to do that without making assumptions about the way to cache the content. Personally, I prefer to make this kind of thing an AccessHandler rather than using Apache::Filter, but your approach makes sense for your method of caching. - Perrin
Re: my $var at file scope and __DATA__ sections under mod_perl
Each time, the warn is for 'blah' because the value 'test' is never retained in $var. Is this intended behaviour? No, that should create a closure that keeps the value of $var. Are you sure these requests are all going to the same instance? Weird, it's like the MIME::Types::DATA handle just mysteriously ran out of data halfway through reading from it. Does anybody have any idea what's going on here? No, but it doesn't obviously point to problems with closures and lexical scoping in my opinion. It looks more like you have a problem with that filehandle. - Perrin
Re: Unsetting standard response headers?
I have noticed that Yahoo uses Location: header only for redirect responses and thought it may be good to save half of the bandwidth and do the same, as my particular script/server is serving redirects mostly. So my question is how to unset Date:, Server: and Content-Type: response headers? Who is setting them in the first place? If they are generated by your script and you don't set them, Apache will not add them. You may be seeing them added for redirects that Apache does for you, like sending http://yoursite to http://yoursite/. You can handle those yourself instead if you want to. - Perrin
Re: mod_perl framework + code reuse question
For file organization, I'm thinking of making all page modules start with a common namespace substring (e.g. Projectname::Page) to distinguish them from the support (model) modules. I like to name the top level modules SiteName::Control::* and the model modules SiteName::Model::*. Calling the modules Page makes it sound like each one corresponds to a single page, which is not always true, i.e. you might have an address book module that generates many different pages (with different templates) based on the parameters passed to it. Glad to hear the handlers are working out for you. - Perrin
Re: formmail spammers
I assume I'm not the only one seeing a rash of formmail spam lately. Is THAT what it is? I have a Yahoo mail account which someone has been sending literally thousands of messages per day to, CC'ing lots of people on every one, and they all appear to be from some kind of compromised form mailer script. I'm open to any suggestions. - Perrin
Re: Configuration loading twice - how to cope?
hrm. the problem might not be the double-loading of httpd.conf then - that's been around since, well, before most of us (I tracked that down to apache 0.9 once through list archives). More likely is this: http://marc.theaimsgroup.com/?l=apache-modperl&m=100510779912574&w=2 and the other reports in the archives that describe the same thing. And my suggestion for dealing with that in the short term is to change your PerlModule directives to use Module; inside startup.pl or Perl sections. - Perrin
Re: mod_perl framework + code reuse question
What are the basic advantages, disadvantages, and limitations of: (a) stuffing all this setup/framework code into a module (either a new module or subclassing Apache::RegistryNG as you mention below), versus, (b) stuffing it into a handler that all requests for a large subset of the pages on this site have to go through? Subclassing RegistryNG is pretty much identical to making your own handler. In both cases, you have some common code that runs on every request and then it decides what other code should be run to handle this specific action. In my opinion it's clearer this way than if you make a module with the init code and call it explicitly, but I can't think of a very good technical argument for it. The handler/RegistryNG approach lets you do neat things with subclassing, like create a special subclass that does some additional setup for a certain group of scripts. That wouldn't be as clean if you use the init module approach, since you'd have different scripts calling different init modules. Other than the speedup from reduced overhead, what are the primary advantages to using handlers rather than Apache::Registry for content handlers? You can actually use subroutines without fear! Also, reducing the amount of magic (i.e. all of that package generation and eval stuff that Registry does) can help clear up confusion. And you can use __END__ and __DATA__. The big change in moving from Registry (or RegistryNG) to a handler is that you have to move your code from the main part of the script into a subroutine and turn the script into a proper module. There's good stuff on this in the guide. The primary disadvantage seems to be that I have to restart httpd an awful lot, but maybe Apache::Reload can help here. Can it be used to reload modules that implement handlers? Yes, it should cover your needs. This is a very helpful article. I have read it several times and still keep coming back to it. 
I would also like to learn more about the model-view-controller approach in general. A Google search will give you tons to read, but most of it refers to Java. It's all applicable to mod_perl though. - Perrin
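To make the Registry-to-handler move concrete, here is a minimal handler module of the kind described above; the package name and content are illustrative:

```perl
# Sketch: a Registry script's body moved into a subroutine in a
# proper module, the basic shape of a mod_perl 1.x content handler.
package My::App::Hello;
use strict;
use Apache::Constants qw(OK);

sub handler {
    my $r = shift;   # the Apache request object
    $r->send_http_header('text/html');
    $r->print("<p>Hello from a real handler</p>\n");
    return OK;
}

1;
```

It would then be wired up in httpd.conf with something like:

<Location /hello>
SetHandler perl-script
PerlHandler My::App::Hello
</Location>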
Re: testing server response time
After I set up my app (webtool.cgi) and created the single script version (bigtool.cgi), I ran this script on my machine and it showed that the single file was about 10-15% faster than the multiple modules. No offense, but your script must not have been doing much in this test. The difference between putting everything in one script vs. using modules is just the time it takes to open and read the files. It's a tiny difference in most cases. Are you sure you were compiling all of the same code in both cases? My first question is, is the above script a valid test of CGI response time? It's not useless, but LWP isn't the fastest and it's a lot easier to just use ab, or one of the other tools suggested in the guide. So, for example, should the results reflect any improvements with mod_perl enabled? Yes. Because what I found is that the response time differed less than 5% between mod_perl-enabled and mod_perl disabled configurations. There could be a problem in your config. How about posting the part you changed to enable mod_perl? From other postings, it seems like Windows mod_perl works great, and I should see a significant speed-up. Not as much as on unix, but definitely a speed-up. - Perrin
Re: testing server response time
I was also thinking it would only make a small difference, but I see many perl/CGI scripts that boast 'all this functionality in a single script'. They probably don't know any better, but to me that translates to a giant bloated unorganized mess of a script.

# BEGIN MOD_PERL CONFIG
#LoadModule perl_module modules/mod_perl.so
#ScriptAlias /perl-bin/ c:/IndigoPerl//perl-bin/
#PerlSendHeader On
#SetHandler perl-script
#Options ExecCGI
#</Location>
# END MOD_PERL CONFIG

That won't do it. Check the docs for Apache::Registry. That ScriptAlias should be removed for mod_perl. You want something more like this:

Alias /perl-bin/ c:/IndigoPerl//perl-bin/
PerlModule Apache::Registry

<Location /perl-bin>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
</Location>

Not sure if you need PerlSendHeader or not. It depends on your code. - Perrin
Re: BSD::Resource und apache/mod_perl
has anybody any ideas? Apache::Resource.
Re: BSD::Resource und apache/mod_perl
PerlModule Apache::Resource
PerlSetEnv PERL_RLIMIT_AS 32:64
PerlChildInitHandler Apache::Resource

in httpd.conf, but Apache::Resource uses BSD::Resource in the end and thus it's the same as

use BSD::Resource;
setrlimit RLIMIT_AS, 3200, 6400;

The difference is that Apache::Resource should apply this limit to each new child process. When you do this from the shell, you are limiting the parent Apache process, which isn't very useful. Are you sure you're using the right units (bytes vs. megabytes)? Could your server be immediately going higher than the limit you set for it? - Perrin
Re: mod_perl framework + code reuse question
There are many *.par pages (estimate: 70-100 when conversion is complete), and they all contain the following code with minor variations that could be made consistent (like what constants are imported, what modules are used, etc.). I'd like to find a way to prevent having that code (below) show up over and over again so I can eliminate a potential maintenance headache, but I'm not sure of what's a good way to bundle it up for reuse. Normal Perl rules apply. Modules are good for sharing code. You could stuff the shared parts into a sub in a module that every script would call. Or you could use an OO approach, with a base class that holds all of the boiler-plate stuff. One idea I had was to write a handler that acts as a sort of minimal application framework that contains the code below and determines what perl module should be required and executed based on $apache->path_info and $apache->uri. That's a good way to go too. Moving from Apache::Registry to handlers can be empowering. This sounds like a substantial effort. Only if your code currently depends on the CGI emulation features of Apache::Registry. If it's clean code, you should be able to convert it without much trouble. You could also try subclassing Apache::RegistryNG and adding your setup code to the beginning of each request there. I'd appreciate any input on how other people are structuring similar type applications in mod_perl, where the output is generated by Template Toolkit based on data obtained via SQL queries using parameters received mostly in the URL. I use handlers, with a model-view-controller design. The approach is documented in my Perl.com article about the eToys design. I keep all the common setup stuff in a base class that the other controllers inherit from. - Perrin
Re: mod-perl, modules and initializations
What is the difference between how a BEGIN block and an anonymous block in a module are handled when loaded into mod_perl? It looks to me like you are confused about our and BEGIN. If you change the our to a use vars I think it will fix your problems. This is not mod_perl-specific. Are anonymous blocks in a module only read and executed when mod_perl first loads them, i.e. once? The block in your example is not inside of a subroutine, so it will only be called once when the module is loaded. Another problem is when I try to build a SELECT HTML element with a call to the CGI module. In my anonymous block all of a sudden the HTML form variables are no longer available with the CGI::param call. Yet I can build the select element later in the cgi scripts using the same variables without a problem. I'm guessing it's more scoping problems with our. In a simpler line, should I have a use DBI() in startup.pl as well as the PerlModule Apache::DBI in httpd.conf? You need to use both Apache::DBI and DBI somewhere. Either place is fine. I usually pull in lots of modules, so it's easier to do in startup.pl. - Perrin
Re: mod-perl, modules and initializations
By load stage I mean BEGIN blocks, anonymous subroutines in packages loaded at startup, or even named subroutines called from startup.pl All of those things happen during server startup, before any request has been submitted. There is no form data at that time. Maybe if you could explain what you're trying to accomplish by calling CGI methods during initialization, someone could suggest an alternative way to do it. - Perrin
Re: mod-perl, modules and initializations
On Tuesday 08 January 2002 08:16 pm, Dave Morgan wrote: I'm trying to populate select boxes (or other input types) for my HTML pages. An example would be a drop down list of states and/or provinces. A large number of these are populated from lookup tables in the database and are relatively static. Okay, I suspect the problem is that whenever you get a new request the setup you did for CGI.pm gets cleared. You should store the static data in a global, and then populate the CGI widget with it on every request. - Perrin
Re: Sticky Pages.
Ok, now I'm totally confused. Have you read the documentation for Apache::PerlRun? That might help. Try a perldoc Apache::PerlRun. 1. I have the following (and ONLY the following related to mod_perl) in my httpd.conf file (of course there are other regular apache directives too):

LoadModule perl_module modules/mod_perl.so
AddModule mod_perl.c

In a subdirectory there's an .htaccess file containing this reference to mod_perl:

<Files *>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
</Files>

You are telling Apache what module to use for these files, but you also have to tell it to load the module. Put in PerlModule Apache::PerlRun before your Files section as directed by Stas and the PerlRun documentation. - Perrin
Re: Fast template system. Ideas,theorys and tools
I looked at just about every template system on CPAN and came across Text::Template. Anyone use this one? I'd suggest you read my overview of templating options. It summarizes the top choices for templating tools, and talks about the strengths and weaknesses of Text::Template. http://perl.apache.org/features/tmpl-cmp.html I implemented some Google-style timing in the API. It basically gets a Time::HiRes timestamp in the beginning and does the math at the very end and posts it in an html comment. You'd be better off with Devel::DProf (or Apache::DProf under mod_perl). My average transaction time is about .08 of a second. That leads me to think that my machine can handle 10 html page generations a second (this isn't an exact science, but close). You are assuming serial execution. You should be able to push much more than that through in a second because of parallel execution. 3rd. my sql queries are not the most optimized and mostly tossed together. DBIx::Profile can help you identify problems in your queries. And follow the optimization advice for DBI in the guide. - Perrin
Re: Suggestions on an XML-RPC Service using modperl?
Even then, I'd avoid disk-based cache systems, instead preferring Cache::* if it must be shared, or just global variables if it doesn't need to be. Cache::FileCache is disk-based, and it is the fastest of the Cache:: options for most data sets. There was a thread a little while back about data sharing that showed the top performers to be Cache::Mmap and IPC::MM. Cache::Cache and MLDBM::Sync should be more than fast enough for all but the most highly optimized systems. - Perrin
Re: Suggestions on an XML-RPC Service using modperl?
As far as the cacheing goes, we have had extremely good luck with IPC::ShareLite used to share info across mod_perl processes. IPC::ShareLite is not as fast as some of the other options, especially when dealing with a large data set. The disk-based options tend to be faster. - Perrin
Re: WYSIWYG Template Editor
Does anybody know a template engine, whose templates can be edited with a WYSIWYG editor (favourably dreamweaver) as they will look when filled with example data? HTML_Tree: http://homepage.mac.com/pauljlucas/software/html_tree/
Re: Fast template system. Ideas,theorys and tools
What do you suggest as a good benchmark tool to use that would be 'smart' when testing a whole complete site. For pounding a bunch of URLs, the best are ab, httperf, and http_load. If you need something fancier that tests a complex series of actions and responses, there are several packages on CPAN. They've been discussed on this list, but I haven't tried any of them myself so I can't comment on them. They will not scale as easily as the first three I mentioned though, so if you need to test hundreds of requests per second with them you will need multiple machines to run the clients on. - Perrin
Re: Apache::Session getting DESTROYed in wrong order
The circular reference was the only way I could think of to force an object to be destroyed during global destruction. What happens if you use a global? Hmm, that may be - Mason does create more closures now than it used to. It seems like only 'named' closures would create this problem, though, and not 'anonymous' closures (since the refcount of the anonymous closure itself should go to zero, freeing its contents). I was thinking of this situation:

my %session = get_session();

sub transmogrify {
    $session{'foo'}++;
}

I could be wrong, but I think that will make %session stick around, because transmogrify() now has a private copy of it. - Perrin
Re: Strange Apache 2.0 rewrite/proxy issue
The 2.0.28 proxy uses mod_rewrite. When it rewrites URLs internally to go to a static apache server all works great! Compare the headers sent by your static pages vs. the ones sent by your mod_perl pages. There might be something not quite 1.1 compliant about it that ticks off Apache 2 (although segfaulting is clearly not reasonable behavior even so). - Perrin
Re: What phase am I in?
I have the book but I don't always have it with me. That chapter is actually available for free on-line at http://www.modperl.com/. - Perrin
Re: Tips tricks needed :)
Like this? (using register_cleanup instead of pnotes) Better to use pnotes. I started out doing this kind of thing with register_cleanup and had problems like random segfaults. I think it was because other cleanup handlers sometimes needed access to these resources. - Perrin
Re: Tips tricks needed :)
By the way, is there a perl module to do calculations with money? There's Math::Currency. - Perrin
Re: mixing cgi-bin mod_perl
He wants to mix cgi-bin and mod_perl by testing all of the scripts in cgi-bin and putting one cgi-script at a time into the mod_perl folder. A very simple way to do this is to use Location directives to add them to PerlRun one at a time:

<Location /cgi-bin/some_scr.pl>
SetHandler perl-script
PerlHandler Apache::PerlRun
Options +ExecCGI
#optional
PerlSendHeader On
...
</Location>

<Location /cgi-bin/some_other_scr.pl>
SetHandler perl-script
PerlHandler Apache::PerlRun
Options +ExecCGI
#optional
PerlSendHeader On
...
</Location>

These directives will override the broader directives for /cgi-bin/. You could use mod_macro (or Perl sections) to avoid all the duplicated typing. - Perrin
Re: What phase am I in?
I've looked through the mod_perl docs and guide and am unable to find something that I can use in a handler to figure out what the current phase is. This seems like such an obvious thing that I can't believe it doesn't exist. Therefore I will conclude that I'm completely blind. Anyone care to open my eyes? http://mathforum.org/epigone/modperl/liphortwa/Pine.LNX.4.10.9909211217510.5 [EMAIL PROTECTED] It's called current_callback(). - Perrin
Re: Tips tricks needed :)
2. We will use Template-Toolkit and Apache/mod_perl. Problem is that 2 out of 3 people have never used TT or programmed mod_perl and OO Perl. Only I've made sites this way; they've used Embperl til now. How can I make this switch for them a little easier? Get them all copies of the Eagle book and Damian's OO book (or the Advanced Perl Programming book, which I also like for OO). Have them read at least the porting and traps documentation that comes with mod_perl, if not the whole Guide. Decide if you will be using a framework like OpenInteract or OpenFrame. If not, decide on a basic structure for request handling, error handling, and parameter parsing, and document it. We made this same transition at eToys, and you can do it. Learning to use TT is pretty quick, but OO and mod_perl require more hand-holding and are much easier if you have someone around with some confidence in how to use them. I'm including a brief outline that I used when I gave a talk about mod_perl. It's slightly out of date, but it shows what I considered to be the highlights for mod_perl newbies.

Which data should I send with the cookie? Only some random key which is also stored in the database, and this key is used to find the real data from the database? Yes, use a unique key with some kind of MAC so you can be sure it isn't forged when it's sent back to you. See the recent thread and the AuthCookie module. Note that some things still have to be passed as query args or hidden fields. This would be anything that could be happening in multiple windows, like search terms (if you tied them to the cookie, searches in multiple windows would interfere with each other).

4. How is it most reasonable to store (and use) complex formulas and coefficients? I would use an OO approach with polymorphism, like others have described. - Perrin

What is mod_perl?
  - An Apache module.
  - An embedded Perl interpreter.
  - Fast.
  - Flexible.
What's wrong with CGI?
  - Forking Perl on every request.
  - Compiling the script every time.
  - Opening a new database connection every time.
    - Apache::DBI.
The Apache request lifecycle.
  - See diagram.
The Apache API.
  - See code listing.
  - $r
  - $r->headers_in->{'User-Agent'}
  - $r->parsed_uri
  - $r->args
The config file.
How mod_perl emulates CGI.
  - Apache::Registry
  - Apache::PerlRun
What do you need to know as a programmer?
  - Memory management.
    - use strict! use strict! use strict!
    - Pre-loading and copy-on-write. See diagram.
    - startup.pl
    - Imported symbols and use vars.
  - Traps
    - Global settings: $/, $^T, $^W, $|
      - Using them safely with local.
    - -M test implications.
    - use, require, do, and %INC
    - Compiled regular expressions. See the guide or Perl Cookbook.
    - BEGIN and END blocks.
  - Performance tweaks.
    - Apache::Request for query string. See listing.
    - Apache::Cookie for cookies. See listing.
    - Apache::Util for escaping HTML and URIs. See benchmark.
    - DBI: prepare_cached and finish
  - Reloading your code changes.
    - Restart the server.
    - Apache::StatINC.
    - Stonehenge::Reload.
    - PerlFreshRestart.
    - Memory considerations for live servers.
  - Profiling
  - Debugging
  - Tricks to use in production.
    - Apache::SizeLimit.
    - Proxy server setup.
Resources
  - Read the guide! - http://perl.apache.org/guide/
  - PerlMonth - http://perlmonth.com/
  - perldoc everything
  - The Eagle Book and http://www.modperl.com
  - the mailing list
Comparison to other systems?
Re: Tips tricks needed :)
Actually I was wondering about writing an Apache::Singleton class, that works the same as Class::Singleton, but clears the singleton out on each request (by using pnotes). Would anyone be interested in that? This sounds a bit like Object::Registrar. If you do it, I'd suggest giving it a scope option for each variable that determines if it's process or request scope. In fact you could add support for some kind of data-sharing and offer a server scope as well. - Perrin
Re: Tips tricks needed :)
ALWAYS reinitialize $Your::Singleton::ETERNAL on each query! mod_perl will *NOT* do it for you. If you want a per-request global, use $r->pnotes() instead of a standard perl global. Then mod_perl *WILL* do it for you. You might think 'ah yeah but it would be nice if $Your::Singleton::ETERNAL could be persistent across queries...' which is sometimes desirable, but remember that if you have multiple instances of your application running on the same apache, $Your::Singleton::ETERNAL will be common to ALL of them. It will be common to all requests in that particular process, but not shared between multiple Apache processes. If you take requests for different applications that need different singletons on the same Apache process, you should separate them by namespace so they don't collide. - Perrin
Re: Tips tricks needed :)
No, it's nothing like Object::Registrar. It's like Class::Singleton. Okay, wishful thinking. I don't use Class::Singleton, but I have written my own versions of Object::Registrar a few times to accomplish the same goal. I don't like to make my core classes dependent on running in a mod_perl environment if I can avoid it, so I prefer to use a separate registry approach that keeps $r->pnotes() out of my classes. It's also nice to be able to quickly adapt other people's classes in this way without changing their code to use Class::Singleton. - Perrin
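As an illustration of that separate-registry idea, a minimal sketch (entirely hypothetical module and method names, not Perrin's actual code): the registry hides the pnotes dependency, so the classes being registered never see mod_perl at all.

```perl
package My::Registry;
use strict;

# Hypothetical per-request registry. Under mod_perl, instances live in
# $r->pnotes and are discarded automatically at the end of each request;
# under plain CGI they fall back to a process-lifetime global (which is
# the same thing, since a CGI process serves one request).
my %fallback;

sub instance {
    my ($class, $key, $builder) = @_;
    my $store;
    if ($ENV{MOD_PERL}) {
        require Apache;
        my $r = Apache->request;
        $store = $r->pnotes('my_registry');
        unless ($store) {
            $store = {};
            $r->pnotes(my_registry => $store);
        }
    } else {
        $store = \%fallback;
    }
    $store->{$key} ||= $builder->();    # build lazily, once per request
    return $store->{$key};
}

1;
```

Usage would look something like `my $session = My::Registry->instance(session => sub { My::Session->fetch })`, with My::Session knowing nothing about Apache.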
Re: Tips tricks needed :)
One thing I don't quite understand is the need to clear out a singleton. Why would a singleton need to hold transient state? It's good for holding something request-specific, like a user session.
Re: Tips tricks needed :)
If you want a per-request global, use $r->pnotes() instead of a standard perl global. Then mod_perl *WILL* do it for you. True. But then you are using the Apache object and your program doesn't work as a standard CGI anymore :( I handle this by checking for $ENV{MOD_PERL} and just using a global for storage if it isn't defined. Separating by namespace is not very convenient though. What I have been doing to get around this is that I wrote a simple module that can be used as a global scalar and that uses tie to return the appropriate variable (FYI I've attached the module, if that interests anyone). That's a good idea. I've done similar things with accessor methods instead of tied scalars. - Perrin
Re: modperl questions
as it stands, the cgi structure looks like this https://www.foo.co.za/cgi-bin/client1/index.pl https://www.foo.co.za/cgi-bin/client2/index.pl it would be better if it was https://www.foo.co.za/client1 https://www.foo.co.za/client2 You can just use this in your httpd.conf: DirectoryIndex index.pl Now that we have multiple application and database instances running concurrently, how do we ensure that filehandle and Apache::DBI symbols are reliably encapsulated in their own namespaces, all running off the same codebase? Apache::DBI will only give you back a cached connection if your connection parameters (user, login, etc.) are exactly the same. If different clients connect to different databases, this should be fine. You won't accidentally get the wrong one. As for filehandles, I suggest you use lexically scoped ones if possible. - Perrin
Re: submit-data and chained handlers
Apache::RequestNotes doesn't work because Apache::Registry expects to read the POST/PUT data from STDIN. It's important that the cgi-scripts run unmodified and without any notice of their unnatural environment. I don't think there's any way around the fact that you can only read the content once. That means you need to read and store it for other handlers to use, which is what Apache::RequestNotes does. Alternatively, you could add something to your Registry script that stuffs the parsed values into pnotes yourself after using them. If you put that inside a block that checks for $ENV{'MOD_PERL'}, you'll still be able to run the script safely under standard CGI. It also looks like you're re-inventing Apache::Filter or Apache::OutputChain. Have you tried them? - Perrin
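A rough sketch of that pnotes-stuffing idea for the top of a Registry script (the 'params' key is a name I made up; the guard keeps the script working as a plain CGI):

```perl
# Parse the request body once with CGI.pm (which reads STDIN here),
# then stash the parsed values in pnotes so later handlers can see
# them without trying to re-read a body that can only be read once.
use strict;
use CGI ();

my $q = CGI->new;
my %params = map { $_ => [ $q->param($_) ] } $q->param;

if ($ENV{'MOD_PERL'}) {
    require Apache;
    my $r = Apache->request;
    $r->pnotes(params => \%params);   # available to later handlers
}
```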
Re: Comparison of different caching schemes
One place that Rob and I still haven't found a good solution for profiling is trying to work out whether we should be focussing on optimising our mod_perl code, or our IMAP config, or our MySQL DB, or our SMTP setup, or our daemons' code, or... Assuming that the mod_perl app is the front-end for users and that you're trying to optimize for speed of responses, you should just use DProf to tell you which subroutines are using the most wall clock time. (I think it's dprofpp -t or something. Check the man page.) If the sub that uses IMAP is the slowest, then work on speeding up your IMAP server or the way you access it. CPU utilization may not be all that telling, since database stuff often takes the longest but doesn't burn much CPU. - Perrin
Re: load balancing on apache
I am planning to host an application and its size is going to be a big one, so I expect the number of concurrent connections to be around 2200. To handle this, I want to perform load sharing on 3-4 servers. If you really expect 2200 concurrent connections, you should buy dedicated load-balancing hardware like Big/IP or Cisco LocalDirector. - Perrin
Re: load balancing on apache
Aside from the fact I _really_ wouldn't expect that many actual, live TCP connections at one time... Nor would I, although we did see huge numbers of open connections during peak times at eToys. Mostly to the image serving machines though. I _really_ hate so-called dedicated boxes. They're closed, nasty, inflexible and often don't work in _your_ situation. Doing smart session-based redirection can be hard with these boxes. You can make it work with homegrown solutions, but I've found the dedicated load-balancing tools (at least Big/IP) to be effective and fairly easy to work with, even with large loads, failover requirements, and more exotic stuff like sticky sessions. This is one area where the problem seems to be well enough defined for most people to use an off-the-shelf solution. They're often more expensive than they should be, but if you don't have someone on hand who knows the ipchains or LVS stuff it can save you some time and trouble. - Perrin
Re: Comparison of different caching schemes
I was using Cache::SharedMemoryCache on my system. I figured, Hey, it's RAM, right? It's gonna be WAY faster than anything disk-based. The thing you were missing is that on an OS with an aggressively caching filesystem (like Linux), frequently read files will end up cached in RAM anyway. The kernel can usually do a better job of managing an efficient cache than your program can. For what it's worth, DeWitt Clinton accompanied his first release of File::Cache (the precursor to Cache::FileCache) with a benchmark showing this same thing. That was the reason File::Cache was created. And ++ on Paul's comments about Devel::DProf and other profilers. - Perrin
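For comparison, a sketch of the file-backed alternative (constructor options from the Cache::Cache interface as I recall it; verify against the module docs):

```perl
use strict;
use Cache::FileCache;

# A file-backed cache; on an OS with an aggressive filesystem cache
# (like Linux), hot entries end up in RAM via the kernel's page cache
# anyway, without the overhead of managing SysV shared memory.
my $cache = Cache::FileCache->new({
    namespace          => 'templates',
    default_expires_in => 600,           # seconds
});

my $html = '<html>...</html>';           # whatever you want cached
$cache->set( front_page => $html );
my $cached = $cache->get('front_page');  # undef if missing or expired
```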
Re: mod_perl vs. C for high performance Apache modules
I spoke to the technical lead at Yahoo who said mod_perl will not scale as well as c++ when you get to their level of traffic, but for a large ecommerce site mod_perl is fine. According to something I once read by David Filo, Yahoo also had to tweak the FreeBSD code because they had trouble scaling *TCP/IP*! I would say their experience is not typical. - Perrin
Re: mod_perl vs. C for high performance Apache modules
So I'm trying to show that mod_perl doesn't suck, and that it is, in fact, a reasonable choice. Though within these limits it is still reasonable to point out the development cycle, emotionally it is the least compelling form of argument, because the investor has a hard time removing from consideration that given our particular situation, there was a very fast solution in using his C-based routines. Well, that is the primary reason for using Perl over C, and you really have to count maintenance and the relative likelihood of C-ish bugs like buffer overflows as part of it. Well-coded C should be faster than Perl, but Perl is fast enough for nearly any web-based application. If this guy saw CPU spikes, he probably had something else wrong, like running out of memory. You might find this article about C and Perl performance useful: http://www.perl.com/pub/a/2001/06/27/ctoperl.html - Perrin
Re: Comparison of different caching schemes
So our solution was caching in-process with just a hash, and using a DBI/mysql persistent store. In pseudo code:

    sub get_stuff {
        my ($whatever) = @_;
        unless ($cache{$whatever}) {
            unless ($cache{$whatever} = dbi_lookup($whatever)) {
                $cache{$whatever} = derive_data_from_original_source($whatever);
                dbi_save($cache{$whatever});
            }
        }
        return $cache{$whatever};
    }

That's actually a bit different. That would fail to notice updates between processes until the in-memory cache was cleared. Still very useful for read-only data or data that can be out of sync for some period though. The filesystem based / time sensitive aging is a nice little thing we should put up on CPAN. We've just not done so yet. How does it differ from the other solutions like Cache::FileCache? Is it something you could add to an existing module? - Perrin
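One way to soften the cross-process staleness Perrin points out is to give each in-memory entry a short time-to-live, so stale entries get re-checked against the shared DBI store. A sketch, with dbi_lookup(), dbi_save(), and derive_data_from_original_source() assumed to be the helpers from the post above:

```perl
use strict;

my %cache;       # per-process, two-level: memory in front of DBI
my $TTL = 60;    # seconds before re-checking the shared store

sub get_stuff {
    my ($whatever) = @_;
    my $entry = $cache{$whatever};
    if (!$entry or time() - $entry->{stamp} > $TTL) {
        my $value = dbi_lookup($whatever);    # shared, cross-process store
        unless (defined $value) {
            $value = derive_data_from_original_source($whatever);
            dbi_save($whatever, $value);      # assumed two-arg save helper
        }
        $entry = $cache{$whatever} = { value => $value, stamp => time() };
    }
    return $entry->{value};
}
```

A stale read can still last up to $TTL seconds, which is the trade-off against hitting the database on every request.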
Re: Apache::SizeLimit Exit is Delayed
A bug report about the Apache::SizeLimit diagnostics: I don't know about Linux and Solaris, but under FreeBSD shared memory shows some incredible numbers: Okay, I'll ask the guy who wrote the *BSD support to look into it. I don't have a FreeBSD system to test with. And a recommendation - I'm using Apache::SizeLimit as a PerlCleanupHandler - so Apache will exit after the request is completed. You should use it in an early phase, like PerlFixupHandler. It pushes a cleanup handler if it needs to exit. It will not exit until after the request is done. I'm using mod_class for hard control of mod_perl process size during the response phase. I've never heard of mod_class. Do you have a link for it? My official recommendation is to set Apache::SizeLimit up with low enough numbers that you can handle it if it grows a little more during the final request, and use Apache::Resource as a hard limit to prevent runaway processes from eating up all your RAM. Apache::Resource will kill your process even if it's in the middle of a response, so you don't want to use it for normal size control. - Perrin
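A hedged sketch of that recommendation (directive and variable names as commonly documented for these modules; the specific numbers are made up, and you'd set $Apache::SizeLimit::MAX_PROCESS_SIZE in startup.pl, e.g. to 12000 KB):

```
# httpd.conf sketch -- verify names and units against the module docs.
PerlFixupHandler     Apache::SizeLimit        # soft limit, exits after the request
PerlSetEnv           PERL_RLIMIT_DATA 32:48   # soft:hard data-segment limit, MB
PerlChildInitHandler Apache::Resource         # hard backstop for runaway processes
```

The soft limit handles routine growth gracefully; the rlimit only fires if a process blows far past it.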
Re: Apache::SizeLimit Exit is Delayed
Perrin Harkins wrote: Try changing the call $r->child_terminate() to Apache::exit(). If this seems to work better for you, let me know and I'll consider changing this in a future release of Apache::SizeLimit. Geoff wrote: what about $r->headers_out->add(Connection => 'close'); I tried each of these changes in turn. Neither worked to immediately exit the child. I never saw that either of them would exit the child at all but I may not have kept them running long enough. Did you see more requests being handled by the same process? I noticed an odd pattern of behavior. With one of our cgi scripts, and using $r->child_terminate(), the child would always exit immediately. With other scripts, it wouldn't exit. With both Perrin's and Geoff's suggestions from above, that same script would cause the process being used to be changed, but the old process wouldn't exit. You mean the old process hangs around, but doesn't take any new requests? - Perrin
Re: Apache::SizeLimit Exit is Delayed
You should use it in an early phase, like PerlFixupHandler. It pushes a cleanup handler if it needs to exit. It will not exit until after the request is done. I didn't know that. I think you should document it. But anyway, I think it's better to check the size in cleanup. I agree and I plan to change this. It's a BSD-specific module. It allows setting a login class for a process to limit memory or time usage. How about sending Stas a patch for the guide with information on this? It might be useful to other BSD'ers. - Perrin