stas        02/01/05 11:20:01

Added:       src/docs/1.0/api Apache.pod config.cfg
             src/docs/1.0/faqs cgi_to_mod_perl.pod config.cfg
                      email-etiquette.pod mjtg-news.txt mod_perl.pod
                      mod_perl_api.pod mod_perl_cgi.pod mod_perl_cvs.pod
                      mod_perl_faq.pod mod_perl_method_handlers.pod
                      mod_perl_traps.pod mod_perl_tuning.pod
                      perl_myth.pod
             src/docs/1.0/win32 config.cfg win32_binaries.pod
                      win32_compile.pod win32_multithread.pod
Log:
porting mod_perl 1.x documents

Revision  Changes    Path
1.1                  modperl-docs/src/docs/1.0/api/Apache.pod

Index: Apache.pod
===================================================================
=head1 NAME

Apache - Perl interface to the Apache server API

=head1 SYNOPSIS

    use Apache ();

=head1 DESCRIPTION

This module provides a Perl interface to the Apache API. It is here
mainly for B<mod_perl>, but may be used for other Apache modules that
wish to embed a Perl interpreter. We suggest that you also consult
the description of the Apache C API at http://www.apache.org/docs/.

=head1 THE REQUEST OBJECT

The request object holds all the information that the server needs to
service a request. Apache B<Perl*Handler>s will be given a reference
to the request object as a parameter and may choose to update or use
it in various ways. Most of the methods described below obtain
information from or update the request object. The perl version of
the request object will be blessed into the B<Apache> package; it is
really a C<request_rec*> in disguise.

=over 4

=item Apache-E<gt>request([$r])

The Apache-E<gt>request method will return a reference to the request
object.

B<Perl*Handler>s can obtain a reference to the request object when it
is passed to them via C<@_>. However, scripts that run under
B<Apache::Registry>, for example, need a way to access the request
object. B<Apache::Registry> will make a request object available to
these scripts by passing an object reference to
C<Apache-E<gt>request($r)>.
If handlers use modules such as B<CGI::Apache> that need to access
C<Apache-E<gt>request>, they too should do this (e.g.
B<Apache::Status>).

=item $r-E<gt>as_string

Returns a string representation of the request object. Mainly useful
for debugging.

=item $r-E<gt>main

If the current request is a sub-request, this method returns a
blessed reference to the main request structure. If the current
request is the main request, then this method returns C<undef>.

=item $r-E<gt>prev

This method returns a blessed reference to the previous (internal)
request structure or C<undef> if there is no previous request.

=item $r-E<gt>next

This method returns a blessed reference to the next (internal)
request structure or C<undef> if there is no next request.

=item $r-E<gt>last

This method returns a blessed reference to the last (internal)
request structure. Handy for logging modules.

=item $r-E<gt>is_main

Returns true if the current request object is for the main request.
(Should give the same result as C<!$r-E<gt>main>, but will be more
efficient.)

=item $r-E<gt>is_initial_req

Returns true if the current request is the first internal request;
returns false if the request is a sub-request or internal redirect.

=item $r-E<gt>allowed($bitmask)

Get or set the allowed methods bitmask. This allowed bitmask should
be set whenever a 405 (method not allowed) or 501 (method not
implemented) answer is returned. The bit corresponding to the method
number should be set.

    unless ($r->method_number == M_GET) {
        $r->allowed($r->allowed | (1<<M_GET) | (1<<M_HEAD) | (1<<M_OPTIONS));
        return HTTP_METHOD_NOT_ALLOWED;
    }

=back

=head1 SUB REQUESTS

Apache provides a sub-request mechanism to look up a uri or filename,
performing all access checks, etc., without actually running the
response phase of the given request. Notice, we have dropped the
C<sub_req_> prefix here. The C<request_rec*> returned by the lookup
methods is blessed into the B<Apache::SubRequest> class.
This way, C<destroy_sub_request()> is called automatically during
C<Apache::SubRequest-E<gt>DESTROY> when the object goes out of scope.
The B<Apache::SubRequest> class inherits all the methods from the
B<Apache> class.

=over 4

=item $r-E<gt>lookup_uri($uri)

    my $subr = $r->lookup_uri($uri);
    my $filename = $subr->filename;

    unless(-e $filename) {
        warn "can't stat $filename!\n";
    }

=item $r-E<gt>lookup_file($filename)

    my $subr = $r->lookup_file($filename);

=item $subr-E<gt>run

    if($subr->run != OK) {
        $subr->log_error("something went wrong!");
    }

=back

=head1 CLIENT REQUEST PARAMETERS

In this section we will take a look at various methods that can be
used to retrieve the request parameters sent from the client. In the
following examples, B<$r> is a request object blessed into the
B<Apache> class, obtained from the first parameter passed to a
handler subroutine or from I<Apache-E<gt>request>.

=over 4

=item $r-E<gt>method( [$meth] )

The $r-E<gt>method method will return the request method. It will be
a string such as "GET", "HEAD" or "POST". Passing an argument will
set the method, mainly used for internal redirects.

=item $r-E<gt>method_number( [$num] )

The $r-E<gt>method_number method will return the request method
number. The method numbers are defined by the M_GET, M_POST,...
constants available from the B<Apache::Constants> module. Passing an
argument will set the method_number, mainly used for internal
redirects and testing authorization restriction masks.

=item $r-E<gt>bytes_sent

The number of bytes sent to the client, handy for logging, etc.

=item $r-E<gt>the_request

The request line sent by the client, handy for logging, etc.

=item $r-E<gt>proxyreq

Returns true if the request is proxy http. Mainly used during the
filename translation stage of the request, which may be handled by a
C<PerlTransHandler>.

=item $r-E<gt>header_only

Returns true if the client is asking for headers only, e.g. if the
request method was B<HEAD>.

=item $r-E<gt>protocol

The $r-E<gt>protocol method will return a string identifying the
protocol that the client speaks. Typical values will be "HTTP/1.0"
or "HTTP/1.1".

=item $r-E<gt>hostname

Returns the server host name, as set by full URI or Host: header.

=item $r-E<gt>request_time

Returns the time that the request was made. The time is the local
unix time in seconds since the epoch.

=item $r-E<gt>uri( [$uri] )

The $r-E<gt>uri method will return the requested URI minus optional
query string, optionally changing it with the first argument.

=item $r-E<gt>filename( [$filename] )

The $r-E<gt>filename method will return the result of the I<URI
--E<gt> filename> translation, optionally changing it with the first
argument if you happen to be doing the translation.

=item $r-E<gt>path_info( [$path_info] )

The $r-E<gt>path_info method will return what is left in the path
after the I<URI --E<gt> filename> translation, optionally changing it
with the first argument if you happen to be doing the translation.

=item $r-E<gt>args( [$query_string] )

The $r-E<gt>args method will return the contents of the URI I<query
string>. When called in a scalar context, the entire string is
returned. When called in a list context, a list of parsed I<key>
=E<gt> I<value> pairs is returned, i.e. it can be used like this:

    $query = $r->args;
    %in    = $r->args;

$r-E<gt>args can also be used to set the I<query string>. This can be
useful when redirecting a POST request.

=item $r-E<gt>headers_in

The $r-E<gt>headers_in method will return a %hash of client request
headers. This can be used to initialize a perl hash, or one could
use the $r-E<gt>header_in() method (described below) to retrieve a
specific header value directly.

Will return a I<HASH> reference blessed into the I<Apache::Table>
class when called in a scalar context with no "key" argument. This
requires I<Apache::Table>.

=item $r-E<gt>header_in( $header_name, [$value] )

Return the value of a client header.
Can be used like this:

    $ct = $r->header_in("Content-type");

    $r->header_in($key, $val); #set the value of header '$key'

=item $r-E<gt>content

The $r-E<gt>content method will return the entity body read from the
client, but only if the request content type is
C<application/x-www-form-urlencoded>. When called in a scalar
context, the entire string is returned. When called in a list
context, a list of parsed I<key> =E<gt> I<value> pairs is returned.
*NOTE*: you can only ask for this once, as the entire body is read
from the client.

=item $r-E<gt>read($buf, $bytes_to_read, [$offset])

This method is used to read data from the client, looping until it
gets all of C<$bytes_to_read> or a timeout happens.

An offset may be specified to place the read data at some other place
than the beginning of the string.

In addition, this method sets a timeout before reading with
C<$r-E<gt>soft_timeout>.

=item $r-E<gt>get_remote_host

Lookup the client's DNS hostname. If the configuration directive
B<HostNameLookups> is set to off, this returns the dotted decimal
representation of the client's IP address instead. Might return
I<undef> if the hostname is not known.

=item $r-E<gt>get_remote_logname

Lookup the remote user's system name. Might return I<undef> if the
remote system is not running an RFC 1413 server or if the
configuration directive B<IdentityCheck> is not turned on.

=back

More information about the client can be obtained from the
B<Apache::Connection> object, as described below.

=over 4

=item $c = $r-E<gt>connection

The $r-E<gt>connection method will return a reference to the request
connection object (blessed into the B<Apache::Connection> package).
This is really a C<conn_rec*> in disguise. The following methods can
be used on the connection object:

=over 4

=item $c-E<gt>remote_host

If the configuration directive B<HostNameLookups> is set to on: then
the first time C<$r-E<gt>get_remote_host> is called the server does a
DNS lookup to get the remote client's host name.
The result is cached in C<$c-E<gt>remote_host> then returned. If the
server was unable to resolve the remote client's host name this will
be set to "". Subsequent calls to C<$r-E<gt>get_remote_host> return
this cached value.

If the configuration directive B<HostNameLookups> is set to off:
calls to C<$r-E<gt>get_remote_host> return a string that contains the
dotted decimal representation of the remote client's IP address.
However, this string is not cached, and C<$c-E<gt>remote_host> is
undefined. So, it's best to call C<$r-E<gt>get_remote_host> instead
of directly accessing this variable.

=item $c-E<gt>remote_ip

The dotted decimal representation of the remote client's IP address.
This is set by the server when the connection record is created so it
is always defined.

You can also set this value by providing an argument to it. This is
helpful if your server is behind a squid accelerator proxy which adds
an X-Forwarded-For header.

=item $c-E<gt>local_addr

A packed SOCKADDR_IN in the same format as returned by
L<Socket/pack_sockaddr_in>, containing the port and address on the
local host that the remote client is connected to. This is set by
the server when the connection record is created so it is always
defined.

=item $c-E<gt>remote_addr

A packed SOCKADDR_IN in the same format as returned by
L<Socket/pack_sockaddr_in>, containing the port and address on the
remote host that the server is connected to. This is set by the
server when the connection record is created so it is always defined.

Among other things, this can be used, together with
C<$c-E<gt>local_addr>, to perform RFC1413 ident lookups on the remote
client even when the configuration directive B<IdentityCheck> is
turned off.

Can be used like:

    use Net::Ident qw (lookupFromInAddr);
    ...
    my $remoteuser = lookupFromInAddr ($c->local_addr,
                                       $c->remote_addr, 2);

Note that the lookupFromInAddr interface does not currently exist in
the B<Net::Ident> module, but the author is planning on adding it
soon.
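
The packed value described for C<$c-E<gt>remote_addr> can be taken
apart with the core B<Socket> module. What follows is only a minimal
sketch that runs outside Apache: the C<describe_peer> helper is
hypothetical, and the packed address is built by hand as a stand-in
for what the connection object would return.

```perl
use strict;
use warnings;
use Socket qw(pack_sockaddr_in unpack_sockaddr_in inet_aton inet_ntoa);

# Hypothetical helper: turn a packed SOCKADDR_IN into "ip:port".
sub describe_peer {
    my ($packed) = @_;
    my ($port, $ip) = unpack_sockaddr_in($packed);
    return inet_ntoa($ip) . ":" . $port;
}

# Stand-in for $c->remote_addr; under mod_perl the server supplies it.
my $packed = pack_sockaddr_in(8080, inet_aton("127.0.0.1"));
print describe_peer($packed), "\n";   # prints "127.0.0.1:8080"
```

Inside a handler, the same helper could presumably be given
C<$c-E<gt>remote_addr> or C<$c-E<gt>local_addr> directly.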

=item $c-E<gt>remote_logname

If the configuration directive B<IdentityCheck> is set to on: then
the first time C<$r-E<gt>get_remote_logname> is called the server
does an RFC 1413 (ident) lookup to get the remote user's system name.
Generally for UNI* systems this is their login. The result is cached
in C<$c-E<gt>remote_logname> then returned. Subsequent calls to
C<$r-E<gt>get_remote_logname> return the cached value.

If the configuration directive B<IdentityCheck> is set to off: then
C<$r-E<gt>get_remote_logname> does nothing and
C<$c-E<gt>remote_logname> is always undefined.

=item $c-E<gt>user( [$user] )

If an authentication check was successful, the authentication handler
caches the user name here. Sets the user name to the optional first
argument.

=item $c-E<gt>auth_type

Returns the authentication scheme that successfully authenticated
C<$c-E<gt>user>, if any.

=item $c-E<gt>aborted

Returns true if the client stopped talking to us.

=item $c-E<gt>fileno( [$direction] )

Returns the client file descriptor. If $direction is 0, the input fd
is returned. If $direction is nonzero or omitted, the output fd is
returned.

This can be used to detect client disconnect without doing any I/O,
e.g. using IO::Select.

=back

=back

=head1 SERVER CONFIGURATION INFORMATION

The following methods are used to obtain information from server
configuration and access control files.

=over 4

=item $r-E<gt>dir_config( $key )

Returns the value of a per-directory variable specified by the
C<PerlSetVar> directive.

    # <Location /foo/bar>
    # PerlSetVar  Key  Value
    # </Location>

    my $val = $r->dir_config('Key');

Keys are case-insensitive.

Will return a I<HASH> reference blessed into the I<Apache::Table>
class when called in a scalar context with no "key" argument. See
I<Apache::Table>.

=item $r-E<gt>dir_config-E<gt>get( $key )

Returns the value of a per-directory array variable specified by the
C<PerlAddVar> directive.

    # <Location /foo/bar>
    # PerlAddVar  Key  Value1
    # PerlAddVar  Key  Value2
    # </Location>

    my @val = $r->dir_config->get('Key');

Alternatively in your code you can extend the setting with:

    $r->dir_config->add(Key => 'Value3');

Keys are case-insensitive.

Will return a I<HASH> reference blessed into the I<Apache::Table>
class when called in a scalar context with no "key" argument. See
I<Apache::Table>.

=item $r-E<gt>requires

Returns an array reference of hash references, containing information
related to the B<require> directive. This is normally used for
access control, see L<Apache::AuthzAge> for an example.

=item $r-E<gt>auth_type

Returns a reference to the current value of the per directory
configuration directive B<AuthType>. Normally this would be set to
C<Basic> to use the basic authentication scheme defined in RFC 1945,
I<Hypertext Transfer Protocol -- HTTP/1.0>. However, you could set
it to something else and implement your own authentication scheme.

=item $r-E<gt>auth_name

Returns a reference to the current value of the per directory
configuration directive B<AuthName>. The AuthName directive creates
a protection realm within the server document space. To quote RFC
1945 "These realms allow the protected resources on a server to be
partitioned into a set of protection spaces, each with its own
authentication scheme and/or authorization database." The client
uses the root URL of the server to determine which authentication
credentials to send with each HTTP request. These credentials are
tagged with the name of the authentication realm that created them.
Then during the authentication stage the server uses the current
authentication realm, from C<$r-E<gt>auth_name>, to determine which
set of credentials to authenticate.

=item $r-E<gt>document_root ( [$docroot] )

When called with no argument, returns a reference to the current
value of the per server configuration directive B<DocumentRoot>.
To quote the Apache server documentation, "Unless matched by a
directive like Alias, the server appends the path from the requested
URL to the document root to make the path to the document." This
same value is passed to CGI scripts in the C<DOCUMENT_ROOT>
environment variable.

You can also set this value by providing an argument to it. The
following example dynamically sets the document root based on the
request's "Host:" header:

    sub trans_handler {
        my $r = shift;
        # capture the first label of the Host header; without the
        # parentheses the match would only return a true/false value
        my ($user) = ($r->header_in('Host') =~ /^([^.]+)/);
        $r->document_root("/home/$user/www");
        return DECLINED;
    }

    PerlTransHandler trans_handler

=item $r-E<gt>allow_options

The C<$r-E<gt>allow_options> method can be used for checking if it is
OK to run a perl script. The B<Apache::Options> module provides the
constants to check against.

    if(!($r->allow_options & OPT_EXECCGI)) {
        $r->log_reason("Options ExecCGI is off in this directory",
                       $filename);
    }

=item $r-E<gt>get_server_port

Returns the port number on which the server is listening.

=item $s = $r-E<gt>server

Return a reference to the server info object (blessed into the
B<Apache::Server> package). This is really a C<server_rec*> in
disguise. The following methods can be used on the server object:

=item $s = Apache-E<gt>server

Same as above, but only available during server startup for use in
C<E<lt>PerlE<gt>> sections, B<PerlScript> or B<PerlModule>.

=item $s-E<gt>server_admin

Returns the mail address of the person responsible for this server.

=item $s-E<gt>server_hostname

Returns the hostname used by this server.

=item $s-E<gt>port

Returns the port that this server listens to.

=item $s-E<gt>is_virtual

Returns true if this is a virtual server.

=item $s-E<gt>names

Returns the wild-carded names for ServerAlias servers.

=item $s-E<gt>dir_config( $key )

Alias for Apache::dir_config.

=item $s-E<gt>warn

Alias for Apache::warn.

=item $s-E<gt>log_error

Alias for Apache::log_error.

=item $s-E<gt>uid

Returns the numeric user id under which the server answers requests.
This is the value of the User directive.

=item $s-E<gt>gid

Returns the numeric group id under which the server answers requests.
This is the value of the Group directive.

=item $s-E<gt>loglevel

Get or set the value of the current LogLevel. This method is added
by the Apache::Log module, which needs to be pulled in.

    use Apache::Log;
    print "LogLevel = ", $s->loglevel;
    $s->loglevel(Apache::Log::DEBUG);

If using Perl 5.005+, the following constants are defined (but not
exported):

    Apache::Log::EMERG
    Apache::Log::ALERT
    Apache::Log::CRIT
    Apache::Log::ERR
    Apache::Log::WARNING
    Apache::Log::NOTICE
    Apache::Log::INFO
    Apache::Log::DEBUG

=item $r-E<gt>get_handlers( $hook )

Returns a reference to a list of handlers enabled for $hook. $hook
is a string representing the phase to handle. The returned list is a
list of references to the handler subroutines.

    $list = $r->get_handlers( 'PerlHandler' );

=item $r-E<gt>set_handlers( $hook, [\&handler, ... ] )

Sets the list of handlers to be called for $hook. $hook is a string
representing the phase to handle. The list of handlers is an
anonymous array of code references to the handlers to install for
this request phase. The special list [ \&OK ] can be used to disable
a particular phase.

    $r->set_handlers( PerlLogHandler => [ \&myhandler1, \&myhandler2 ] );
    $r->set_handlers( PerlAuthenHandler => [ \&OK ] );

=item $r-E<gt>push_handlers( $hook, \&handler )

Pushes a new handler to be called for $hook. $hook is a string
representing the phase to handle. The handler is a reference to a
subroutine to install for this request phase. This handler will be
called before any configured handlers.

    $r->push_handlers( PerlHandler => \&footer);

=item $r-E<gt>current_callback

Returns the name of the handler currently being run. This method is
most useful to PerlDispatchHandlers who wish to only take action for
certain phases.

    if($r->current_callback eq "PerlLogHandler") {
        $r->warn("Logging request");
    }

=back

=head1 SETTING UP THE RESPONSE

The following methods are used to set up and return the response back
to the client. This typically involves setting up $r-E<gt>status(),
the various content attributes and optionally some additional
$r-E<gt>header_out() calls before calling $r-E<gt>send_http_header()
which will actually send the headers to the client. After this a
typical application will call the $r-E<gt>print() method to send the
response content to the client.

=over 4

=item $r-E<gt>send_http_header( [$content_type] )

Send the response line and all headers to the client. Takes an
optional parameter indicating the content-type of the response, e.g.
'text/html'.

This method will create headers from the $r-E<gt>content_xxx() and
$r-E<gt>no_cache() attributes (described below) and then append the
headers defined by $r-E<gt>header_out (or $r-E<gt>err_header_out if
status indicates an error).

=item $r-E<gt>get_basic_auth_pw

If the current request is protected by Basic authentication, this
method will return 0, otherwise -1. The second return value will be
the decoded password sent by the client.

    ($ret, $sent_pw) = $r->get_basic_auth_pw;

=item $r-E<gt>note_basic_auth_failure

Prior to requiring Basic authentication from the client, this method
will set the outgoing HTTP headers asking the client to authenticate
for the realm defined by the configuration directive C<AuthName>.

=item $r-E<gt>handler( [$meth] )

Set the handler for a request. Normally set by the configuration
directive C<AddHandler>.

    $r->handler( "perl-script" );

=item $r-E<gt>notes( $key, [$value] )

Return the value of a named entry in the Apache C<notes> table, or
optionally set the value of a named entry. This table is used by
Apache modules to pass messages amongst themselves. Generally if you
are writing handlers in mod_perl you can use Perl variables for this.

    $r->notes("MY_HANDLER" => OK);
    $val = $r->notes("MY_HANDLER");

Will return a I<HASH> reference blessed into the I<Apache::Table>
class when called in a scalar context with no "key" argument. This
requires I<Apache::Table>.

=item $r-E<gt>pnotes( $key, [$value] )

Like $r-E<gt>notes, but takes any scalar as a value.

    $r->pnotes("MY_HANDLER" => [qw(one two)]);
    my $val = $r->pnotes("MY_HANDLER");
    print $val->[0];     # prints "one"

The advantage over just using a Perl variable is that $r-E<gt>pnotes
gets cleaned up after every request.

=item $r-E<gt>subprocess_env( $key, [$value] )

Return the value of a named entry in the Apache C<subprocess_env>
table, or optionally set the value of a named entry. This table is
used by mod_include. By setting some custom variables inside a perl
handler it is possible to combine perl with mod_include nicely. If
you say, e.g. in a PerlHeaderParserHandler

    $r->subprocess_env(MyLanguage => "de");

you can then write in your .shtml document:

    <!--#if expr="$MyLanguage = en" -->
    English
    <!--#elif expr="$MyLanguage = de" -->
    Deutsch
    <!--#else -->
    Sorry
    <!--#endif -->

Will return a I<HASH> reference blessed into the I<Apache::Table>
class when called in a scalar context with no "key" argument. This
requires I<Apache::Table>.

=item $r-E<gt>content_type( [$newval] )

Get or set the content type being sent to the client. Content types
are strings like "text/plain", "text/html" or "image/gif". This
corresponds to the "Content-Type" header in the HTTP protocol.
Example of usage is:

    $previous_type = $r->content_type;
    $r->content_type("text/plain");

=item $r-E<gt>content_encoding( [$newval] )

Get or set the content encoding. Content encodings are strings like
"gzip" or "compress". This corresponds to the "Content-Encoding"
header in the HTTP protocol.

=item $r-E<gt>content_languages( [$array_ref] )

Get or set the content languages.
The content language corresponds to the "Content-Language" HTTP
header and is an array reference containing strings such as "en" or
"no".

=item $r-E<gt>status( $integer )

Get or set the reply status for the client request. The
B<Apache::Constants> module provides mnemonic names for the status
codes.

=item $r-E<gt>status_line( $string )

Get or set the response status line. The status line is a string
like "200 Document follows" and it will take precedence over the
value specified using the $r-E<gt>status() described above.

=item $r-E<gt>headers_out

The $r-E<gt>headers_out method will return a %hash of server response
headers. This can be used to initialize a perl hash, or one could
use the $r-E<gt>header_out() method (described below) to retrieve or
set a specific header value directly.

Will return a I<HASH> reference blessed into the I<Apache::Table>
class when called in a scalar context with no "key" argument. This
requires I<Apache::Table>.

=item $r-E<gt>header_out( $header, $value )

Change the value of a response header, or create a new one. You
should not define any "Content-XXX" headers by calling this method,
because these headers use their own specific methods. Example of
use:

    $r->header_out("WWW-Authenticate" => "Basic");
    $val = $r->header_out($key);

=item $r-E<gt>err_headers_out

The $r-E<gt>err_headers_out method will return a %hash of server
response headers. This can be used to initialize a perl hash, or one
could use the $r-E<gt>err_header_out() method (described below) to
retrieve or set a specific header value directly.

The difference between headers_out and err_headers_out is that the
latter are printed even on error, and persist across internal
redirects (so the headers printed for ErrorDocument handlers will
have them).

Will return a I<HASH> reference blessed into the I<Apache::Table>
class when called in a scalar context with no "key" argument. This
requires I<Apache::Table>.

=item $r-E<gt>err_header_out( $header, [$value] )

Change the value of an error response header, or create a new one.
These headers are used if the status indicates an error.

    $r->err_header_out("Warning" => "Bad luck");
    $val = $r->err_header_out($key);

=item $r-E<gt>no_cache( $boolean )

This is a flag that indicates that the data being returned is
volatile and the client should be told not to cache it.
C<$r-E<gt>no_cache(1)> adds the headers "Pragma: no-cache" and
"Cache-control: no-cache" to the response, therefore it must be
called before C<$r-E<gt>send_http_header>.

=item $r-E<gt>print( @list )

This method sends data to the client with C<$r-E<gt>write_client>,
but first sets a timeout before sending with
C<$r-E<gt>soft_timeout>. This method is called instead of
CORE::print when you use print() in your mod_perl programs.

This method treats scalar references specially. If an item in @list
is a scalar reference, it will be dereferenced before printing. This
is a performance optimization which prevents unneeded copying of
large strings, and it is subtly different from Perl's standard
print() behavior.

Example:

    $foo = \"bar"; print($foo);

The result is "bar", not the "SCALAR(0xDEADBEEF)" you might have
expected. If you really want the reference to be printed out, force
it into a scalar context by using C<print(scalar($foo))>.

=item $r-E<gt>send_fd( $filehandle )

Send the contents of a file to the client. Can for instance be used
like this:

    open(FILE, $r->filename) || return 404;
    $r->send_fd(FILE);
    close(FILE);

=item $r-E<gt>internal_redirect( $newplace )

Redirect to a location in the server namespace without telling the
client. For instance:

    $r->internal_redirect("/home/sweet/home.html");

=item $r-E<gt>internal_redirect_handler( $newplace )

Same as I<internal_redirect>, but the I<handler> from C<$r> is
preserved.

=item $r-E<gt>custom_response($code, $uri)

This method provides a hook into the B<ErrorDocument> mechanism,
allowing you to configure a custom response for a given response code
at request-time.

Example:

    use Apache::Constants ':common';

    sub handler {
        my($r) = @_;

        if($things_are_ok) {
            return OK;
        }

        #<Location $r->uri>
        #ErrorDocument 401 /error.html
        #</Location>

        $r->custom_response(AUTH_REQUIRED, "/error.html");

        #can send a string too
        #<Location $r->uri>
        #ErrorDocument 401 "sorry, go away"
        #</Location>

        #$r->custom_response(AUTH_REQUIRED, "sorry, go away");

        return AUTH_REQUIRED;
    }

=back

=head1 SERVER CORE FUNCTIONS

=over 4

=item $r-E<gt>soft_timeout($message)

=item $r-E<gt>hard_timeout($message)

=item $r-E<gt>kill_timeout

=item $r-E<gt>reset_timeout

(Documentation borrowed from http_main.h)

There are two functions which modules can call to trigger a timeout
(with the per-virtual-server timeout duration); these are
hard_timeout and soft_timeout.

The difference between the two is what happens when the timeout
expires (or earlier than that, if the client connection aborts) ---
a soft_timeout just puts the connection to the client in an "aborted"
state, which will cause http_protocol.c to stop trying to talk to the
client, but otherwise allows the code to continue normally.
hard_timeout(), by contrast, logs the request, and then aborts it
completely --- longjmp()ing out to the accept() loop in http_main.
Any resources tied into the request resource pool will be cleaned up;
everything that is not will leak.

soft_timeout() is recommended as a general rule, because it gives
your code a chance to clean up. However, hard_timeout() may be the
most convenient way of dealing with timeouts waiting for some
external resource other than the client, if you can live with the
restrictions.
When a hard timeout is in scope, critical sections can be guarded
with block_alarms() and unblock_alarms() --- these are declared in
alloc.c because they are most often used in conjunction with routines
to allocate something or other, to make sure that the cleanup does
get registered before any alarm is allowed to happen which might
require it to be cleaned up; they are, however, implemented in
http_main.c.

kill_timeout() will disarm either variety of timeout.

reset_timeout() resets the timeout in progress.

=item $r-E<gt>post_connection($code_ref)

=item $r-E<gt>register_cleanup($code_ref)

Register a cleanup function which is called just before
$r-E<gt>pool is destroyed.

    $r->register_cleanup(sub {
        my $r = shift;
        warn "registered cleanup called for ", $r->uri, "\n";
    });

Cleanup functions registered in the parent process (before forking)
will run once when the server is shut down:

    #PerlRequire startup.pl
    warn "parent pid is $$\n";
    Apache->server->register_cleanup(sub { warn "server cleanup in $$\n"});

The I<post_connection> method is simply an alias for
I<register_cleanup>, as this method may be used to run code after the
client connection is closed, which may not be a I<cleanup>.

=back

=head1 CGI SUPPORT

We also provide some methods that make it easier to support the CGI
type of interface.

=over 4

=item $r-E<gt>send_cgi_header()

Takes action on certain headers including I<Status:>, I<Location:>
and I<Content-type:> just as mod_cgi does, then calls
$r-E<gt>send_http_header(). Example of use:

    $r->send_cgi_header(<<EOT);
    Location: /foo/bar
    Content-type: text/html
    EOT

=back

=head1 ERROR LOGGING

The following methods can be used to log errors.

=over 4

=item $r-E<gt>log_reason($message, $file)

The request failed, why?? Write a message to the server errorlog.

    $r->log_reason("Because I felt like it", $r->filename);

=item $r-E<gt>log_error($message)

Uh, oh. Write a message to the server errorlog.

    $r->log_error("Some text that goes in the error_log");

=item $r-E<gt>warn($message)

For pre-1.3 versions of apache, this is just an alias for
C<log_error>. With 1.3+ versions of apache, this message will only
be sent to the error_log if B<LogLevel> is set to B<warn> or higher.

=back

=head1 UTILITY FUNCTIONS

=over 4

=item Apache::unescape_url($string)

Handy function for unescaping. Use this one for filenames/paths.
Use unescape_url_info for the result of submitted form data.

=item Apache::unescape_url_info($string)

Handy function for unescaping submitted form data. In contrast to
unescape_url, it translates the plus sign to a space.

=item Apache::perl_hook($hook)

Returns true if the specified callback hook is enabled:

    for (qw(Access Authen Authz ChildInit Cleanup Fixup
            HeaderParser Init Log Trans Type))
    {
        print "$_ hook enabled\n" if Apache::perl_hook($_);
    }

=back

=head1 GLOBAL VARIABLES

=over 4

=item $Apache::Server::Starting

Set to true when the server is starting.

=item $Apache::Server::ReStarting

Set to true when the server is restarting.

=back

=head1 SEE ALSO

perl(1), Apache::Constants(3), Apache::Registry(3), Apache::Debug(3),
Apache::Options(3), CGI::Apache(3)

Apache C API notes at C<http://www.apache.org/docs/>

=head1 AUTHORS

Perl interface to the Apache C API written by Doug MacEachern with
contributions from Gisle Aas, Andreas Koenig, Eric Bartley, Rob
Hartill, Gerald Richter, Salvador Ortiz and others.
=cut 1.1 modperl-docs/src/docs/1.0/api/config.cfg Index: config.cfg =================================================================== use vars qw(@c); @c = ( id => 'api_v1', title => "mod_perl 1.0 API", abstract => 'This is only a partial API, the rest of the API documentation is in the source distribution', chapters => [ qw( Apache.pod ), ], ); 1.1 modperl-docs/src/docs/1.0/faqs/cgi_to_mod_perl.pod Index: cgi_to_mod_perl.pod =================================================================== =head1 NAME cgi_to_mod_perl - First steps needed to use mod_perl as a CGI replacement =head1 DESCRIPTION As the README and other mod_perl documents explain, mod_perl as a CGI replacement is only a small piece of what the package offers. However, it is the most popular use of mod_perl, this document is here so you can cut to the chase. =head1 INSTALLATION Read the INSTALL document, in most cases, nothing more is required than: perl Makefile.PL && make && make install =head1 CONFIGURATION For using mod_perl as a CGI replacement, the recommended configuration is as follows: Alias /perl/ /real/path/to/perl-scripts/ <Location /perl> SetHandler perl-script PerlHandler Apache::Registry Options +ExecCGI </Location> `Location' refers to the uri, not a directory, think of the above as <Location http://www.yourname.com/perl> Any files under that location (which live on your filesystem under /real/path/to/perl-scripts/), will be handled by the Apache::Registry module, which emulates the CGI environment. The file must exist and be executable, in addition, 'Options ExecCGI' must be turned on. If you wish to have mod_perl execute scripts in any location based on file extension, use a configuration like so: <Files ~ "\.pl$"> SetHandler perl-script PerlHandler Apache::Registry Options ExecCGI </Files> Note that `ScriptAlias' does _not_ work for mod_perl. =head1 PORTING CGI SCRIPTS =over 4 =item I/O If you are using Perl 5.004 most CGI scripts can run under mod_perl untouched. 
If you're using 5.003, Perl's built-in C<read()> and C<print()>
functions do not work as they do under CGI. If you're using CGI.pm,
use C<$query-E<gt>print> instead of plain ol' C<print()>.

=item HEADERS

By default, mod_perl does not send any headers by itself; however, you
may wish to change this:

    PerlSendHeader On

Now the response line and common headers will be sent as they are by
mod_cgi. And, just as with mod_cgi, PerlSendHeader will not send a
terminating newline; your script must send that itself, e.g.:

    print "Content-type: text/html\n\n";

If you're using CGI.pm and 'print $q-E<gt>header' you do _not_ need
C<PerlSendHeader On>.

=item NPH SCRIPTS

To run a CGI `nph' script under mod_perl, simply add to your code:

    local $| = 1;

If you normally set B<PerlSendHeader On>, add this to your httpd.conf:

    <Files */nph-*>
    PerlSendHeader Off
    </Files>

=item PROGRAMMING PRACTICE

CGI lets you get away with sloppy programming, mod_perl does not.
Why? CGI scripts have the lifetime of a single HTTP request as a
separate process. When the request is over, the process goes away and
everything is cleaned up for you, e.g. global variables, open files,
etc. Scripts running under mod_perl have a longer lifetime, spanning
several requests, and different scripts may run in the same process.
This means you must clean up after yourself. You've heard:

always 'use strict' and C<-w>!!!

It's more important under mod_perl than anywhere else. While it's not
required, it is B<strongly> recommended; it will save you time in the
long run. And, of course, clean scripts will still run under CGI!

=item TRAPS

See L<mod_perl_traps>.

=back

=head1 REPORTING PROBLEMS

Read the L<SUPPORT> file.
=head1 SEE ALSO Apache::PerlRun(3) 1.1 modperl-docs/src/docs/1.0/faqs/config.cfg Index: config.cfg =================================================================== use vars qw(@c); @c = ( id => 'faqs', title => "mod_perl FAQs", abstract => 'Miscellaneous mod_perl documentation', chapters => [ qw( cgi_to_mod_perl.pod mod_perl_cgi.pod mod_perl_cvs.pod mod_perl_faq.pod mod_perl.pod mod_perl_traps.pod mod_perl_tuning.pod mod_perl_api.pod mod_perl_method_handlers.pod perl_myth.pod email-etiquette.pod ), ], ); 1.1 modperl-docs/src/docs/1.0/faqs/email-etiquette.pod Index: email-etiquette.pod =================================================================== =head1 The mod_perl Mailing List Guidelines =for html <!-- email-etiquette: This version dated 21 October 2001. Please make changes to the .pod source and use pod2html to create the .html file, thanks. [EMAIL PROTECTED] --> Ninety percent of the questions asked on the List have already been asked before, and answers will be found at one of the links below. Before you post to the mod_perl List, please read the following. Hopefully it will save you (and everyone else) some time. Except where noted the language of all documents is English. =head1 What is mod_perl? http://perl.apache.org/guide/intro.html#What_is_mod_perl =head1 What you need to know to be able to use mod_perl You need to know about Apache, CGI and of course about Perl itself. This document explains where to find more information about these and related topics. If you already have Perl on your machine then it's likely that you already have all the Perl documentation. Try typing `perldoc perldoc' and `man perl'. =head1 How To Get Help With mod_perl Itself http://perl.apache.org/ is the mod_perl home, it has links for everything related to mod_perl. =head2 Documentation which comes with the distribution Read the documents which came with mod_perl, particularly the ones named INSTALL, README and SUPPORT. Also read the documents to which they refer. 
Read all the relevant documentation about your operating system, any
tools you use such as compilers and databases, and about the Apache
Web server.

You will get a much better response from the mod_perl List if you can
show that you have made the effort of reading the documentation.

=head2 Other documentation

There are dozens of references to many authoritative resources at

    http://perl.apache.org/guide/help.html

=head1 How to get on (and off!) the mod_perl mailing list

=head2 To Get On The List

There are two stages to getting on the list. Firstly you have to send
a mail message to: [EMAIL PROTECTED] and wait for a response from the
mail server with instructions on how to proceed. Secondly you have to
do what it says in the instructions. After you are subscribed you
will receive a message with lots of useful information about the List.
Read it. Print it, even. Save a copy of it. You *can* get another
copy of it, but then you'll feel silly.

Traffic on the mod_perl list can be high at times, several hundred
posts per week, so you might want to consider subscribing to the
mod_perl digest list as an alternative to the mod_perl list. To do
so, send an email to [EMAIL PROTECTED] instead.

=head2 To Get Off The List

Instructions on how to unsubscribe are posted in the headers of every
message which you receive from the List. All you have to do is send a
message to: [EMAIL PROTECTED] (or [EMAIL PROTECTED] if you are on the
digest list)

To prevent malicious individuals from unsubscribing other people, the
mailing list software insists that the message requesting that an
email address be unsubscribed comes from that same address. If your
email address has changed you can still unsubscribe, but you will need
to read the help document, which can be received by sending an empty
email to: [EMAIL PROTECTED]

=head1 To post to the List

I<Posting> to the list is just sending a message to the address which
you will be given after you subscribe.
Your message will not be accepted unless you have first L<subscribed|To Get On The List>. Do not post to [EMAIL PROTECTED], except to subscribe to the list! Please do not post to the list itself to attempt to unsubscribe from it. =head2 Private Mail Please do not send private mail to list members unless it is invited. Even if they have answered your question on the list, you should continue the discussion on the list. On the other hand, if someone replies to you personally, you shouldn't forward the reply to the list unless you have received permission from this person. =head2 Other Tips =head3 Read The Documentation Please read as much of the documentation as you can before posting. Please also try to see if your question has been asked recently, there are links to searchable archives of the list on the mod_perl home page http://perl.apache.org/. =head3 Give Full Information Don't forget that the people reading the list have no idea even what operating system your computer runs unless you tell them. When reporting problems include at least the information requested in the document entitled I<SUPPORT> which you will find in the mod_perl source distribution. You can find many excellent examples of posts with good supporting information by looking at the mod_perl mailing list archives. There are URLs for several archives (with several different search engines) on the mod_perl home page. Followup posts will show you how easy the writer made it for the person who replied to deduce the problem and to suggest a way of solving it, or to find some further item information. If after reading the I<SUPPORT> document you think that more information is needed for your particular problem, but you still don't know what information to give, ask on the list rather than sending long scripts and configuration files which few people will have the time to read. =head3 Error Messages If you include error messages in your post, make sure that they are EXACTLY the messages which you saw. 
Use a text editor to transfer them directly into your message if you
can. Try not to say things like "the computer said something about
not recognizing a command" but instead to say something like this:

"When logged in as root I typed the command:

  httpd -X

at the console and on the console I saw the message

  Syntax error on line 393 of /etc/httpd/conf/httpd.conf: Invalid
  command 'PerlHandler', perhaps mis-spelled or defined by a module
  not included in the server configuration [FAILED]"

=head3 The Subject Line

The I<Subject:> line is B<very> important. Choose an B<informative>
I<Subject> line for the mail header. Busy list members will skip
messages with unclear I<Subject> lines.

=head3 Preserve The Threads

Messages which all have the same I<Subject> line text (possibly
preceded by the word "Re:" which is automatically added by your
mailer) are together known as a "thread". List members and mail
archives use unique message-ids and/or the Subject line to sort mail.
Do not change the text without a very good reason, because this may
break the thread. Breaking the thread makes it difficult to follow
the discussion and can be very confusing. It may be better to start a
new thread than to continue an old one if you change the theme.

=head3 Post in PLAIN TEXT

Do not post in HTML. Microsoft users in particular should take
careful note of this. Use either the US-ASCII or ISO-8859-1 (Latin-1)
character set, do not use other character sets which may be designed
for those who do not speak English and which may not be displayable on
many terminals. If you ignore this advice then the chances are
greater that your message will not be read.

=head3 Time and Bandwidth

Remember that thousands of people may read your messages. To save
time and to keep bandwidth usage to a minimum, please keep posts
reasonably short, but please make it clear precisely what you are
asking. If you can, send a *small* example of a script or
configuration which reproduces your problem.
Please do not send long scripts which cannot easily be understood.
Please do not send large attachments of many kilobytes; if they are
needed then put them on the Web somewhere or say in your message that
you can send them separately if they are requested.

=head3 Tags

It can be helpful if you use a C<[tag]> in square brackets in the
I<Subject:> line, as well as the brief description of your post.

Some suggested tags are:

  ADMIN        Stuff about running the List.

  ADVOCACY     Promoting the use of mod_perl, printing T-shirts, stuff
               like that. Please don't start another discussion about
               whether we should put this on a different list, we've
               been there before.

  ANNOUNCE     Announcements of new software tools, packages and
               updates.

  BENCHMARK    Apache/mod_perl performance issues.

  BUG          Report of possible fault in mod_perl or associated
               software - it's better if you can send a patch instead!

  DBI          Stuff generally concerning Apache/mod_perl interaction
               with databases.

  FYI          For information only.

  JOB          Any post about mod_perl jobs is welcome as long as it is
               brief and to the point. Note: Not "JOBS".

  MASON        Jonathan Swartz' implementation of Perl embedded in HTML.

  NEWS         Items of news likely to be interesting to mod_perlers.

  OT           Off-topic items, please try to keep traffic low.

  PATCH        Suggested fix for fault in mod_perl or associated
               software.

  QUESTION     Questions about mod_perl which are not covered by one of
               the more specific headings.

  RareModules  Occasional reminders about little-used modules on CPAN.

  RFC          Requests for comment from the mod_perl community.

  SITE         Things about running the Apache/mod_perl servers.

  SUMMARY      After investigation and perhaps fixing a fault, and
               after an extended discussion of a specific topic, it is
               helpful if someone summarizes the thread. Don't be shy,
               everyone will appreciate the effort.

If you can't find a tag which fits your subject, don't worry.
If you have a very specific subject to discuss, feel free to choose your own tag, for example C<[mod_proxy]> or C<[Perl Sections]> but remember that the main reasons for the I<Subject> line are to save people time and to improve the response to your posts. It does not matter whether you use C<[UPPER CASE]> or C<[lower case]> or even a C<[Mixture Of Both]> in the tag. Try to keep the tag short. The tag should be the first thing in the I<Subject> line. =head3 If You Don't Get a Reply Sometimes you will not get a reply. Try to be patient, but it is OK to try again after a few days. Sometimes the replies you get will be very short. Please do not worry about that. People are very busy, that's all. Of course if your post is C<[OT]> for the list then you may not get a reply, or you may get one telling you to try a different forum. =head3 If You Don't Understand a Reply Just say so. =head3 General Perl and Apache questions The mod_perl list is NOT for general questions about Apache and the Perl language. The majority view is tolerant of off-topic posts, but it is considered impolite to post general Perl and Apache questions on the mod_perl list. The best you can hope for is a private reply and a polite reminder that the question is off-topic for this list. If you catch someone on a bad day, you might not get the best. There are often bad days in software development departments... If the Perl and Apache documentation has not answered your question then you could try looking at http://lists.perl.org/ or one of the comp.lang.* newsgroups. From time to time there are efforts to start a dedicated Perl mailing list and these usually result in a message or two on the mod_perl list, so it might be worth your while to search the archives. Please note that there are now separate mailing lists for ASP, EmbPerl and Mason, but although we keep trying to get a separate list off the ground for I<Advocacy> it always seems to end up back on the mod_perl list. 
=head1 Replying to posts =head2 The "Subject:" line Make sure that you include the exact I<Subject:> from the original post, unmodified. This makes it much easier for people (and for the mail software) to deal with the mail. If you must change the subject line then please append the words "was originally" plus the original subject line to your new subject line so that folks can see what is going on. =head2 Extracts From Other Posts When replying to a post, please include B<short> excerpts from the post to which you are replying so that others can follow the conversation without having to wade through reams of superfluous text. If you are lazy about this then messages can get very long indeed and can become a burden to the other people who might be trying to help. Make sure that there is a clear distinction between the text(s) of the message(s) to which you are replying and your reply itself. =head2 Unnecessary Duplication If you know that the intended recipients are subscribed to the List, there is no need to send messages both to them and to the list. They will get more than one copy of the message which is wasteful. =head2 Private replies It is helpful to keep most of your replies on the list, so that others see that help is being given and so they do not waste time on problems which have already been solved. Where it is appropriate to take a discussion off the list (for example where it veers off-topic, as often happens), say so in a message so that everyone is aware of it. =head2 Flames The readers of the mod_perl List aren't interested in that kind of thing. Don't get involved. =head1 The mod_perl Guide You absolutely *must* read the mod_perl Guide. It is a large document, you probably will want to download it and read it off-line. If you get the source (see below, L<Corrections and Contributions>) it comes with a build file to turn the .pod (Plain Old Documentation) source into HTML, .ps (PostScript) and .pdf (Portable Document Format). 
You will need at least Perl version 5.005 to build it. If you browse the Guide on-line you can use one of the search engines to find things in it. If you build and browse your own local HTML copy of the Guide, some of the links in it will not work unless you are connected to the Internet. Some people prefer to work offline, using tools like `grep' or `mc' to search the .pod source directly. =head2 Finding the Guide The URL of the Guide is: http://perl.apache.org/guide/ The sources are available from CPAN and other mirrors: http://www.cpan.org/authors/id/S/ST/STAS/ =head2 Corrections And Contributions Corrections and additions to the Guide are welcome. The original is kept in .pod format, and it is converted to other formats by Perl code. The Guide changes rather frequently (the CVS snapshot is updated every six hours!) so if you want to make a contribution make sure that you get the latest version of the Guide source from http://stason.org/guide-snapshots and make your changes to the .pod source only. In the first instance, post your changes to the mod_perl List for comment. =begin html <br><hr><br><!-- 11 Jun 2000 Initial publication for comment 18 Dec 2000 Minor corrections and additions 21 Oct 2001 Minor corrections, converted to .POD format --> =end html email-etiquette: This version dated 17 October 2001. 1.1 modperl-docs/src/docs/1.0/faqs/mjtg-news.txt Index: mjtg-news.txt =================================================================== From: [EMAIL PROTECTED] (M.J.T. Guy) Newsgroups: comp.lang.perl.misc Subject: Re: Lexical scope and embedded subroutines. 
Date: 6 Jan 1998 18:22:39 GMT Organization: University of Cambridge, England Lines: 95 Message-ID: <[EMAIL PROTECTED]> References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> NNTP-Posting-Host: taurus.cus.cam.ac.uk In article <[EMAIL PROTECTED]>, Aaron Harsh <[EMAIL PROTECTED]> wrote: > >Before I read this thread (and perlsub to get the details) I would have >assumed the original code was fine. > >This behavior brings up the following questions: > o Is Perl's behavior some sort of speed optimization? No, but see below. > o Did the Perl gods just decide that scheme-like behavior was less >important than the pseduo-static variables described in perlsub? This subject has been kicked about at some length on perl5-porters. The current behaviour was chosen as the best of a bad job. In the context of Perl, it's not obvious what "scheme-like behavior" means. So it isn't an option. See below for details. > o Does anyone else find Perl's behavior counter-intuitive? *Everyone* finds it counterintuitive. The fact that it only generates a warning rather than a hard error is part of the Perl Gods policy of hurling thunderbolts at those so irreverent as not to use -w. > o Did programming in scheme destroy my ability to judge a decent language >feature? You're still interested in Perl, so it can't have rotted your brain completely. > o Have I misremembered how scheme handles these situations? Probably not. > o Do Perl programmers really care how much Perl acts like scheme? Some do. > o Should I have stopped this message two or three questions ago? Yes. The problem to be solved can be stated as "When a subroutine refers to a variable which is instantiated more than once (i.e. the variable is declared in a for loop, or in a subroutine), which instance of that variable should be used?" The basic problem is that Perl isn't Scheme (or Pascal or any of the other comparators that have been used). In almost all lexically scoped languages (i.e. 
those in the Algol60 tradition), named subroutines are also lexically
scoped. So the scope of the subroutine is necessarily contained in
the scope of any external variable referred to inside the subroutine.
So there's an obvious answer to the "which instance?" problem.

But in Perl, named subroutines are globally scoped. (But in some
future Perl, you'll be able to write

    my sub lex { ... }

to get lexical scoping.) So the solution adopted by other languages
can't be used.

The next suggestion most people come up with is "Why not use the most
recently instantiated variable?". This Does The Right Thing in many
cases, but fails when recursion or other complications are involved.

Consider

    sub outer {
        inner();
        outer();
        my $trouble;
        inner();
        sub inner { $trouble };
        outer();
        inner();
    }

Which instance of $trouble is to be used for each call of inner()?
And why?

The consensus was that an incomplete solution was unacceptable, so
the simple rule "Use the first instance" was adopted instead.

And it is more efficient than possible alternative rules. But that's
not why it was done.

Mike Guy

  1.1                  modperl-docs/src/docs/1.0/faqs/mod_perl.pod

  Index: mod_perl.pod
  ===================================================================
=head1 NAME

mod_perl - Embed a Perl interpreter in the Apache HTTP server

=head1 DESCRIPTION

The Apache/Perl integration project brings together the full power of
the Perl programming language and the Apache HTTP server. This is
achieved by linking the Perl runtime library into the server and
providing an object oriented Perl interface to the server's C language
API. These pieces are seamlessly glued together by the `mod_perl'
server plugin, making it possible to write Apache modules entirely in
Perl. In addition, the persistent interpreter embedded in the server
avoids the overhead of starting an external interpreter and the
penalty of Perl start-up (compile) time.

Without question, the most popular Apache/Perl module is the
Apache::Registry module.
This module emulates the CGI environment, allowing programmers to
write scripts that run under CGI or mod_perl without change. Existing
CGI scripts may require some changes, simply because a CGI script has
a very short lifetime of one HTTP request, allowing you to get away
with "quick and dirty" scripting. Using mod_perl and Apache::Registry
requires you to be more careful, but it also gives new meaning to the
word "quick"! Apache::Registry maintains a cache of compiled scripts;
a script is compiled the first time it is accessed by a child server,
and again if the file is updated on disk.

Although it may be all you need, a speedy CGI replacement is only a
small part of this project. Callback hooks are in place for each
stage of a request. Apache-Perl modules may step in during the
handler, header parser, uri translate, authentication, authorization,
access, type check, fixup and logger stages of a request.

=head1 FAQ

The mod_perl FAQ is maintained by Frank Cringle
E<lt>[EMAIL PROTECTED]E<gt>: http://perl.apache.org/faq/

=head1 Apache/Perl API

See 'perldoc Apache' for info on how to use the Perl-Apache API.

See the lib/ directory for example modules and
L<apache-modlist.html> for a comprehensive list.

See the eg/ directory for example scripts.

=head1 mod_perl

For using mod_perl as a CGI replacement see the L<cgi_to_mod_perl>
document.

You may load modules at server startup via:

    PerlModule Apache::SSI SomeOther::Module

Optionally:

    PerlRequire perl-scripts/script_to_load_at_startup.pl

A B<PerlRequire> file is commonly used for initialization during
server startup time. A PerlRequire file name can be absolute or
relative to B<ServerRoot> or a path in C<@INC>. A B<PerlRequire>'d
file must return a true value, i.e., the end of this file should have
a:

    1; #return true value

See eg/startup.pl for an example to start with.
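As a rough sketch of the shape such a file might take (this is not the
eg/startup.pl shipped with mod_perl; the modules named here are only
stand-ins, chosen because they ship with Perl), a B<PerlRequire> file
preloads modules and ends with a true value:

```perl
# startup.pl -- hypothetical sketch, not the distributed eg/startup.pl.
use strict;

# Preload commonly used modules once, in the parent server.
# POSIX and FileHandle are assumptions for illustration; substitute
# whatever your own scripts use (CGI.pm, DBI, ...).
# The empty import lists load the code without importing any names.
use POSIX ();
use FileHandle ();

# A PerlRequire'd file must end by returning a true value.
1;
```

Loading modules here rather than in each script means they are
compiled before the children are forked.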
In an httpd.conf E<lt>Location /fooE<gt> or .htaccess you need:

    PerlHandler sub_routine_name

This is the name of the subroutine to call to handle each request.
For example, in the module Apache::Registry this is
"Apache::Registry::handler".

If PerlHandler is not a defined subroutine, mod_perl assumes it is a
package name which defines a subroutine named "handler".

    PerlHandler Apache::Registry

Would load Registry.pm (if it is not already loaded) and call its
subroutine "handler".

There are several stages of a request where the Apache API allows a
module to step in and do something. The Apache documentation will
tell you all about those stages and what your modules can do. By
default, these hooks are disabled at compile time; see the INSTALL
document for information on enabling these hooks.

The following configuration directives take one argument, which is the
name of the subroutine to call. If the value is not a subroutine
name, mod_perl assumes it is a package name which implements a
'handler' subroutine.

    PerlChildInitHandler (requires apache_1.3.0 or higher)
    PerlPostReadRequestHandler (requires apache_1.3.0 or higher)
    PerlInitHandler
    PerlTransHandler
    PerlHeaderParserHandler
    PerlAccessHandler
    PerlAuthenHandler
    PerlAuthzHandler
    PerlTypeHandler
    PerlFixupHandler
    PerlHandler
    PerlLogHandler
    PerlCleanupHandler
    PerlChildExitHandler (requires apache_1.3.0 or higher)

Only the ChildInit, ChildExit, PostReadRequest and Trans handlers are
not allowed in .htaccess files.

Modules can check if the code is being run in the parent server during
startup by checking the $Apache::Server::Starting variable.

=head1 RESTARTING

=over 4

=item PerlFreshRestart

By default, if a server is restarted (e.g. with
kill -USR1 `cat logs/httpd.pid`), Perl scripts and modules are not
reloaded.
To reload B<PerlRequire>'s, B<PerlModule>'s, other use()'d modules and
flush the B<Apache::Registry> cache, enable it with this directive:

    PerlFreshRestart On

=item PERL_DESTRUCT_LEVEL

With Apache versions 1.3.0 and higher, mod_perl will call the
perl_destruct() Perl API function during the child exit phase. This
will cause proper execution of B<END> blocks found during server
startup along with invoking the B<DESTROY> method on global objects
that are still alive. It is possible that this operation may take a
long time to finish, causing problems during a restart. If your code
does not contain any B<END> blocks or B<DESTROY> methods which need to
be run during child server shutdown, this destruction can be avoided
by setting the I<PERL_DESTRUCT_LEVEL> environment variable to C<-1>.

=back

=head1 ENVIRONMENT

Under CGI the Perl hash C<%ENV> is magical in that it inherits
environment variables from the parent process and will set them should
a process spawn a child. However, with mod_perl we're in the parent
process that would normally set up the common environment variables
before spawning a CGI process. Therefore, mod_perl must feed these
variables to C<%ENV> directly. Normally, this does not happen until
the response stage of a request when C<PerlHandler> is called. If you
wish to set variables that will be available before then, such as for
a C<PerlAuthenHandler>, you may use the C<PerlSetEnv> configuration
directive:

    PerlSetEnv SomeKey SomeValue

You may also use the C<PerlPassEnv> directive to pass an already
existing environment variable to Perl's C<%ENV>:

    PerlPassEnv SomeKey

=over 4

=item CONFIGURATION

The C<PerlSetVar> and C<PerlAddVar> directives provide a simple
mechanism for passing information from configuration files to Perl
modules or Registry scripts.

The C<PerlSetVar> directive allows you to set a key/value pair.

    PerlSetVar SomeKey SomeValue

Perl modules or scripts retrieve configuration values using the
C<$r-E<gt>dir_config> method.
$SomeValue = $r->dir_config('SomeKey'); The C<PerlAddVar> directive allows you to emulate Perl arrays: PerlAddVar SomeKey FirstValue PerlAddVar SomeKey SecondValue ... ... ... PerlAddVar SomeKey Nth-Value In the Perl modules the values are extracted using the C<$r-E<gt>dir_config-E<gt>get> method. @array = $r->dir_config->get('SomeKey'); Alternatively in your code you can extend the setting with: $r->dir_config->add(SomeKey => 'Bar'); C<PerlSetVar> and C<PerlAddVar> handle keys case-insensitively. =item GATEWAY_INTERFACE The standard CGI environment variable B<GATEWAY_INTERFACE> is set to C<CGI-Perl/1.1> when running under mod_perl. =item MOD_PERL The environment variable `MOD_PERL' is set so scripts can say: if(exists $ENV{MOD_PERL}) { #we're running under mod_perl ... } else { #we're NOT running under mod_perl } =back =head1 BEGIN blocks Perl executes C<BEGIN> blocks during the compile time of code as soon as possible. The same is true under mod_perl. However, since mod_perl normally only compiles scripts and modules once, in the parent server or once per-child, C<BEGIN> blocks in that code will only be run once. As L<perlmod> explains, once a C<BEGIN> has run, it is immediately undefined. In the mod_perl environment, this means C<BEGIN> blocks will not be run during each incoming request unless that request happens to be one that is compiling the code. 
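The following stand-alone sketch imitates that behaviour outside the
server. It is only an illustration under stated assumptions: the
string C<eval> stands in for mod_perl compiling a module or script
once, the loop stands in for incoming requests, and the variable names
are made up for the example.

```perl
use strict;
use vars qw($begin_runs $requests);

($begin_runs, $requests) = (0, 0);

# Compile the "script" once, much as a child process compiles a module
# or Registry script the first time it is needed.
my $handler = eval q{
    BEGIN { $main::begin_runs++ }   # runs while this string is compiled
    sub { $main::requests++ }       # the code that runs for each "request"
};
die $@ if $@;

# Simulate three requests served by the already compiled code.
$handler->() for 1 .. 3;

print "BEGIN ran $begin_runs time(s) for $requests request(s)\n";
# prints: BEGIN ran 1 time(s) for 3 request(s)
```

Anything that must happen on every request therefore belongs in the
handler body, not in a C<BEGIN> block.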
Modules and files pulled in via require/use which contain C<BEGIN>
blocks will be executed:

  - only once, if pulled in by the parent process
  - once per-child process if not pulled in by the parent process
  - an additional time, once per-child process if the module is
    pulled in off of disk again via Apache::StatINC
  - an additional time, in the parent process on each restart if
    PerlFreshRestart is On
  - unpredictable if you fiddle with %INC yourself

B<Apache::Registry> scripts which contain C<BEGIN> blocks will be
executed:

  - only once, if pulled in by the parent process via
    Apache::RegistryLoader
  - once per-child process if not pulled in by the parent process
  - an additional time, once per-child process if the script file has
    changed on disk
  - an additional time, in the parent process on each restart if
    pulled in by the parent process via Apache::RegistryLoader and
    PerlFreshRestart is On

=head1 END blocks

As L<perlmod> explains, an C<END> subroutine is executed as late as
possible, that is, when the interpreter is being exited. In the
mod_perl environment, the interpreter does not exit until the server
is shut down. However, mod_perl does make a special case for
B<Apache::Registry> scripts.

Normally, C<END> blocks are executed by Perl during its C<perl_run()>
function, which is called once each time the Perl program is executed,
e.g. once per (mod_cgi) CGI script. However, mod_perl only calls
C<perl_run()> once, during server startup. Any C<END> blocks
encountered during main server startup, i.e. those pulled in by the
B<PerlRequire> or by any B<PerlModule>, are suspended and run at server
shutdown, aka C<child_exit> (requires apache 1.3.0+). Any C<END>
blocks that are encountered during compilation of Apache::Registry
scripts are called after the script is done running, including
subsequent invocations when the script is cached in memory. All other
C<END> blocks encountered during other Perl*Handler callbacks, e.g.
B<PerlChildInitHandler>, will be suspended while the process is
running and called during C<child_exit> when the process is shutting
down. Module authors may wish to use C<$r-E<gt>register_cleanup> as
an alternative to C<END> blocks if this behavior is not desirable.

=head1 MEMORY CONSUMPTION

Don't be alarmed by the size of your httpd after you've linked with
mod_perl. No matter what, your httpd will be larger than normal to
start, simply because you've linked with perl's runtime.

Here I'm just running

  % /usr/bin/perl -e '1 while 1'

    PID USERNAME PRI NICE   SIZE   RES STATE   TIME   WCPU    CPU COMMAND
  10214 dougm     67    0   668K  212K run     0:04 71.55% 21.13% perl

Now with a few random modules:

  % /usr/bin/perl -MDBI -MDBD::mSQL -MLWP::UserAgent -MFileHandle -MIO -MPOSIX -e '1 while 1'

  10545 dougm     49    0  3732K 3340K run     0:05 54.59% 21.48% perl

Here's my httpd linked with libperl.a, not having served a single
request:

  10386 dougm      5    0  1032K  324K sleep   0:00  0.12%  0.11% httpd-a

You can reduce this if you configure perl 5.004+ with -Duseshrplib.
Here's my httpd linked with libperl.sl, not having served a single
request:

  10393 dougm      5    0   476K  368K sleep   0:00  0.12%  0.10% httpd-s

Now, once the server starts receiving requests, the embedded
interpreter will compile code for each 'require' file it has not seen
yet, each new Apache::Registry subroutine that's compiled, along with
whatever modules it's use'ing or require'ing. Not to mention
AUTOLOADing. (Modules that you 'use' will be compiled when the server
starts unless they are inside an eval block.) httpd will grow just as
big as our /usr/bin/perl would, or a CGI process for that matter; it
all depends on your setup. The L<mod_perl_tuning> document gives
advice on how to best set up your mod_perl server environment.

The mod_perl INSTALL document explains how to build the Apache::
extensions as shared libraries (with 'perl Makefile.PL DYNAMIC=1').
This may save you some memory; however, it doesn't work on a few
systems such as AIX and UnixWare. Moreover, on most systems this
strategy will only make the httpd I<look> smaller, when in fact an
httpd with Perl linked statically will take up less real memory and
perform faster than one using shared libraries. See the
L<mod_perl_tuning> document for details.

=head2 MEMORY TIPS

=over 4

=item Leaks

If you are using a module that leaks, or have code of your own that
leaks, the apache configuration directive 'MaxRequestsPerChild' is
your best bet to keep the size down.

=item Perl Options

Newer Perl versions also have other options to reduce runtime memory
consumption. See Perl's INSTALL file for details on C<-DPACK_MALLOC>
and C<-DTWO_POT_OPTIMIZE>. With these options, my httpd shrinks down
~150K.

=item Server Startup

Use the B<PerlRequire> and B<PerlModule> directives to load commonly
used modules such as CGI.pm, DBI, etc., when the server is started. On
most systems, server children will be able to share this space.

=item Importing Functions

When possible, avoid importing a module's functions into your
namespace. The aliases which are created can take up quite a bit of
space. Try to use method interfaces and fully qualified
Package::function names instead.

Here's a freshly started httpd that has served one request for a
script using the CGI.pm method interface:

  TTY   PID USERNAME  PRI NI   SIZE   RES  STATE   TIME %WCPU  %CPU COMMAND
   p4  5016 dougm     154 20  3808K  2636K sleep   0:01  9.62  4.07 httpd

Here's a freshly started httpd that has served one request for the
same script using the CGI.pm function interface:

  TTY   PID USERNAME  PRI NI   SIZE   RES  STATE   TIME %WCPU  %CPU COMMAND
   p4  5036 dougm     154 20  3900K  2708K sleep   0:01  3.19  2.18 httpd

Now do the math: take that difference, figure in how many other
scripts import the same functions and how many children you have
running. It adds up!

=item Global Variables

It's always a good idea to stay away from global variables when
possible. Some variables must be global so Perl can see them, such as
a module's B<@ISA> or B<$VERSION> variables. In common practice, a
combination of C<use strict> and C<use vars> keeps modules clean and
reduces a bit of noise. However, B<use vars> also creates aliases as
the B<Exporter> does, which eat up more space. When possible, try to
use fully qualified names instead of B<use vars>. Example:

  package MyPackage;
  use strict;
  @MyPackage::ISA = qw(...);
  $MyPackage::VERSION = "1.00";

vs.

  package MyPackage;
  use strict;
  use vars qw(@ISA $VERSION);
  @ISA = qw(...);
  $VERSION = "1.00";

=item Further Reading

In case I forgot to mention, read Vivek Khera's L<mod_perl_tuning>
document for more tips on improving Apache/mod_perl performance.

=back

=head1 SWITCHES

Normally when you run perl from the command line or have the shell
invoke it with `#!', you may choose to pass perl switch arguments such
as C<-w> or C<-T>. Since the command line is only parsed once, when
the server starts, these switches are unavailable to mod_perl scripts.
However, most command line arguments have a perl special variable
equivalent. For example, the C<$^W> variable corresponds to the C<-w>
switch. Consult L<perlvar> for more details. With mod_perl it is also
possible to turn on warnings globally via the B<PerlWarn> directive:

  PerlWarn On

The switch which enables taint checks does not have a special
variable, so mod_perl provides the B<PerlTaintCheck> directive to turn
on taint checks. In httpd.conf, enable with:

  PerlTaintCheck On

Now, any and all code compiled inside httpd will be checked.

The environment variable B<PERL5OPT> can be used to set additional
perl startup flags such as B<-d> and B<-D>. See L<perlrun>.

=head1 PERSISTENT DATABASE CONNECTIONS

Another popular use of mod_perl is to take advantage of its
persistence to maintain open database connections.
The basic idea goes like so:

  #Apache::Registry script
  use strict;
  use vars qw($dbh);

  $dbh ||= SomeDbPackage->connect(...);

Since C<$dbh> is a global variable, it will not go out of scope,
keeping the connection open for the lifetime of a server process,
establishing it during the script's first request for that process.

It's recommended that you use one of the Apache::* database connection
wrappers. Currently for DBI users there is C<Apache::DBI> and for
Sybase users C<Apache::Sybase::DBlib>. These modules hide the peculiar
code example above. In addition, different scripts may share a
connection, minimizing resource consumption. Example:

  #httpd.conf has
  # PerlModule Apache::DBI
  #DBI scripts look exactly as they do under CGI
  use strict;
  my $dbh = DBI->connect(...);

Although B<$dbh> shown here will go out of scope when the script ends,
the Apache::DBI module's reference to it does not, keeping the
connection open.

B<WARNING:> Do not attempt to open a persistent database connection in
the parent process (via PerlRequire or PerlModule). If you do,
children will get a copy of this handle, causing clashes when the
handle is used by two processes at the same time. Each child must have
its own unique connection handle.

=head1 STACKED HANDLERS

With the mod_perl stacked handlers mechanism, it is possible for more
than one Perl*Handler to be defined and run during each stage of a
request.

Perl*Handler directives can define any number of subroutines, e.g. (in
config files)

  PerlTransHandler OneTrans TwoTrans RedTrans BlueTrans

With the method Apache-E<gt>push_handlers, callbacks can be added to
the stack at runtime by mod_perl scripts.

Apache-E<gt>push_handlers takes the callback hook name as its first
argument and a subroutine name or reference as its second, e.g.:

  Apache->push_handlers("PerlLogHandler", \&first_one);

  $r->push_handlers("PerlLogHandler", sub {
      print STDERR "__ANON__ called\n";
      return 0;
  });

After each request, this stack is cleared out.

All handlers will be called unless a handler returns a status other
than OK or DECLINED; this needs to be considered more. Post apache-1.2
will have a DONE return code to signal termination of a stage, which
Rob and I came up with a while back when first discussing the idea of
stacked handlers. 2.0 won't come for quite some time, so mod_perl will
most likely handle this before then.

Example uses:

CGI.pm maintains a global object for its plain function interface.
Since the object is global, it does not go out of scope, so DESTROY is
never called. CGI-E<gt>new can call:

  Apache->push_handlers("PerlCleanupHandler", \&CGI::_reset_globals);

This function will be called during the final stage of a request,
refreshing CGI.pm's globals before the next request comes in.

Apache::DCELogin establishes a DCE login context which must exist for
the lifetime of a request, so the DCE::Login object is stored in a
global variable. Without stacked handlers, users must set

  PerlCleanupHandler Apache::DCELogin::purge

in the configuration files to destroy the context. This is not
"user-friendly". Now, Apache::DCELogin::handler can call:

  Apache->push_handlers("PerlCleanupHandler", \&purge);

Persistent database connection modules such as Apache::DBI could push
a PerlCleanupHandler handler that iterates over %Connected, refreshing
connections or just checking that they have not gone stale. Remember,
by the time we get to PerlCleanupHandler, the client has what it wants
and has gone away, so we can spend as much time as we want here
without slowing down response time to the client.

PerlTransHandlers may decide, based on the uri or some other
condition, whether or not to handle a request, e.g. Apache::MsqlProxy.
Without stacked handlers, users must configure:

  PerlTransHandler Apache::MsqlProxy::translate
  PerlHandler      Apache::MsqlProxy

PerlHandler is never actually invoked unless translate() sees the
request is a proxy request ($r-E<gt>proxyreq); if it is a proxy
request, translate() sets $r-E<gt>handler("perl-script"), and only
then will PerlHandler handle the request. Now, users do not have to
specify 'PerlHandler Apache::MsqlProxy'; the translate() function can
set it with push_handlers().

Includes, footers, headers, etc., piecing together a document, imagine
(no need for SSI parsing!):

  PerlHandler My::Header Some::Body A::Footer

This was my first test:

  #My.pm
  package My;

  sub header {
      my $r = shift;
      $r->content_type("text/plain");
      $r->send_http_header;
      $r->print("header text\n");
  }
  sub body   { shift->print("body text\n")   }
  sub footer { shift->print("footer text\n") }
  1;
  __END__

  #in config
  <Location /foo>
  SetHandler "perl-script"
  PerlHandler My::header My::body My::footer
  </Location>

Parsing the output of another PerlHandler? This is a little more
tricky, but consider:

  <Location /foo>
  SetHandler "perl-script"
  PerlHandler OutputParser SomeApp
  </Location>

  <Location /bar>
  SetHandler "perl-script"
  PerlHandler OutputParser AnotherApp
  </Location>

Now, OutputParser goes first, but it unties *STDOUT and re-ties it to
its own package like so:

  package OutputParser;

  sub handler {
      my $r = shift;
      untie *STDOUT;
      tie *STDOUT => 'OutputParser', $r;
  }

  sub TIEHANDLE {
      my($class, $r) = @_;
      bless { r => $r}, $class;
  }

  sub PRINT {
      my $self = shift;
      for (@_) {
          #do whatever you want to $_
          $self->{r}->print($_ . "[insert stuff]");
      }
  }

  1;
  __END__

To build in this feature, configure with:

  % perl Makefile.PL PERL_STACKED_HANDLERS=1 [PERL_FOO_HOOK=1,etc]

Another method, 'Apache-E<gt>can_stack_handlers', will return TRUE if
mod_perl was configured with PERL_STACKED_HANDLERS=1, FALSE otherwise.

=head1 PERL METHOD HANDLERS

See L<mod_perl_method_handlers>.
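
As a rough illustration of what that document covers: in mod_perl 1.x
a handler subroutine declared with the C<($$)> prototype is invoked as
a method, receiving the class (or object) as its first argument and
the request object as its second, so that handlers can be inherited
and overridden. The sketch below assumes that convention. The package
names My::Base and My::Child, and the stub request class
My::StubRequest, are hypothetical and exist only so that the example
can be run outside of httpd.

```perl
use strict;

# Hypothetical stand-in for the Apache request object, so this
# sketch can be run from the command line.
package My::StubRequest;
sub new    { bless { out => '' }, shift }
sub print  { my $self = shift; $self->{out} .= join '', @_; return 1 }
sub output { $_[0]{out} }

package My::Base;
# The ($$) prototype is what marks this sub as a method handler.
sub handler ($$) {
    my ($class, $r) = @_;
    $r->print($class->greeting, "\n");
    return 0;    # numeric value of OK
}
sub greeting { "hello from the base class" }

package My::Child;
# Fully qualified @ISA, no "use vars" needed.
@My::Child::ISA = ('My::Base');
sub greeting { "hello from the child class" }

package main;
# Under mod_perl the server would make this call, not your code.
my $r = My::StubRequest->new;
My::Child->handler($r);
print $r->output;
```

My::Child inherits handler() from My::Base and only overrides
greeting(), which is the kind of reuse that plain subroutine handlers
cannot offer.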

=head1 PERL SECTIONS

With E<lt>PerlE<gt>E<lt>/PerlE<gt> sections, it is possible to
configure your server entirely in Perl.

E<lt>PerlE<gt> sections can contain *any* and as much Perl code as you
wish. These sections are compiled into a special package whose symbol
table mod_perl can then walk and grind the names and values of Perl
variables/structures through the Apache core config gears. Most of the
configuration directives can be represented as C<$Scalars> or
C<@Lists>. A C<@List> inside these sections is simply converted into a
single-space delimited string for you. Here's an example:

  #httpd.conf
  <Perl>
  @PerlModule = qw(Mail::Send Devel::Peek);

  #run the server as whoever starts it
  $User  = getpwuid($>) || $>;
  $Group = getgrgid($)) || $);

  $ServerAdmin = $User;

  </Perl>

Block sections such as E<lt>LocationE<gt>E<lt>/LocationE<gt> are
represented in a C<%Hash>, e.g.:

  $Location{"/~dougm/"} = {
      AuthUserFile => '/tmp/htpasswd',
      AuthType => 'Basic',
      AuthName => 'test',
      DirectoryIndex => [qw(index.html index.htm)],
      Limit => {
          METHODS => 'GET POST',
          require => 'user dougm',
      },
  };

  #If a Directive can take say, two *or* three arguments
  #you may push strings and the lowest number of arguments
  #will be shifted off the @List
  #or use array reference to handle any number greater than
  #the minimum for that directive

  push @Redirect, "/foo", "http://www.foo.com/";

  push @Redirect, "/imdb", "http://www.imdb.com/";

  push @Redirect, [qw(temp "/here" "http://www.there.com")];

Other section counterparts include C<%VirtualHost>, C<%Directory> and
C<%Files>.

These are somewhat boring examples, but they should give you the basic
idea. You can mix in any Perl code your heart desires. See
eg/httpd.conf.pl and eg/perl_sections.txt for some examples.

A tip for syntax checking outside of httpd:

  <Perl>
  #!perl

  #... code here ...

  __END__
  </Perl>

Now you may run C<perl -cx httpd.conf>.
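
To give "mix in any Perl code" a slightly less boring shape, here is a
minimal sketch of the body of a E<lt>PerlE<gt> section that generates
one C<%Location> entry per user in a loop. The user names, the
password file paths and the realm names are made up for illustration;
only the C<%Location> structure itself follows the example above. The
final loop is there just so the sketch prints something when run with
plain perl.

```perl
use strict;
use vars qw(%Location);

# Hypothetical list of users; in a real <Perl> section this could
# just as well be read from a file.
my @users = qw(alice bob);

for my $user (@users) {
    $Location{"/~$user/"} = {
        AuthUserFile => "/tmp/htpasswd.$user",   # made-up path
        AuthType     => 'Basic',
        AuthName     => "$user pages",
        Limit        => {
            METHODS => 'GET POST',
            require => "user $user",
        },
    };
}

# Outside of httpd, show which sections were generated.
for my $path (sort keys %Location) {
    print "$path requires '$Location{$path}{Limit}{require}'\n";
}
```

The same loop written by hand would need one E<lt>LocationE<gt> block
per user in httpd.conf.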

It may be the case that E<lt>PerlE<gt> sections are not complete or
that an oversight was made in a certain area. If they do not behave as
you expect, please send a report to the modperl mailing list.

To configure this feature build with 'perl Makefile.PL
PERL_SECTIONS=1'

=head1 mod_perl and mod_include integration

As of apache 1.2.0, mod_include can handle Perl callbacks.

A `sub' key value may be anything a Perl*Handler can be: subroutine
name, package name (defaults to package::handler), Class-E<gt>method
call or anonymous sub {}

Example:

  Child <!--#perl sub="sub {print $$}" --> accessed
  <!--#perl sub="sub {print ++$Access::Cnt }" --> times. <br>

  <!--#perl sub="Package::handler" arg="one" arg="two" -->

  #don't forget to escape double quotes!
  Perl is
  <!--#perl sub="sub {for (0..10) {print \"very \"}}"-->
  fun to use!

The B<Apache::Include> module makes it simple to include
B<Apache::Registry> scripts with the mod_include perl directive.

Example:

  <!--#perl sub="Apache::Include" arg="/perl/ssi.pl" -->

You can also use 'virtual include' to include Apache::Registry scripts
of course. However, using #perl will save the overhead of making
Apache go through the motions of creating/destroying a subrequest and
making all the necessary access checks to see that the request would
be allowed outside of a 'virtual include' context.

To enable perl in mod_include parsed files, when building apache the
following must be present in the Configuration file:

  EXTRA_CFLAGS=-DUSE_PERL_SSI -I. `perl -MExtUtils::Embed -ccopts`

mod_perl's Makefile.PL script can take care of this for you as well:

  perl Makefile.PL PERL_SSI=1

If you're interested in sprinkling Perl code inside your HTML
documents, you'll also want to look at the Apache::Embperl
(http://perl.apache.org/embperl/), Apache::ePerl and Apache::SSI
modules.
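
For the C<sub="Package::handler" arg="one" arg="two"> form shown
above, the following is a minimal sketch of what the Perl side might
look like. It assumes that the subroutine is called with the request
object first and the C<arg> values after it, in order; check that
against your mod_perl version. The package name My::SSI and the helper
format_args() are hypothetical.

```perl
use strict;

package My::SSI;    # hypothetical package name

# Turn the arg="..." values into a short piece of text.
sub format_args {
    return join(", ", map { "[$_]" } @_);
}

# Assumption: called as handler($r, "one", "two") for
#   <!--#perl sub="My::SSI::handler" arg="one" arg="two" -->
sub handler {
    my ($r, @args) = @_;
    print "args: ", format_args(@args), "\n";
    return 0;
}

package main;
# Outside of httpd there is no request object, so pass undef.
My::SSI::handler(undef, "one", "two");
```

Keeping the formatting in a separate subroutine makes it possible to
exercise the logic from the command line before wiring it into a
parsed document.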

=head1 DEBUGGING

=over 4

=item MOD_PERL_TRACE

To enable mod_perl debug tracing configure mod_perl with the
PERL_TRACE option:

  perl Makefile.PL PERL_TRACE=1

The trace levels can then be enabled via the B<MOD_PERL_TRACE>
environment variable which can contain any combination of:

  d - Trace directive handling during configuration read
  s - Trace processing of perl sections
  h - Trace Perl*Handler callbacks
  g - Trace global variable handling, interpreter construction, END blocks, etc.
  all - all of the above

=item spinning httpds

To see where an httpd is "spinning", try adding this to your script or
a startup file:

  use Carp ();
  $SIG{'USR1'} = sub {
      Carp::confess("caught SIGUSR1!");
  };

Then issue the command line:

  kill -USR1 <spinning_httpd_pid>

=back

=head1 PROFILING

It is possible to profile code run under mod_perl with the
B<Devel::DProf> module available on CPAN. However, you must have
apache version 1.3.0 or higher and the C<PerlChildExitHandler>
enabled. When the server is started, B<Devel::DProf> installs an
C<END> block to write the I<tmon.out> file, which will be run when the
server is shut down. Here's how to start and stop a server with the
profiler enabled:

  % setenv PERL5OPT -d:DProf
  % httpd -X -d `pwd` &
  ... make some requests to the server here ...
  % kill `cat logs/httpd.pid`
  % unsetenv PERL5OPT
  % dprofpp

See also: B<Apache::DProf>

=head1 BENCHMARKING

How much faster is mod_perl than CGI? There are many ways to benchmark
the two, see the C<benchmark/> directory for some examples.

See also: B<Apache::Timeit>

=head1 WARNINGS

See L<mod_perl_traps>.

=head1 SUPPORT

See the L<SUPPORT> file.

=head1 Win32

See L<INSTALL.win32> for building from sources.

Info about win32 binary distributions of mod_perl is available from:

  http://perl.apache.org/distributions/

=head1 REVISION

$Id: mod_perl.pod,v 1.1 2002/01/05 19:20:01 stas Exp $

=head1 AUTHOR

Doug MacEachern

1.1                  modperl-docs/src/docs/1.0/faqs/mod_perl_api.pod

Index: mod_perl_api.pod
===================================================================
=head1 NAME

Mod_perl_api - accessing the Apache API via mod_perl

=head1 DESCRIPTION

This part of the mod_perl FAQ deals with the Apache Application
Programmer's Interface and how to access it from perl via mod_perl.

=head1 Why can't the server find the handler I wrote?

=head2 Did you enable the required hook?

As described in the mod_perl/INSTALL document, the only callback hook
enabled by default is PerlHandler. If you want to intervene at a
different stage of request processing you must enable the relevant
hook. So to add a special authentication handler, for instance, you
would start the installation process with:

  perl Makefile.PL PERL_AUTHEN=1

=head2 Is the handler correctly referenced in the configuration?

Apache must be told to load your handler, either as a module with the
C<PerlModule> directive or as a script with C<PerlRequire>. The
handler subroutine will then be available, but you must also specify
which requests it should process. This is done by naming it in one of
the Perl*Handler directives (PerlInitHandler, PerlTransHandler, etc.).
If this directive is put in access.conf outside of any restrictive
context, your handler will be called during the given phase of each
request processed by the server. You can make it more selective by
restricting it to a directory (-hierarchy) in a E<lt>Directory
...E<gt> section of access.conf or by putting it in a .htaccess file.

Here is an example of the directives needed to call a handler during
Apache's URI to filename translation phase:

  PerlRequire /full/path/to/script/Trans.pl
  PerlTransHandler Trans::handler

Trans.pl would start with the statement C<package Trans;> and define a
subroutine called C<handler>.

=head1 Where can I find examples to get me started?

Check out the Apache-Perl-contrib tarfile at http://perl.apache.org/src/

Here is an example from Vivek Khera. It allows you to filter files
through a perl script based on their location. Rather than having to
invoke a CGI script, the user just references the file with a normal
URL and it is automagically processed by this code...

  #! /usr/local/bin/perl
  use strict;

  # filter a file before returning it to the web client
  # tell Apache to use the PerlHandler FileFilter on files which need
  # filtering in the htaccess file:
  #
  #  <Files *.baz>
  #  SetHandler perl-script
  #  PerlHandler FileFilter
  #  </Files>

  package FileFilter;

  use Apache::Constants ':common';

  # find out the file name, then write it out with our header attached
  sub handler {
    my $r = shift;

    my $fileName = $r->filename;

    open(F,$fileName) or return NOT_FOUND; # file not found

    $r->content_type('text/html');
    $r->no_cache(1);            # don't be caching my dynamic documents!
    $r->send_http_header;
    $r->print("<HEAD><TITLE>This is my personal header!</TITLE></HEAD><BODY>");
    # Now copy the file to the client. If you do not need to make any
    # changes you can copy it verbatim with the single statement
    #   $r->send_fd(\*F);
    # Otherwise, loop over each line...
    while(<F>) {
      # mangle the contents here if you want
      $r->print ($_);
    }
    close(F);

    $r->print("<HR>Document created: ", scalar localtime time);
    $r->print("</BODY>");
    OK;
  }
  1;

=head1 How can I check if mod_perl is available during configuration?

Ralf Engelschall writes:

When you compiled one httpd with and the other without mod_perl, then
you can simply use E<lt>IfModule mod_perl.cE<gt>...E<lt>/IfModuleE<gt>
to surround the stuff for the httpd compiled with mod_perl. The other
then ignores these lines. Example:

  <IfModule mod_perl.c>
  ...stuff for httpd w/ mod_perl...
  </IfModule>
  <IfModule !mod_perl.c>
  ...stuff for httpd w/o mod_perl...
  </IfModule>

=cut

1.1                  modperl-docs/src/docs/1.0/faqs/mod_perl_cgi.pod

Index: mod_perl_cgi.pod
===================================================================
=head1 NAME

Mod_perl_cgi - running CGI scripts under mod_perl ($Date: 2002/01/05 19:20:01 $)

=head1 DESCRIPTION

This part of the mod_perl FAQ deals with questions surrounding CGI
scripts.

=head1 Why doesn't my CGI script work at all under mod_perl?

What are the symptoms? Here are some possibilities.

=head2 File not found

Have you made the correct entries in Apache's configuration files? You
need to add the C<Alias /perl/ ...> and C<E<lt>Location
/perlE<gt>...> directives to access.conf as described in mod_perl.pod.
And of course the script must be in the directory specified by the
Alias directive and it must be readable and executable by the user
that the web server runs as.

=head2 Forbidden

You don't have permission to access /perl/foo on this server.

  chmod 755 /path/to/my/mod_perl/scripts
  chmod 755 /path/to/my/mod_perl/scripts/foo

=head2 Internal Server Error

The script died with an execution error. There should be an error
message in the server's error.log saying why. Provided you are using
CGI.pm, you can also see what happens by running the script at a shell
prompt.

If the error.log claims there are syntax errors in your script, but

  perl -c /path/to/my/mod_perl/scripts/foo

says it is OK, you have probably used __END__ or __DATA__. Sorry.
Mod_perl's Apache::Registry can't deal with that.

=head1 My CGI script behaves strangely under mod_perl. Why?

Remember that a conventional CGI script always starts up a fresh perl
interpreter, whereas a mod_perl script is reused in the same process
context many times. This means that certain categories of variables
can survive from one invocation of the script to the next. You can
make that work to your advantage, but you can also be caught out by
it.

When diagnosing a problem that might be caused by variable lifetimes,
always start the web server in single process mode. Apache normally
spawns a number of child processes to handle queries, and they get
used in round-robin fashion, which makes test results unpredictable.

The command

  # ./httpd -X

will start a single-process server with its default configuration. You
can specify a different configuration with the -f flag (and thus use a
different port number for testing, for instance).

Now try executing your script from a browser or with a tool such as
wget. Here are some of the effects that you might see.

=head2 The server terminates after processing the first request

Your script is calling the CORE perl C<exit()> function. That is not a
problem in a conventional CGI script, provided that query processing
is complete. But you almost certainly don't want to exit in a mod_perl
script. It kills the server process that handled the request, meaning
that the advantage of using mod_perl to avoid startup overhead is
lost.

The best way to avoid calling C<exit()> is to restructure the script
so that all execution paths return to a common point at the end of the
script. If this seems impractical you can force the same effect by
placing a label after the last executable statement and replacing
calls to C<exit()> with C<goto label;>

See also what mod_perl_traps says about C<Apache::exit()> and the way
that Apache::Registry causes it to terminate the script but not the
httpd child.

There may be exceptional circumstances in which you explicitly want to
terminate the httpd child at the end of the current request.
In this case C<Apache-E<gt>exit(-2)> should be used.

=head2 Variables retain their value from one request to the next

The so-called sticky query effect happens when the CGI query object,
or another request-specific variable, has a lifetime longer than a
single execution of your script and does not get reinitialised each
time the script is invoked.

This does not matter in a conventional CGI script, because the script
starts with a clean slate for each new request. But a mod_perl script
gets compiled into a subroutine by the Apache::Registry handler and
then processes an arbitrary number of requests. To make sure that both
you and the perl interpreter have the same idea about the meaning of
your script, make sure it starts like this:

  #!/usr/bin/perl -w
  use strict;

It is good for you! It will make perl point out all variables that you
have not explicitly declared. You can then think about whether they
need to be global or if they can be lexical. Try to declare things
lexically, with my(). These variables disappear when the block they
are declared in ends, so they don't occupy memory when they are not in
use and they also do not need a run-time symbol table entry.

Beware, though, of referring to a lexical variable indirectly from
within a subroutine. To quote L<perlsub/"Private Variables via my()">,
the variable "... now becomes unreachable by the outside world, but
retains its value between calls to ..." the subroutine. You will see
classic "sticky query" symptoms if your code looks like this:

  #!/usr/bin/perl -w
  use strict;
  use CGI;
  my $q = CGI->new();
  doit();

  sub doit {
      print($q->header(), $q->start_html());
      print('Value is ', $q->param('val')) if $q->param;
      $q->print('<p>', $q->startform, 'Value? 
', $q->textfield(-name=>'val', -size=>20), ' ',
                $q->submit('enter'), $q->endform);
      print($q->end_html());
  }

Because you remembered to put the -w switch on the first line, the
error log will tell you that "Variable $q will not stay shared"
(provided you are using perl5.004 or higher).

You must either pass the variable to the subroutine as a parameter,

  doit($q)

  sub doit {
      my($q) = @_;
      ....

or declare this variable to be global,

  use vars qw($q);
  $q = CGI->new();

The reason why Perl works this way is explained in a news posting by
Mike Guy that is included with this FAQ (mjtg-news.txt).

=for html <a href="mjtg-news.txt">mjtg-news.txt</a>

=head2 Variables B<still> retain their value from one request to the next

CGI.pm must pull some extra tricks when it is being used via
Apache::Registry. Versions of CGI.pm before 2.35 did not know this,
and Apache::Registry will complain if you try to use an earlier
version.

CGI.pm detects that it is running under Apache::Registry by looking
for an environment variable. This test can fail if C<use CGI> is
evaluated too early, before the environment has been set up. That can
happen if you have C<use CGI> in a script and pull the script in with
a C<PerlRequire> directive in httpd.conf. Replacing C<use CGI> with
C<require CGI> will fix it.

=head2 Do I have to rewrite my legacy code for mod_perl?

If you have CGI code that seems to be fundamentally at odds with
mod_perl's "compile once, run many" environment, you may find that it
will work if run under the module C<Apache::PerlRun>. See the
documentation of that module, which is included with recent versions
of mod_perl.

=head1 How can my script continue running after sending the response?

If the client submits a form that will take some time to process, you
may want to say "Thanks for submitting the form" and close the
connection, before processing it.

You can achieve this by registering the subroutine that processes the
form as a cleanup handler:

  if($ENV{GATEWAY_INTERFACE} =~ /^CGI-Perl/) {
      Apache->request->register_cleanup(sub { doProcess($query) });
  }

1.1                  modperl-docs/src/docs/1.0/faqs/mod_perl_cvs.pod

Index: mod_perl_cvs.pod
===================================================================
=head1 NAME

mod_perl_cvs - Access to the mod_perl CVS development tree

=head1 DESCRIPTION

The mod_perl development tree lives on cvs.apache.org. This tree
contains the latest mod_perl bug fixes and developments that have not
made it to CPAN yet. Welcome to the bleeding edge.

=head1 SYNOPSIS

Just as with cvs access to the Apache development tree, the mod_perl
code pulled from cvs is not guaranteed to do anything, especially not
compile or work. But that's exactly why we are using cvs, so everyone
has access to the latest version and can help see to it that mod_perl
does compile and work on all platforms, with the various versions and
configurations of Perl and Apache. Patches are always welcome; simply
testing the latest snapshots is just as helpful, if not more so.

It's recommended to subscribe to the I<modperl-cvs@apache.org> list,
which is the place cvs commit logs and diffs are mailed to; at least
if you're going to work on the code.

Here are the several ways to access the cvs tree.

=over 4

=item cvsup

Cvsup has come out of the FreeBSD group. It's a client/server beast
that offers an efficient way to sync collections of files over the
net, and it is very CVS aware, allowing synchronisation of
repositories or checked out files using the cvs deltas to bring the
client side files up to date with minimal data transfer.

For a FreeBSD cvsup client see:

  http://www.freebsd.org/cgi/ports.cgi?query=cvsup&stype=all

Others (SunOS, alpha.osf, linux, Solaris2.4, HPAA 10.2, irix)

  ftp://ftp.postgresql.org/pub/CVSup/

Here's a config file for the client (cvsup) to sync modperl sources.

  *default tag=.
  # comment out the above if you want the raw cvs files

  *default host=cvs.apache.org
  *default prefix=/path/on/this/machine/to/install/
  # a subdir for modperl will appear here ^^^

  *default base=/path/on/this/machine/where/cvsup/will/keep/status/info
  # you'll never need to look in the 'base' dir.

  *default release=cvs delete use-rel-suffix compress

  modperl
  #apache-src
  #apache-docs
  #uncomment these two for the latest apache src and/or docs if you want them

=item anoncvs

To checkout a fresh copy from anoncvs use

  cvs -d ":pserver:[EMAIL PROTECTED]:/home/cvspublic" login

with the password "anoncvs".

  cvs -d ":pserver:[EMAIL PROTECTED]:/home/cvspublic" co modperl

For a basic introduction to anoncvs see http://dev.apache.org/anoncvs.txt

=item from-cvs

A snapshot of the modperl tree is rolled every 6 hours and placed
here:

  http://cvs.apache.org/snapshots/modperl/

A snapshot of the Apache development tree is also rolled every 6 hours
and placed here:

  http://cvs.apache.org/snapshots/

=back

=head1 SEE ALSO

cvs(1)

1.1                  modperl-docs/src/docs/1.0/faqs/mod_perl_faq.pod

Index: mod_perl_faq.pod
===================================================================
=head1 NAME

Mod_perl_faq - frequently asked questions about mod_perl ($Date: 2002/01/05 19:20:01 $)

=head1 DESCRIPTION

Mod_perl allows an Apache Web Server to directly execute perl code.
This document is designed to answer questions that arise when
designing new applications, and converting existing applications, to
run in the mod_perl environment.

=head1 QUESTIONS & ANSWERS

=head2 What is mod_perl?

The Apache/Perl integration project brings together the full power of
the Perl programming language and the Apache HTTP server. This is
achieved by linking the Perl runtime library into the server and
providing an object-oriented Perl interface to the server's C language
API.

Mod_perl is a bundle of software. One part of the bundle is designed
to be compiled and linked together with Apache and Perl.
The remainder is perl code that provides the object-oriented interface
to the "perl-enabled" web server.

The primary advantages of mod_perl are power and speed. You have full
access to the inner-workings of the web server and can intervene at
any stage of request-processing. This allows for customized processing
of (to name just a few of the phases) URI-E<gt>filename translation,
authentication, response generation and logging. There is very little
run-time overhead. In particular, it is not necessary to start a
separate process, as is often done with web-server extensions. The
most wide-spread such extension mechanism, the Common Gateway
Interface (CGI), can be replaced entirely with perl-code that handles
the response generation phase of request processing. Mod_perl includes
a general purpose module for this purpose (Apache::Registry) that can
transparently run existing perl CGI scripts.

=head2 Where can I get mod_perl?

Mod_perl can be found at http://www.perl.com/CPAN/modules/by-module/Apache/

=head2 What else do I need?

=over 4

=item Perl

  http://www.perl.com/CPAN/src/latest.tar.gz

Win32 users note: at the time of writing, ActiveState's Perl cannot be
used with mod_perl, because it is based on an old version of perl
(perl-5.003_07, build 316).

=item Apache

  http://www.apache.org/dist/

=back

=head2 How do I install it?

Here is the easiest way to proceed. Let's assume you have the latest
version of perl (5.004) installed. Unpack the Apache and Mod_perl
tarballs next to one another under a common directory: e.g.

  % cd /usr/local/src
  % zcat apache_1.2.0.tar.gz | tar xf -
  % zcat mod_perl-0.98_12.tar.gz | tar xf -

You probably do not need to change anything in the apache
configuration before compiling. Only if you want to enable additional
non-standard modules do you need to edit apache_1.2.0/src/Configuration.
There is no need to set CC, CFLAGS, etc., because mod_perl will
override them with the values that were used to compile your perl.

Now go to the mod_perl directory and follow the instructions in the
INSTALL file there. If "make test" and "make install" are successful,
you will find the new web server in apache_1.2.0/src/httpd. Move it to
a suitable location, make sure it has access to the correct
configuration files, and fire it up.

=head2 What documentation should I read?

The mod_perl documentation in mod_perl.pod. After you have installed
mod_perl you can read it with the command: C<perldoc mod_perl>.

If you are using mod_perl to extend the server functionality, you will
need to read C<perldoc Apache> and the Apache API notes, which can be
found in apache_1.2.0/htdocs/manual/misc/API.html.

Existing (perl-) CGI scripts should run as-is under mod_perl. There
are a number of reasons why they may need to be adjusted, and these
are discussed later in this FAQ. If you are developing a new CGI
script it is probably best to use CGI.pm. It is part of the standard
perl distribution and its documentation can be read with the command:
C<perldoc CGI>.

=head2 How do I run CGI scripts under mod_perl?

Refer to L<mod_perl_cgi> for tips on writing and converting CGI
scripts for mod_perl.

=head2 How do I access the Apache API from mod_perl?

Interfacing with Apache is discussed in L<mod_perl_api>.

=head2 How secure are mod_perl scripts?

Because mod_perl runs within an httpd child process, it runs with the
user-id and group-id specified in the httpd.conf file. This user/group
should have the lowest possible privileges. It should only have access
to world readable files. Even so, careless scripts can give away
information. You would not want your /etc/passwd file to be readable
over the net, for instance.

If you turn on tainting checks, perl can help you to avoid the
pitfalls of using data received from the net. Setting the C<-T> switch
on the first line of the script is not sufficient to enable tainting
checks under mod_perl. You have to include the directive
C<PerlTaintCheck On> in the httpd.conf file.
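
Once taint checks are on, data that arrived from the client has to be
validated before it can be used in file names or shell commands. As
L<perlsec> describes, the usual way to do that is to match the value
against a strict pattern and use the captured substring. The following
is only a minimal sketch of that idea: the helper name
untaint_filename() and the particular pattern are illustrative
assumptions, and a real application should pick a pattern that fits
its own data.

```perl
use strict;

# Hypothetical helper: returns a validated copy of a user-supplied
# file name, or undef if it contains anything unexpected.
sub untaint_filename {
    my ($name) = @_;
    return undef unless defined $name;
    # Allow only word characters, dots and dashes, with no leading
    # dot, so that "../" and shell metacharacters are rejected.
    if ($name =~ /^([\w-][\w.-]*)\z/) {
        return $1;    # captured substrings are not tainted
    }
    return undef;
}

for my $candidate ('report.txt', '../../etc/passwd', 'a;rm -rf') {
    my $safe = untaint_filename($candidate);
    print defined $safe ? "ok: $safe\n" : "rejected: $candidate\n";
}
```

Only the first of the three sample values passes the check. The same
code runs unchanged with or without taint checks enabled, so it can be
tried from the command line first.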
=head2 What if my script needs higher privileges? You will have to start a new process that runs under a suitable user-id (or group-id). If all requests handled by the script will need the higher privileges, you might as well write it as a suid CGI script. Read the documentation about suEXEC in Apache-1.2. Alternatively, pre-process the request with mod_perl and fork a suid helper process to handle the privileged part of the task. =head2 Why is httpd using so much memory? Read the section on "Memory Consumption" in the mod_perl.pod. Make sure that your scripts are not leaking memory. Global variables stay around indefinitely, lexical variables (declared with my()) are destroyed when they go out of scope, provided there are no references to them from outside of that scope. To get information about the modules that have been loaded and their symbol-tables, use the Apache::Status module. It is enabled by adding these lines to a configuration file (e.g. access.conf); <Location /perl-status> SetHandler perl-script PerlHandler Apache::Status </Location> Then look at the URL http://www.your.host/perl-status Joel Wagner reports that calling an undefined subroutine in a module can cause a tight loop that consumes all memory. Here is a way to catch such errors. Define an autoload subroutine sub UNIVERSAL::AUTOLOAD { my $class = shift; warn "$class can't `$UNIVERSAL::AUTOLOAD'!\n"; } It will produce a nice error in error_log, giving the line number of the call and the name of the undefined subroutine. =head2 Do I have to restart the server when I change my Perl code? Apache::Registry checks the timestamp of scripts that it has loaded and reloads them if they change. This does not happen for other handlers, unless you program it yourself. One way to do this is in a PerlInitHandler. 
If you define sub MY::init { delete $INC{"YourModule.pm"}; require YourModule; } as an init handler, it will unconditionally reload YourModule at the start of each request, which may be useful while you are developing a new module. It can be made more efficient by storing the timestamp of the file in a global variable and only reloading when necessary. =head2 So how do I use mod_perl in conjunction with ErrorDocument? Andreas Koenig writes: =over 4 =item * Set up your testing engine: LWP comes with debugging capabilities that are sometimes better than your browser, sometimes your browser is the better testing device. Make sure you can call lwp-request from the command line and have your browser ready before you start. I find the C<-x> switch (extended debugging) and the C<-d> switch (do not display content) most useful. =item * Test your server with lwp-request -xd http://your.server/test/file.not_there Carefully examine if the status is 404 and if the headers look good. If you try 'lwp-request -es', the HTML output will not be the one you are sending, instead lwp-request will send its own cooked HTML text (as of version libwww-perl-5.09). Check the real text either with the C<-x> switch or with telnet or your browser. =item * Set up your Errordocument configuration in the testing area. I have this in my .htaccess file: ErrorDocument 404 /perl/errors/err404-01 The /perl/ directory is configured to <Location /perl> SetHandler perl-script PerlHandler Apache::Registry::handler Options ExecCGI </Location> I have no PerlSendHeader and no PerlNewSendHeader directive in any configuration file. =item * Repeat step 2 (Test your server) =item * Write your error handler in mod_perl. You have to be prepared that you have to tell both apache *and* the browser the right thing. Basically you have to tell the browser what the error is, but you have to pretend to apache that everything was OK. 
If you tell apache the error condition, it will handle the situation on its own and add some unwanted stuff to the output that goes to the browser. The following works fine for me: my $r = Apache->request; $r->content_type('text/html; charset=ISO-8859-1'); $r->send_http_header; $r->status(200); ...send other HTML stuff... At the time of the send_http_header we have an error condition of type 404--this is what gets sent to the browser. After that I set status to 200 to silence the apache engine. I was not successful in trying to do the same with CGI.pm, but I didn't try very hard. =item * Repeat step 2 (Test your server) =item * The above is tested with mod_perl/0.98 and 0.99 =item * Open questions I could not find documentation for (except RTFS): what exactly is PerlSendHeaders and PerlNewSendHeaders. What is the default setting for those? How do these cooperate with CGI.pm, Apache.pm, Apache::Registry? =back =head2 How can I reference private library modules? The best place to put library modules is in the site_perl directory (usually /usr/lib/perl/site_perl), where perl will find them without further ado. Local policy may prevent this, in which case you have to tell the perl interpreter where to find them by adding your private directory to the @INC array. There are various ways to do this. One way is to add use lib '/my/perl/lib'; to each script that needs modules from /my/perl/lib. Alternatively, you can arrange for all the modules that might be needed to be loaded when the server starts up. Put a PerlRequire directive into one of the httpd config files that pulls in a small module containing the relevant C<use>-statements. There is an example of this in L<mod_perl_tuning>. =head2 How can I pass arguments to a SSI script? Following the documentation, I have put the following in the html file: <!--#perl sub="Apache::Include" arg="/perl/ssi.pl" --> I want to send an argument to the ssi.pl script. How? It won't work with Apache::Include. 
Instead of a script, define a subroutine that's pulled in with PerlRequire or PerlModule, like so: sub My::ssi { my($r, $one, $two, $three) = @_; ... } In the html file: <!--#perl sub="My::ssi" arg="one" arg="two" arg="three" --> =head2 Why is image-file loading so slow when testing with httpd -X ? If you use Netscape while your server is running in single-process mode, the "KeepAlive" feature gets in the way. Netscape tries to open multiple connections and keep them open. Because there is only one server process listening, each connection has to time-out before the next succeeds. Turn off KeepAlive in httpd.conf to avoid this effect. =head2 What can cause a subroutine to suddenly become undefined? If you sometimes see error messages like this: [Thu Sep 11 11:03:06 1997] Undefined subroutine &Apache::ROOT::perl::script1::sub_foo called at /some/path/perl/script2 line 42. despite the fact that script2 normally works just fine, it looks like you have a namespace problem in a library file. If sub_foo is located in a file that is pulled in by 'require' and both script1 and script2 require it, you need to be sure that the file containing sub_foo sets a package name. Otherwise, sub_foo gets defined in the namespace that is active the first time it is required, and the next require is a no-op because that file is already in %INC. The solution is simple, set up your require'd file something along these lines: package SomeName; sub sub_foo {...} Now, have scripts call SomeName::sub_foo() instead of sub_foo(). =head2 What could be causing sporadic errors "in cleanup"? Some people have seen error messages such as this: [Fri Sep 26 10:50:08 1997] (in cleanup) no dbproc key in hash at /usr/lib/perl5/site_perl/Apache/Registry.pm line 119. Doug writes: "I have yet to figure out why, but there have been a few arbitrary cases where Perl (in mod_perl) _insists_ on finding and/or calling a DESTROY method for an object. 
Defining an empty sub DESTROY has been the bandaid for these few cases." If the specific error message gives you a hint about which object is causing difficulty, put the C<sub DESTROY { }> in the module that defines that object class. =head2 How can I test that my script is running under mod_perl? There are 2 environment variables you can test. exists $ENV{"MOD_PERL"} # if running under mod_perl $ENV{"GATEWAY_INTERFACE"} eq "CGI-Perl/1.1" The MOD_PERL variable gets set immediately when the perl interpreter starts up, whereas GATEWAY_INTERFACE may not be set yet when BEGIN blocks are being processed. =head2 Where can I get help that I did not find in here? There is a mailing-list dedicated to mod_perl. It is archived at http://outside.organic.com/mail-archives/modperl/ and at http://mathforum.org/epigone/modperl (which has a search engine) and also at http://www.progressive-comp.com/Lists/?l=apache-modperl&r=1#apache-modperl (threaded and indexed). You can subscribe to the list by sending a mail with the line C<subscribe modperl> to C<[EMAIL PROTECTED]>. The mod_perl homepage http://perl.apache.org/ has links to other mod_perl resources. The pod source of this FAQ is available at http://www.ping.de/~fdc/mod_perl/mod_perl_faq.tar.gz =head2 Where do I send suggestions and corrections concerning this FAQ? mailto:[EMAIL PROTECTED] 1.1 modperl-docs/src/docs/1.0/faqs/mod_perl_method_handlers.pod Index: mod_perl_method_handlers.pod =================================================================== =head1 NAME mod_perl_method_handlers - How to use mod_perl's MethodHandlers =head1 DESCRIPTION Described here are a few examples and hints how to use MethodHandlers with modperl. This document assumes familiarity with at least L<perltoot> and "normal" usage of the Perl*Handlers. It isn't strictly modperl related, more like "what I use objects for in my modperl environment". 
=head1 SYNOPSIS If a Perl*Handler is prototyped with '$$', this handler will be invoked as a method, being passed a class name or blessed object as its first argument and the blessed I<request_rec> as the second argument, e.g. package My; @ISA = qw(BaseClass); sub handler ($$) { my($class, $r) = @_; ...; } package BaseClass; sub method ($$) { my($class, $r) = @_; ...; } __END__ Configuration: PerlHandler My or PerlHandler My->handler Since the handler is invoked as a method, it may inherit from other classes: PerlHandler My->method In this case, the 'My' class inherits this method from 'BaseClass'. To build in this feature, configure with: % perl Makefile.PL PERL_METHOD_HANDLERS=1 [PERL_FOO_HOOK=1,etc] =head1 WHY? The short version: For pretty much the same reasons we're using OO perl everywhere else. :-) See L<perltoot>. The slightly longer version would include something about code reuse and a cleaner interface between modules. =head1 SIMPLE EXAMPLE Let's start with a simple example. In httpd.conf: <Location /obj-handler> SetHandler perl-script PerlHandler $My::Obj->method </Location> In startup.pl or another PerlRequire'd file: package This::Class; $My::Obj = bless {}; sub method ($$) { my($obj, $r) = @_; $r->send_http_header("text/plain"); print "$obj isa ", ref($obj); 0; } which displays: This::Class=HASH(0x8411edc) isa This::Class =head1 A LITTLE MORE ADVANCED That wasn't really useful, so let's try something a little more advanced. I have a little module which creates a graphical 'datebar' for a client. (See C<http://www.hip.dk/date_bar>). It reads a lot of small gifs with numbers and weekdays, and keeps them in memory in GD.pm's native format, ready to be copied together and served as gifs. Now I wanted to use it at another site too, but with a different look. Obviously a job for an object.
Hence I changed the module to an object, and can now do a $Client1::Datebar = new Datebar( -imagepath => '/home/client1/datebar/', -size => [131,18], -elements => 'wday mday mon year hour min', ); $Client2::Datebar = new Datebar( -imagepath => '/home/client2/datebar/', -size => [90,14], -elements => 'wday hour min', ); And then use $Client1::Datebar and $Client2::Datebar as PerlHandlers in my Apache configuration. Remember to pass them in literal quotes ('') and not "", which would be interpolated! I have a web interface system for our content database. I've created objects to handle the administration of articles, banners, images and other content. It's then very easy (a few lines of code) to enable certain modules for each client, depending on their needs. Another area where I use objects with great success in my modperl configurations is database abstraction. All our clients using the web interface to handle, for example, articles will use a simple module to handle everything related to the database. Each client has $Client::Article = new WebAjour::Article(-host => 'www.client.com'); in a module that will be run at server startup. I can then use some simple methods from the $Client::Article object in my embperl documents, like: [- $c = $Client::Article->GetCursor(-layout=>'Frontpage') -] [$ while($c->Fetch) $] <h2>[+ $c->f('header') +]</h2> [+ $c->f('textfield') +] [$ endwhile $] Very very useful! =head1 TRAPS mod_perl expects object handlers to be in the form of a string, which it will thaw for you. That means that something like $r->push_handlers(PerlHandler => '$self->perl_handler_method'); doesn't work as you might expect, since Perl isn't able to see $self once it goes to PerlHandler.
The best solution to this is to use an anonymous subroutine and pass it $r yourself, like this: $r->push_handlers(PerlHandler => sub { my $r = shift; $self->perl_handler_method($r); } ); =head1 AUTHOR This document is written by Ask Bjoern Hansen E<lt>[EMAIL PROTECTED]E<gt> or E<lt>[EMAIL PROTECTED]E<gt>. Corrections and suggestions are most welcome. In particular, more examples would be appreciated; most of my own code is way too integrated with our system, which isn't suitable for public release. Some code snippets are from Doug MacEachern. =head1 SEE ALSO L<mod_perl>, L<Apache>, L<perltoot> (also available at C<http://www.perl.com/CPAN/doc/FMTEYEWTK/perltoot.html>) 1.1 modperl-docs/src/docs/1.0/faqs/mod_perl_traps.pod Index: mod_perl_traps.pod =================================================================== =head1 NAME mod_perl_traps - common/known mod_perl traps =head1 DESCRIPTION In the CGI environment, the server starts a single external process (Perl interpreter) per HTTP request, which runs a single script in that process space. When the request is over, the process goes away, everything is cleaned up, and a fresh script is started for the next request. mod_perl brings Perl inside of the HTTP server not only for speedup of CGI scripts, but also for access to server functionality that CGI scripts do not and/or cannot have. Now that we're inside the server, each process will likely handle more than one Perl script and keep it "compiled" in memory for longer than a single HTTP request. This new location and longer lifetime of Perl execution brings with it some common traps. This document is here to tell you what they are and how to prevent them. The descriptions here are short; please consult the mod_perl FAQ for more detail. If you trip over something not documented here, please send a message to the mod_perl list.
=head2 Migrating from CGI =over 4 =item * Be sure to have read L<cgi_to_mod_perl> =item * Scripts under Apache::Registry are not run in package B<main>; they are run in a unique namespace based on the requested uri. =item * Apache::Registry scripts cannot contain __END__ or __DATA__ tokens. =item * Output of C<system>, C<exec> and C<open PIPE, "|program"> calls will not be sent to the browser unless your Perl was configured with sfio. =item * Perl's exit() built-in function cannot be used in mod_perl scripts. The Apache::exit() function should be used instead. Apache::exit() automatically overrides the built-in exit() for Apache::Registry and Apache::PerlRun scripts. =item * Your script *will not* run from the command line if it makes any direct calls to Apache-E<gt>methods. See Apache::FakeRequest. =back =head2 Apache::Registry =over 4 =item undefined subroutine &Apache::Registry::handler Interaction with certain modules causes the shortcut configuration to break. If you see this message, change your configuration from this: <Location /perl> PerlHandler Apache::Registry ... </Location> To this: PerlModule Apache::Registry <Location /perl> PerlHandler Apache::Registry::handler ... </Location> =back =head2 Using CGI.pm and CGI::* =over 4 =item * CGI.pm users B<must> have version B<2.39> of the package or higher; earlier versions will not work under mod_perl. =item * If you use the C<SendHeaders()> function, be sure to call $req_obj-E<gt>cgi-E<gt>done when you are done with a request, just as you would under I<CGI::MiniSrv>. =back =head2 Perl Modules and Extensions =over 4 =item * Files pulled in via C<use> or C<require> statements are not automatically reloaded when changed on disk. See the Apache::StatINC module to add this functionality.
=item Undefined subroutines A common trap with required files may result in an error message similar to this in the error_log: [Thu Sep 11 11:03:06 1997] Undefined subroutine &Apache::ROOT::perl::test_2epl::some_function called at /opt/www/apache/perl/test.pl line 79. As the above item explains, a file pulled in via C<require> will only be loaded once per process (unless %INC is modified). If the file does not contain a C<package> declaration, the file's subroutines and variables will be created in the current package. Under CGI, this is commonly package C<main>. However, B<Apache::Registry> scripts are compiled into a unique package name (based on the uri). So, if multiple scripts in the same process try to require the same file, which does not declare a package, only one script will actually be able to see the subroutines. The solution is to read L<perlmodlib>, L<perlmod> and related perl documentation and re-work your required file into a module which exports functions or defines a method interface. Or something simpler, along these lines: #required_file.pl package Test; sub some_function {...} ... __END__ Now, have your scripts say: require "required_file.pl"; Test::some_function(); =item "Use of uninitialized value" Because of eval context, you may see warnings with a useless filename and line number, for example: Use of uninitialized value at (eval 80) line 12. Use of uninitialized value at (eval 80) line 43. Use of uninitialized value at (eval 80) line 44. To track down where this eval is really happening, try using a B<__WARN__> handler to give you a stack trace: use Carp (); local $SIG{__WARN__} = \&Carp::cluck; =item "Callback called exit" =item "Out of memory!" If something goes really wrong with your code, Perl may die with an "Out of memory!" message and/or "Callback called exit". Common causes of this are never-ending loops, deep recursion or calling an undefined subroutine.
Here's one way to catch the problem: See Perl's INSTALL document for this item: =item -DPERL_EMERGENCY_SBRK If PERL_EMERGENCY_SBRK is defined, running out of memory need not be a fatal error: a memory pool can be allocated by assigning to the special variable $^M. See perlvar(1) for more details. If you compile with that option and add 'use Apache::Debug level =E<gt> 4;' to your PerlScript, it will allocate the $^M emergency pool and the $SIG{__DIE__} handler will call Carp::confess, giving you a stack trace which should reveal where the problem is. See the B<Apache::Resource> module for prevention of spinning httpds. =item * If you wish to use a module that is normally linked statically with your Perl, it must be listed in static_ext in Perl's Config.pm to be linked with httpd during the mod_perl build. =item Can't load '$Config{sitearchexp}/auto/Foo/Foo.so' for module Foo... When starting httpd some people have reported seeing an error along the lines of: [Thu Jul 9 17:33:42 1998] [error] Can't load '/usr/local/ap/lib/perl5/site_perl/sun4-solaris/auto/DBI/DBI.so' for module DBI: ld.so.1: src/httpd: fatal: relocation error: file /usr/local/ap/lib/perl5/site_perl/sun4-solaris/auto/DBI/DBI.so: symbol Perl_sv_undef: referenced symbol not found at /usr/local/ap/lib/perl5/sun4-solaris/5.00404/DynaLoader.pm line 166. Or similar for the IO module or whatever dynamic module mod_perl tries to pull in first. The solution is to re-configure, re-build and re-install Perl and dynamic modules with the following flags when Configure asks for "additional LD flags": -Xlinker --export-dynamic or -Xlinker -E This problem is only known to be caused by installing gnu ld under Solaris. Other known causes of this problem: OS distributions that ship with a (broken) binary Perl installation. The `perl' program and `libperl.a' library are somehow built with different binary compatibility flags. The solution to these problems is to rebuild Perl and extension modules from a fresh source tree.
Tip for running Perl's Configure script: use the `C<-des>' flags to accept defaults and the `C<-D>' flag to override certain attributes: % ./Configure -des -Dcc=gcc ... && make test && make install Read Perl's INSTALL doc for more details. =back =head2 Clashes with other Apache C modules =over 4 =item mod_auth_dbm If you are a user of B<mod_auth_dbm> or B<mod_auth_db>, you may need to edit Perl's C<Config> module. When Perl is configured it attempts to find libraries for ndbm, gdbm, db, etc., for the *DBM*_File modules. By default, these libraries are linked with Perl and remembered by the B<Config> module. When mod_perl is configured with apache, the B<ExtUtils::Embed> module returns these libraries to be linked with httpd so Perl extensions will work under mod_perl. However, the order in which these libraries are stored in B<Config.pm> may confuse C<mod_auth_db*>. If C<mod_auth_db*> does not work with mod_perl, take a look at this order with the following command: % perl -V:libs If C<-lgdbm> or C<-ldb> is before C<-lndbm>, for example: libs='-lnet -lnsl_s -lgdbm -lndbm -ldb -ldld -lm -lc -lndir -lcrypt'; edit B<Config.pm> and move C<-lgdbm> and C<-ldb> to the end of the list. Here's how to find B<Config.pm>: % perl -MConfig -e 'print "$Config{archlibexp}/Config.pm\n"' Another solution for building Apache/mod_perl+mod_auth_dbm under Solaris is to remove the DBM and NDBM "emulation" from libgdbm.a. It seems Solaris already provides its own DBM and NDBM, and there's no reason to build GDBM with them (for us anyway). In our Makefile for GDBM, we changed OBJS = $(DBM_OF) $(NDBM_OF) $(GDBM_OF) to OBJS = $(GDBM_OF) Rebuild libgdbm, then Apache/mod_perl.
=back =head1 REGULAR EXPRESSIONS =head2 COMPILED REGULAR EXPRESSIONS When using a regular expression that contains an interpolated Perl variable, if it is known that the variable (or variables) will not vary during the execution of the program, a standard optimization technique consists of adding the C<o> modifier to the regexp pattern, to direct the compiler to build the internal table once, for the entire lifetime of the script, rather than every time the pattern is executed. Consider: my $pat = '^foo$'; # likely to be input from an HTML form field foreach( @list ) { print if /$pat/o; } This is usually a big win in loops over lists, or when using C<grep> or C<map>. In long-lived C<mod_perl> scripts, however, this can pose a problem if the variable changes according to the invocation. The first invocation of a fresh httpd child will compile the table and perform the search correctly, however, all subsequent uses by the httpd child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is dependent on. Your script will appear broken. There are two solutions to this problem. The first is to use C<eval q//>, to force the code to be evaluated each time. Just make sure that the C<eval> block covers the entire loop of processing, and not just the pattern match itself. The above code fragment would be rewritten as: my $pat = '^foo$'; eval q{ foreach( @list ) { print if /$pat/o; } } Just saying eval q{ print if /$pat/o; }; is going to be a horribly expensive proposition. You use this approach if you require more than one pattern match operator in a given section of code. If the section contains only one operator (be it an C<m//> or C<s///>), you can rely on the property of the null pattern, that reuses the last pattern seen. This leads to the second solution, which also eliminates the use of C<eval>. The above code fragment becomes: my $pat = '^foo$'; "something" =~ /$pat/; # dummy match (MUST NOT FAIL!) 
foreach( @list ) { print if //; } The only gotcha is that the dummy match that boots the regular expression engine must absolutely, positively succeed, otherwise the pattern will not be cached, and the // will match everything. If you can't count on fixed text to ensure the match succeeds, you have two possibilities. If you can guarantee that the pattern variable contains no meta-characters (things like C<*>, C<+>, C<^>, C<$>...), you can use the dummy match: "$pat" =~ /\Q$pat\E/; # guaranteed if no meta-characters present If there is a possibility that the pattern can contain meta-characters, you should search for the pattern or the unsearchable C<\377> character as follows: "\377" =~ /$pat|^[\377]$/; # guaranteed if meta-characters present =head2 References The Camel Book, 2nd edition, p. 538 (p. 356 in the 1st edition). =head1 AUTHORS Doug MacEachern, with contributions from Jens Heunemann E<lt>[EMAIL PROTECTED]E<gt>, David Landgren E<lt>[EMAIL PROTECTED]E<gt>, Mark Mills E<lt>[EMAIL PROTECTED]E<gt> and Randal Schwartz E<lt>[EMAIL PROTECTED]E<gt> 1.1 modperl-docs/src/docs/1.0/faqs/mod_perl_tuning.pod Index: mod_perl_tuning.pod =================================================================== =head1 NAME mod_perl_tuning - mod_perl performance tuning =head1 DESCRIPTION Described here are examples and hints on how to configure a mod_perl enabled Apache server, concentrating on tips for configuration for high-speed performance. The primary way to achieve maximal performance is to reduce the resources consumed by the mod_perl enabled HTTPD processes. This document assumes familiarity with Apache configuration directives, some familiarity with the mod_perl configuration directives, and that you have already built and installed a mod_perl enabled Apache server. Please also read the mod_perl documentation that comes with mod_perl for programming tips. Some configurations below use features from mod_perl version 1.03 which were not present in earlier versions.
These performance tuning hints are collected from my experiences in setting up and running servers for handling large promotional sites, such as The Weather Channel's "Blimp Site-ings" game, the MSIE 4.0 "Subscribe to Win" game, and the MSN Million Dollar Madness game. =head1 BASIC CONFIGURATION The basic configuration for mod_perl is as follows. In the F<httpd.conf> file, I add configuration parameters to make the C<http://www.domain.com/programs> URL be the base location for all mod_perl programs. Thus, access to C<http://www.domain.com/programs/printenv> will run the printenv script, as we'll see below. Also, any *.perl file will be interpreted as a mod_perl program just as if it were in the programs directory, and *.rperl will be mod_perl, but I<without> any HTTP headers automatically sent; you must do this explicitly. If you don't want these last two, just leave it out of your configuration. In the configuration files, I use F</var/www> as the C<ServerRoot> directory, and F</var/www/docs> as the C<DocumentRoot>. You will need to change it to match your particular setup. The network address below in the access to perl-status should also be changed to match yours. Additions to F<httpd.conf>: # put mod_perl programs here # startup.perl loads all functions that we want to use within mod_perl Perlrequire /var/www/perllib/startup.perl <Directory /var/www/docs/programs> AllowOverride None Options ExecCGI SetHandler perl-script PerlHandler Apache::Registry PerlSendHeader On </Directory> # like above, but no PerlSendHeaders <Directory /var/www/docs/rprograms> AllowOverride None Options ExecCGI SetHandler perl-script PerlHandler Apache::Registry PerlSendHeader Off </Directory> # allow arbitrary *.perl files to be scattered throughout the site. 
<Files *.perl> SetHandler perl-script PerlHandler Apache::Registry PerlSendHeader On Options +ExecCGI </Files> # like *.perl, but do not send HTTP headers <Files *.rperl> SetHandler perl-script PerlHandler Apache::Registry PerlSendHeader Off Options +ExecCGI </Files> <Location /perl-status> SetHandler perl-script PerlHandler Apache::Status order deny,allow deny from all allow from 204.117.82. </Location> Now, you'll notice that I use a C<PerlRequire> directive to load in the file F<startup.perl>. In that file, I include all of the C<use> statements that occur in any of my mod_perl programs (either from the programs directory, or the *.perl files). Here is an example: #! /usr/local/bin/perl use strict; # load up necessary perl function modules to be able to call from Perl-SSI # files. These objects are reloaded upon server restart (SIGHUP or SIGUSR1) # if PerlFreshRestart is "On" in httpd.conf (as of mod_perl 1.03). # only library-type routines should go in this directory. use lib "/var/www/perllib"; # make sure we are in a sane environment. $ENV{GATEWAY_INTERFACE} =~ /^CGI-Perl/ or die "GATEWAY_INTERFACE not Perl!"; use Apache::Registry (); # for things in the "/programs" URL # pull in things we will use in most requests so it is read and compiled # exactly once use CGI (); CGI->compile(':all'); use CGI::Carp (); use DBI (); use DBD::mysql (); 1; What this does is pull in all of the code used by the programs (but does not C<import> any of the module methods) into the main HTTPD process, which then creates the child processes with the code already in place. You can also put any new modules you like into the F</var/www/perllib> directory and simply C<use> them in your programs. There is no need to put C<use lib "/var/www/perllib";> in all of your programs. You do, however, still need to C<use> the modules in your programs. 
Perl is smart enough to know it doesn't need to recompile the code, but it does need to C<import> the module methods into your program's name space. If you only have a few modules to load, you can use the PerlModule directive to pre-load them with the same effect. The biggest benefit here is that the child process never needs to recompile the code, so it is faster to start, and the child process actually shares the same physical copy of the code in memory due to the way the virtual memory system in modern operating systems works. You will want to replace the C<use> lines above with modules you actually need. =head2 Simple Test Program Here's a sample script called F<printenv> that you can stick in the F<programs> directory to test the functionality of the configuration. #! /usr/local/bin/perl use strict; # print the environment in a mod_perl program under Apache::Registry print "Content-type: text/html\n\n"; print "<HEAD><TITLE>Apache::Registry Environment</TITLE></HEAD>\n"; print "<BODY><PRE>\n"; print map { "$_ = $ENV{$_}\n" } sort keys %ENV; print "</PRE></BODY>\n"; When you run this, check the value of the GATEWAY_INTERFACE variable to see that you are indeed running mod_perl. =head1 REDUCING MEMORY USE As a side effect of using mod_perl, your HTTPD processes will be larger than without it. There is just no way around it, as you have this extra code to support your added functionality. On a very busy site, the number of HTTPD processes can grow to be quite large. For example, on one large site, the typical HTTPD was about 5Mb large. With 30 of these, all of RAM was exhausted, and we started to go to swap. With 60 of these, swapping turned into thrashing, and the whole machine slowed to a crawl. To reduce thrashing, limiting the maximum number of HTTPD processes to a number that is just larger than what will fit into RAM (in this case, 45) is necessary. 
The drawback is that when the server is serving 45 requests, new requests will queue up and wait; however, if you let the maximum number of processes grow, the new requests will start to get served right away, I<but> they will take much longer to complete. One way to reduce the amount of real memory taken up by each process is to pre-load commonly used modules into the primary HTTPD process so that the code is shared by all processes. This is accomplished by inserting a C<use Foo ();> line into the F<startup.perl> file for every C<use Foo;> statement in any commonly used Registry program. The idea is that the operating system's VM subsystem will share the data across the processes. You can also pre-load Apache::Registry programs using the C<Apache::RegistryLoader> module so that the code for these programs is shared by all HTTPD processes as well. B<NOTE>: When you pre-load modules in the startup script, you may need to kill and restart HTTPD for changes to take effect. A simple C<kill -HUP> or C<kill -USR1> will not reload that code unless you have set the C<PerlFreshRestart> configuration parameter in F<httpd.conf> to be "On". =head1 REDUCING THE NUMBER OF LARGE PROCESSES Unfortunately, simply reducing the size of each HTTPD process is not enough on a very busy site. You also need to reduce the quantity of these processes. This reduces memory consumption even more, and results in fewer processes fighting for the attention of the CPU. If you can reduce the quantity of processes so that they all fit into RAM, your response time improves even more. The idea of the techniques outlined below is to offload the normal document delivery (such as static HTML and GIF files) from the mod_perl HTTPD, and let it only handle the mod_perl requests. This way, your large mod_perl HTTPD processes are not tied up delivering simple content when a smaller process could perform the same job more efficiently.
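To get a feeling for how many large processes a machine can hold before it starts to swap, a back-of-the-envelope calculation helps. The following is only a sketch: the helper name max_processes is hypothetical, and the figures passed to it are made-up illustrative numbers, not measurements from the site described above. Measure your own process sizes before settling on a limit.

```perl
use strict;
use warnings;

# Hypothetical helper: how many processes of a given size fit into RAM
# once memory for the operating system and other services is set aside.
# It ignores memory shared between processes, so it is a conservative
# estimate.
sub max_processes {
    my ($ram_mb, $reserved_mb, $per_process_mb) = @_;
    die "process size must be positive\n" unless $per_process_mb > 0;
    my $available = $ram_mb - $reserved_mb;
    return $available > 0 ? int($available / $per_process_mb) : 0;
}

# Illustrative numbers only: 160 MB of RAM, 10 MB reserved, 5 MB per HTTPD.
print max_processes(160, 10, 5), "\n";   # prints 30
```

The smaller the per-process figure, or the fewer requests that need a large process at all, the more headroom you have. That is what the techniques below aim for.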
In the techniques below where there are two HTTPD configurations, the same httpd executable can be used for both configurations; there is no need to build HTTPD both with and without mod_perl compiled into it. With Apache 1.3 this can be done with the DSO configuration -- just configure one httpd invocation to dynamically load mod_perl and the other not to do so. These approaches work best when most of the requests are for static content rather than mod_perl programs. Log file analysis becomes a bit of a challenge when you have multiple servers running on the same host, since you must log to different files. =head2 TWO MACHINES The simplest way is to put all static content on one machine, and all mod_perl programs on another. The only trick is to make sure all links are properly coded to refer to the proper host. The static content will be served up by lots of small HTTPD processes (configured I<not> to use mod_perl), and the relatively few mod_perl requests can be handled by the smaller number of large HTTPD processes on the other machine. The drawback is that you must maintain two machines, and this can get expensive. For extremely large projects, this is the best way to go. =head2 TWO IP ADDRESSES Similar to above, but one HTTPD runs bound to one IP address, while the other runs bound to another IP address. The only difference is that one machine runs both servers. Total memory usage is reduced because the majority of files are served by the smaller HTTPD processes, so there are fewer large mod_perl HTTPD processes sitting around. This is accomplished using the F<httpd.conf> directive C<BindAddress> to make each HTTPD respond only to one IP address on this host. One will have mod_perl enabled, and the other will not. =head2 TWO PORT NUMBERS If you cannot get two IP addresses, you can also split the HTTPD processes as above by putting one on the standard port 80, and the other on some other port, such as 8042.
The only configuration changes will be the C<Port> and log file directives in the httpd.conf file (and also one of them does not have any mod_perl directives). The major flaw with this scheme is that some firewalls will not allow access to the server running on the alternate port, so some people will not be able to access all of your pages. If you use this approach or the one above with dual IP addresses, you probably do not want to have the *.perl and *.rperl sections from the sample configuration above, as this would require that your primary HTTPD server be mod_perl enabled as well. Thanks to Gerd Knops for this idea. =head2 USING ProxyPass WITH TWO SERVERS To overcome the limitation of the alternate port above, you can use dual Apache HTTPD servers with just a slight difference in configuration. Essentially, you set up two servers just as you would with the two ports on the same IP address method above. However, in your primary HTTPD configuration you add a line like this: ProxyPass /programs http://localhost:8042/programs Where your mod_perl enabled HTTPD is running on port 8042, and has only the directory F<programs> within its DocumentRoot. This assumes that you have included the mod_proxy module in your server when it was built. Now, when you access http://www.domain.com/programs/printenv it will internally be passed through to your HTTPD running on port 8042 as the URL http://localhost:8042/programs/printenv and the result relayed back transparently. To the client, it all seems as if it is just one server running. This can also be used on the dual-host version to hide the second server from view if desired. =begin html <P> A complete configuration example of this technique is provided by two HTTPD configuration files. <A HREF="httpd.conf.txt">httpd.conf</A> is for the main server for all regular pages, and <A HREF="httpd%2bperl.conf.txt">httpd+perl.conf</A> is for the mod_perl programs accessed in the <CODE>/programs</CODE> URL.
</P> The directory structure assumes that F</var/www/documents> is the C<DocumentRoot> directory, and the mod_perl programs are in F</var/www/programs> and F</var/www/rprograms>. I start them as follows: daemon httpd daemon httpd -f conf/httpd+perl.conf =end html Thanks to Bowen Dwelle for this idea. =head2 SQUID ACCELERATOR Another approach to reducing the number of large HTTPD processes on one machine is to use an accelerator such as Squid (which can be found at http://squid.nlanr.net/Squid/ on the web) between the clients and your large mod_perl HTTPD processes. The idea here is that squid will handle the static objects from its cache while the HTTPD processes will handle mostly just the mod_perl requests once the cache is primed. This reduces the number of HTTPD processes and thus reduces the amount of memory used. To set this up, just install the current version of Squid (at this writing, this is version 1.1.22) and use the RunAccel script to start it. You will need to reconfigure your HTTPD to use an alternate port, such as 8042, rather than its default port 80. To do this, you can either change the F<httpd.conf> line C<Port> or add a C<Listen> directive to match the port specified in the F<squid.conf> file. Your URLs do not need to change. The benefit of using the C<Listen> directive is that redirected URLs will still use the default port 80 rather than your alternate port, which might reveal your real server location to the outside world and bypass the accelerator. In the F<squid.conf> file, you will probably want to add C<programs> and C<perl> to the C<cache_stoplist> parameter so that these are always passed through to the HTTPD server under the assumption that they always produce different results. This is very similar to the two port, ProxyPass version above, but the Squid cache may be more flexible to fine tune for dynamic documents that do not change on every view.
The Squid proxy server also seems to be more stable and robust than the Apache 1.2.4 proxy module. One drawback to using this accelerator is that the logfiles will always report access from IP address 127.0.0.1, which is the local host loopback address. Also, any access permissions or other user tracking that requires the remote IP address will always see the local address. The following code uses a feature of recent mod_perl versions (tested with mod_perl 1.16 and Apache 1.3.3) to trick Apache into logging the real client address and giving that information to mod_perl programs for their purposes. First, in your F<startup.perl> file add the following code: use Apache::Constants qw(OK); sub My::SquidRemoteAddr ($) { my $r = shift; if (my ($ip) = $r->header_in('X-Forwarded-For') =~ /([^,\s]+)$/) { $r->connection->remote_ip($ip); } return OK; } Next, add this to your F<httpd.conf> file: PerlPostReadRequestHandler My::SquidRemoteAddr This will cause every request to have its C<remote_ip> address overridden by the value set in the C<X-Forwarded-For> header added by Squid. Note that if you have multiple proxies between the client and the server, you want the IP address of the last machine before your accelerator. This will be the right-most address in the X-Forwarded-For header (assuming the other proxies append their addresses to this same header, like Squid does.) If you use apache with mod_proxy at your frontend, you can use Ask Bjørn Hansen's mod_proxy_add_forward module from ftp://ftp.netcetera.dk/pub/apache/ to make it insert the C<X-Forwarded-For> header. =head1 SUMMARY To gain maximal performance of mod_perl on a busy site, one must reduce the amount of resources used by the HTTPD to fit within what the machine has available. The best way to do this is to reduce memory usage. 
If your mod_perl requests are fewer than your static page requests, then splitting the servers into mod_perl and non-mod_perl versions further allows you to tune the amount of resources used by each type of request. Using the C<ProxyPass> directive allows these multiple servers to appear as one to the users. Using the Squid accelerator also achieves this effect, but Squid takes care of deciding when to access the large server automatically. If all of your requests require processing by mod_perl, then the only thing you can really do is throw a I<lot> of memory on your machine and try to tweak the perl code to be as small and lean as possible, and to share the virtual memory pages by pre-loading the code. =head1 AUTHOR This document is written by Vivek Khera. If you need to contact me, just send email to the mod_perl mailing list. This document is copyright (c) 1997-1998 by Vivek Khera. If you have contributions for this document, please post them to the mailing list. Perl POD format is best, but plain text will do, too. If you need assistance, contact the mod_perl mailing list at modperl@apache.org first (send 'subscribe' to [EMAIL PROTECTED] to subscribe). There are lots of people there that can help. Also, check the web pages http://perl.apache.org/ and http://www.apache.org/ for explanations of the configuration options. $Revision: 1.1 $ $Date: 2002/01/05 19:20:01 $ 1.1 modperl-docs/src/docs/1.0/faqs/perl_myth.pod Index: perl_myth.pod =================================================================== =head1 Popular Perl Complaints and Myths =head2 Abbreviations =over 4 =item * M = Misconception or Myth =item * R = Response =back =head2 Interpreted vs. Compiled =over 4 =item M: Each dynamic perl page hit needs to load the Perl interpreter and compile the script, then run it each time a dynamic web page is hit. This dramatically decreases performance as well as makes Perl an unscalable model since so much overhead is required to search each page.
=item R: This myth was true years ago before the advent of mod_perl. mod_perl loads the interpreter once into memory and never needs to load it again. Each perl program is only compiled once. The compiled version is then kept in memory and used each time the program is run. In this way there is no extra overhead when hitting a mod_perl page. =back =head3 Interpreted vs. Compiled (More Gory Details) =over 4 =item R: Compiled code always has the potential to be faster than interpreted code. Ultimately, all interpreted code needs to be converted to native instructions at some point, and this invariably has to be done by a compiled application. That said, an interpreted language CAN be faster than a comparable native application in certain situations, given certain, common programming practices. For example, the allocation and de-allocation of memory can be a relatively expensive process in a tightly scoped compiled language, whereas interpreted languages typically use garbage collectors which don't need to do expensive deallocation in a tight loop, instead waiting until additional memory is absolutely necessary, or for a less computationally intensive period. Of course, using a garbage collector in C would eliminate this edge in this situation, but where using garbage collectors in C is uncommon, Perl and most other interpreted languages have built-in garbage collectors. It is also important to point out that few people use the full potential of their modern CPU with a single application. Modern CPUs are not only more than fast enough to run interpreted code, many processors include instruction sets designed to increase the performance of interpreted code. =back =head2 Perl is overly memory intensive making it unscalable =over 4 =item M: Each child process needs the Perl interpreter and all code in memory. Even with mod_perl httpd processes tend to be overly large, slowing performance, and requiring much more hardware.
=item R: In mod_perl the interpreter is loaded into the parent process and shared between the children. Also, when scripts are loaded into the parent and the parent forks a child httpd process, that child shares those scripts with the parent. So while the child may take 6MB of memory, 5MB of that might be shared meaning it only really uses 1MB per child. Even 5 MB of memory per child is not uncommon for most web applications in other languages. Also, most modern operating systems support the concept of shared libraries. Perl can be compiled as a shared library, enabling the bulk of the perl interpreter to be shared between processes. Some executable formats on some platforms (I believe ELF is one such format) are able to share entire executable TEXT segments between unrelated processes. =back =head3 More Tuning Advice: =over 4 =item * B<Vivek Khera's mod_perl performance tuning guide>( http://perl.apache.org/tuning/ ) =item * B<Stas Bekman's Performance Guide>( http://perl.apache.org/guide/performance.html ) =back =head2 Not enough support, or tools to develop with Perl. (Myth) =over 4 =item R: Of all web applications and languages, Perl arguably has the most support and tools. B<CPAN> is a central repository of Perl modules which are freely downloadable and usually well supported. There are literally thousands of modules which make building web apps in Perl much easier. There are also countless mailing lists of extremely responsive Perl experts who usually respond to questions within an hour. There are also a number of Perl development environments to make building Perl Web applications easier. Just to name a few, there is C<Apache::ASP>, C<Mason>, C<embPerl>, C<ePerl>, etc... =back =head2 If Perl scales so well, how come no large sites use it? (myth) =over 4 =item R: Actually, many large sites DO use Perl for the bulk of their web applications.
Here are some, just as an example: B<e-Toys>, B<CitySearch>, B<Internet Movie Database>( http://imdb.com ), B<Value Click> ( http://valueclick.com ), B<Paramount Digital Entertainment>, B<CMP> ( http://cmpnet.com ), B<HotBot Mail>/B<HotBot Homepages>, and B<DejaNews> to name a few. Even B<Microsoft> has taken interest in Perl, ( http://www.activestate.com/press/releases/Microsoft.htm ). =back =head2 Perl even with mod_perl, is always slower than C. =over 4 =item R: The Perl engine is written in C. There is no point arguing that Perl is faster than C because anything written in Perl could obviously be re-written in C. The same holds true for arguing that C is faster than assembly. There are two issues to consider here. First of all, many times a web application written in Perl B<CAN be faster> than C thanks to the low level optimizations in the Perl compiler. In other words, it's easier to write poorly written C than well written Perl. Secondly, it's important to weigh all factors when choosing a language to build a web application in. Time to market is often one of the highest priorities in creating a web application. Development in Perl can often be twice as fast as in C. This is mostly due to the differences in the languages themselves as well as the wealth of free examples and modules which speed development significantly. Perl's speedy development time can be a huge competitive advantage. =back =head2 Java does away with the need for Perl. =over 4 =item M: Perl had its place in the past, but now there's Java and Java will kill Perl. =item R: Java and Perl are actually more complementary languages than competitive. It's widely accepted that server side Java solutions such as C<JServ>, C<JSP> and C<JRUN>, are far slower than mod_perl solutions (see next myth). Even so, Java is often used as the front end for server side Perl applications. Unlike Perl, with Java you can create advanced client side applications.
Combined with the strength of server side Perl these client side Java applications can be made very powerful. =back =head2 Perl can't create advanced client side applications =over 4 =item R: True. There are some client side Perl solutions like PerlScript in MSIE 5.0, but all client side Perl requires the user to have the Perl interpreter on their local machine. Most users do not have a Perl interpreter on their local machine. Most Perl programmers who need to create an advanced client side application use Java as their client side programming language and Perl as the server side solution. =back =head2 ASP makes Perl obsolete as a web programming language. =over 4 =item M: With Perl you have to write individual programs for each set of pages. With ASP you can write simple code directly within HTML pages. ASP is the Perl killer. =item R: There are many solutions which allow you to embed Perl in web pages just like ASP. In fact, you can actually use Perl IN ASP pages with PerlScript. Other solutions include: C<Mason>, C<Apache::ASP>, C<ePerl>, C<embPerl> and C<XPP>. Also, Microsoft and ActiveState have worked very hard to make Perl run equally well on NT as Unix. You can even create COM modules in Perl that can be used from within ASP pages. Some other advantages Perl has over ASP: mod_perl is usually much faster than ASP, Perl has much more example code and full programs which are freely downloadable, and Perl is cross platform, able to run on Solaris, Linux, SCO, Digital Unix, Unix V, AIX, OS2, VMS, MacOS, Win95-98 and NT to name a few. Also, benchmarks show that embedded Perl solutions outperform ASP/VB on IIS by several orders of magnitude. Perl is a much easier language for some to learn, especially those with a background in C or C++. =back =head1 CREDITS Thanks to the mod_perl list for all of the good information and criticism.
I'd especially like to thank, =over 4 =item * Stas Bekman E<lt>[EMAIL PROTECTED]<gt> =item * Thornton Prime E<lt>[EMAIL PROTECTED]<gt> =item * Chip Turner E<lt>[EMAIL PROTECTED]<gt> =item * Clinton E<lt>[EMAIL PROTECTED]<gt> =item * Joshua Chamas E<lt>[EMAIL PROTECTED]<gt> =item * John Edstrom E<lt>[EMAIL PROTECTED]<gt> =item * Rasmus Lerdorf E<lt>[EMAIL PROTECTED]<gt> =item * Nedim Cholich E<lt>[EMAIL PROTECTED]<gt> =item * Mike Perry E<lt> http://www.icorp.net/icorp/feedback.htm E<gt> =item * Finally, I'd like to thank Robert Santos E<lt>[EMAIL PROTECTED]<gt>, CyberNation's lead Business Development guy for inspiring this document. =back =head1 AUTHOR Adam Pisoni =head2 Contact info email: [EMAIL PROTECTED] WWW: http://sm.pm.org/ WWW: http://www.cnation.com =head1 VERSION Ver 1.04 Tue Aug 5 9:45:00 PST 1999 =cut 1.1 modperl-docs/src/docs/1.0/win32/config.cfg Index: config.cfg =================================================================== use vars qw(@c); @c = ( id => 'win32', title => "mod_perl on Win32", abstract => 'Various documents assisting mod_perl users on Win32 platforms', chapters => [ qw( win32_binaries.pod win32_compile.pod win32_multithread.pod ), ], ); 1.1 modperl-docs/src/docs/1.0/win32/win32_binaries.pod Index: win32_binaries.pod =================================================================== =head1 NAME win32_binaries - obtaining Apache mod_perl-1.xx binaries for Win32 =head1 DESCRIPTION This document discusses the two major types of binary packages available for Win32 mod_perl - all-in-one Perl/Apache/mod_perl binaries, and mod_perl ppm (Perl Package Manager) packages. =head1 ALL-IN-ONE PACKAGES There are at least two binary packages for Win32 that contain the necessary Perl and Apache binaries: http://www.indigostar.com/ ftp://theoryx5.uwinnipeg.ca/pub/other/perl-win32-bin-x.x.exe As well as including a number of non-core modules, both of these packages contain mod_perl. 
See the documentation on the web sites and that included with the packages for installation instructions. Both of these also include an ActiveState-compatible C<ppm> (Perl Package Manager) utility for adding and upgrading modules. =head1 PPM Packages For users of ActivePerl, available from http://www.activestate.com/ there are also C<PPM> mod_perl packages available. For this, if you don't already have it, get and install the latest Win32 Apache binary from http://httpd.apache.org/ Both ActivePerl and Apache binaries are available as C<MSI> files for use by the Microsoft Installer - as discussed on the ActiveState site, users of Windows 95 and 98 may need to obtain this. In installing these packages, you may find it convenient when transcribing any Unix-oriented documentation to choose installation directories that do not have spaces in their names (eg, F<C:\Perl> and F<C:\Apache>). After installing Perl and Apache, you can then install mod_perl via the PPM utility. ActiveState does not maintain mod_perl in the ppm repository, so you must get it from a different location other than ActiveState's site. One way is simply as (broken over two lines for readability) C:\> ppm install http://theoryx5.uwinnipeg.ca/ppmpackages/mod_perl.ppd Another way, which will be useful if you plan on installing additional Apache modules, is to set the repository within the C<ppm> shell utility as (the C<set repository ...> command has been broken over two lines for readability): C:\> ppm PPM> set repository theoryx5 http://theoryx5.uwinnipeg.ca/cgi-bin/ppmserver?urn:/PPMServer PPM> install mod_perl PPM> set save PPM> quit C:\> The C<set save> command saves the C<theoryx5> repository to your PPM configuration file, so that future PPM sessions will search this repository, as well as ActiveState's, for requested packages. 
The mod_perl PPM package also includes the necessary Apache DLL C<mod_perl.so>; a post-installation script should be run which will offer to copy this file to your Apache modules directory (eg, F<C:\Apache\modules>). Note that the mod_perl package available from this site will always use the latest mod_perl sources compiled against the latest official Apache release; depending on changes made in Apache, you may or may not be able to use an earlier Apache binary. However, in the Apache Win32 world it is particularly a good idea to use the latest version, for bug and security fixes. =head1 CONFIGURATION Add this line to F<C:\Apache\conf\httpd.conf>: LoadModule perl_module modules/mod_perl.so Be sure that the path to your Perl binary (eg, F<C:\Perl\bin>) is in your C<PATH> environment variable. =head2 Registry scripts Using C<Apache::Registry> to speed up cgi scripts may be done as follows. Create a directory, for example, F<C:\Apache\mod_perl>, which will hold your scripts. Insert then in F<C:\Apache\conf\httpd.conf> the following directives: Alias /mod_perl/ "/Apache/mod_perl/" <Location /mod_perl> SetHandler perl-script PerlHandler Apache::Registry Options +ExecCGI PerlSendHeader On </Location> whereby the script would be called as http://localhost/mod_perl/name_of_script =head2 Hello World As you will discover, there is much to mod_perl beyond simple speed-up of cgi scripts. Here is a simple I<Hello, World> example that illustrates the use of mod_perl as a content handler. Create a file F<Hello.pm> as follows: package Apache::Hello; use strict; use Apache::Constants qw(OK); sub handler { my $r = shift; $r->send_http_header; $r->print("<html><body>Hello World!</body></html>\n"); return OK; } 1; and save it in, for example, the F<C:\Perl\site\lib\Apache\> directory. 
Next put the following directives in F<C:\Apache\conf\httpd.conf>: PerlModule Apache::Hello <Location /hello> SetHandler perl-script PerlHandler Apache::Hello </Location> With this, calls to http://localhost/hello will use C<Apache::Hello> to deliver the content. =head1 APACHE MODULES The C<theoryx5> repository containing the mod_perl ppm package also contains a number of other Apache modules, such as C<Apache::ASP>, C<HTML::Embperl>, and C<HTML::Mason>. However, there may be ones you find that are not available through a repository; in such cases, you might try sending a message to the maintainer of the repository asking if a particular package could be included, or you could use the C<CPAN.pm> module to fetch, build, and install the module - see C<perldoc CPAN> for details. =head1 SEE ALSO L<mod_perl>, L<Apache>, http://perl.apache.org/, especially the guide, http://take23.org/, http://httpd.apache.org/, and http://www.activestate.com/. =cut 1.1 modperl-docs/src/docs/1.0/win32/win32_compile.pod Index: win32_compile.pod =================================================================== =head1 NAME win32_compile - Apache mod_perl-1.xx installation instructions for Win32 =head1 DESCRIPTION This document discusses how to build, test, configure and install mod_perl under Win32. =head1 PREREQUISITES =over 3 patience - mod_perl is considered alpha under Win32. MSVC++ 5.0+, Apache version 1.3-dev or higher and Perl 5.004_02 or higher. As of version 1.24_01, mod_perl will build on Win32 ActivePerls based on Perl-5.6.x (builds 6xx). For binary compatibility you should use the same compiler in building mod_perl that was used to compile your Perl binary; for ActivePerl, this means using VC++ 6. =back =head1 BUILDING Obtain the mod_perl sources from CPAN: http://www.cpan.org/authors/id/D/DO/DOUGM/mod_perl-1.xx.tar.gz When unpacked, using Winzip or similar tools, a subdirectory F<mod_perl-1.xx> will be created.
There are two ways to build mod_perl - with MS Developer Studio, and through command-line arguments to 'perl Makefile.PL'. In both cases Apache should previously have been built and installed - if you are using a binary build of Apache, make sure that you obtain a binary build that includes the Apache libraries and header files. =head2 Building with MS Developer Studio =over 3 =item Set up the Perl side Run, from a DOS window in the top-level directory of the mod_perl sources, perl Makefile.PL nmake This will set up the Perl side of mod_perl for the library build. =item Build mod_perl.so Using MS Developer Studio, select "File -> Open Workspace ...", select "Files of type [Projects (*.dsp)]" open mod_perl-x.xx/src/modules/win32/mod_perl.dsp =item Settings select "Tools -> Options -> [Directories]" select "Show directories for: [Include files]", and add C:\Apache\include . (should expand to C:\...\mod_perl-x.xx\src\modules\perl) C:\Perl\lib\Core select "Project -> Add to Project -> Files", adding: perl.lib (or perl56.lib) (e.g. C:\perl\lib\Core\perl.lib) ApacheCore.lib (e.g. C:\Apache\ApacheCore.lib) select "Build -> Set Active Configuration -> [mod_perl - Win32 Release]" select "Build -> Build mod_perl.so" You may see some harmless warnings, which can be reduced (along with the size of the DLL), by setting: "Project -> Settings -> [C/C++] -> Category: [Code Generation] -> Use runtime library: [Multithreaded DLL]" =item Testing Once mod_perl.so is built you may test mod_perl with: nmake test after which, assuming the tests are OK, nmake install will install the Perl side of mod_perl. The mod_perl.so file built under F<mod_perl-1.xx/src/modules/win32/Release> should be copied to your Apache modules directory (eg, F<C:\Apache\modules>).
=back =head2 Building with arguments to C<perl Makefile.PL> Generating the Makefile as, for example, perl Makefile.PL APACHE_SRC=\Apache INSTALL_DLL=\Apache\modules will build mod_perl (including mod_perl.so) entirely from the command line. The arguments accepted include =over 3 =item APACHE_SRC This can be one of two values: either the path to the Apache build directory (eg, F<..\apache_1.3.xx>), or to the installed Apache location (eg, F<\Apache>). This is used to set the locations of ApacheCore.lib and the Apache header files. =item INSTALL_DLL This gives the location of where to install mod_perl.so (eg, F<\Apache\modules>). No default is assumed - if this argument is not given, mod_perl.so must be copied manually. =item DEBUG If true (DEBUG=1), a Debug version will be built (this assumes that a Debug Apache has been built). If false, or not given, a Release version will be built. =item EAPI If true (EAPI=1), EAPI (Extended API) will be defined when compiling. This is useful when building mod_perl against mod_ssl patched Apache sources. If false, or not given, EAPI will not be defined. =back After this, running nmake nmake test nmake install will complete the installation. This latter method of building mod_perl will also install the Apache and mod_perl header files, which can then be accessed through the Apache::src module. =head1 CONFIGURATION Add this line to F<C:\Apache\conf\httpd.conf>: LoadModule perl_module modules/mod_perl.so Be sure that the path to your Perl binary (eg, F<C:\Perl\bin>) is in your C<PATH> environment variable. =head1 SEE ALSO L<mod_perl>, L<Apache>, http://perl.apache.org/, especially the guide, and http://take23.org/. 
=cut 1.1 modperl-docs/src/docs/1.0/win32/win32_multithread.pod Index: win32_multithread.pod =================================================================== =head1 NAME win32_multithread - discussion of multithreading on Win32 mod_perl-1.xx =head1 DESCRIPTION This document discusses the multithreading limitations of mod_perl-1.xx on Win32. =head1 The problem On Win32, mod_perl is effectively single threaded. What this means is that a single instance of the interpreter is created, and this is then protected by a server-wide lock that prevents more than one thread from using the interpreter at any one time. The fact that this will prevent parallel processing of requests, including static requests, can have serious implications for production servers that often must handle concurrent or long-running requests. This situation will change with Apache/mod_perl 2.0, which is based on a multi-process/multi-thread approach using a native Win32 threads implementation. See http://perl.apache.org/~dougm/modperl_2.0.html for details. At the time of writing, Apache-2.0 is in a beta stage of development. mod_perl-2.0 is being actively developed, including the Win32 port; if you would like a preview and/or would like to contribute to the development process, see the documents on obtaining mod_perl-2.0 by cvs, which can be obtained from mod_perl's home page at http://perl.apache.org/. =head1 Does it really matter? How serious is this? For some people and application classes it may be a non-problem, assuming the static material issue is handled differently. Low traffic and single user development sites will likely be unaffected (though the latter are likely to experience some surprises when moving to an environment where requests are no longer serialized and concurrency kicks in).
If your application is CPU bound, and all requests take roughly the same time to complete, then having more processing threads than processors (CPUs) will actually slow things down, because of the context switching overhead. Note that, even in this case, the current state of mod_perl will bar owners of multiprocessor Win32 machines from gaining any load balancing advantage from their superior hardware. On the other hand, applications dealing with a large spread of service times - say ranging from fractions of a second to a minute and above - stand to lose a great deal of responsiveness from being single threaded. The reason is that short requests that happen to be queued after long ones will be delayed for the entire duration of the "jobs" that precede them in the queue; with multitasking they would get a chance to complete much earlier. =head1 Workarounds If you need multithreading on Win32, either because your application has long running requests, or because you can afford multiprocessor hardware, and assuming you cannot switch operating systems, you may want to consider a few workarounds and/or alternatives - which do not require waiting for 2.0. You may be able to make Win32 multithreading a non-issue by tuning or rearranging your application and your architecture (useful tips on both counts can be found elsewhere in this document). You may be able to significantly reduce your worst-case timing problems or you may find that you can move the webserver to a more mod_perl friendly operating system by using a multi-tier scheme. If your application needs the full power of the Apache modules (often the case for people running outside Apache::Registry) you may want to consider a multi-server load balancing setup which uses mod_rewrite (or a similar URL partitioning scheme) to spread requests to several web servers, listening on different ports.
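As a rough illustration of the URL partitioning idea, the F<httpd.conf> fragment below sketches how a front-end server might hand two URL prefixes to two mod_perl servers on different ports. The host name, ports and prefixes are hypothetical; the fragment assumes mod_rewrite is loaded in the front-end server and has not been tested on Win32.

```
# Front-end httpd.conf (hypothetical host name, ports and URL prefixes)
RewriteEngine On

# Long-running report requests go to the server listening on port 8081
RewriteRule ^/reports/(.*)$ http://www.example.com:8081/reports/$1 [R,L]

# All other application requests go to the server listening on port 8082
RewriteRule ^/app/(.*)$ http://www.example.com:8082/app/$1 [R,L]
```

The C<[R]> flag issues an external redirect, so clients see the alternate ports and must be able to reach them. Using the C<[P]> flag instead would hide the ports from clients, but it relies on mod_proxy.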
The mod_proxy dual server setup, discussed in the "Strategy" section, is also a possibility, although people who have tried it have reported problems with Win32 mod_proxy. If you code to Apache::Registry (writing CGI compliant code) and can characterize the time demanded by a request from its URL, you can use a rewrite-based load balancing with a single server, by sending short requests to mod_perl while routing longer ones to the pure CGI environment - on the basis that startup, compilation and init times will matter less in this case. If none of the above works for you, then you will have to turn to some non mod_perl alternatives: this, however, implies giving up on most of the flexibility of the Apache modules. For CGI compliant scripts, two possible (portable) alternatives which are supported in an Apache/perl environment are straight CGI and FastCGI. In theory a CGI application that runs under mod_perl should have very few or no problems running under straight CGI (though its performance may be unacceptable). A FastCGI port should also be relatively painless. However, as always, your mileage may vary. If you do not mind replacing Apache with IIS/PWS, you may want to experiment with ActiveState's value added PerlEx extension, which speeds up CGI scripts much in a way similar to what FastCGI does. PerlEx is transparently supported by CGI.pm, so users of this package should be more or less covered. (An IIS-FastCGI accelerator is, regrettably, no longer available.) =head1 SEE ALSO http://perl.apache.org and http://httpd.apache.org, especially the discussion of Apache-2 and modperl-2. =cut