Sorry I forgot to include these files with the earlier message.
Does anyone have any clues with this problem.
I have searched the archives for internal server errors and
PerlFixupHandlers
and found no similar problems. Perhaps this is a mod_dir issue.
Judy
judy selen wrote:
>
> We are using
> RedHat Linux apache 1.3.14 mod_perl_1.23_01
>
> Our site is set up to generate pages using a templating system to add
> custom headers footers and sidebars. As well we use .htaccess files to
> further customize the subprocess_env.
>
> We are using custom modules installed as PerlFixupHandlers both for
> template generation and multilingual negotiation
>
> Negotiation.pm handles lingual content negotiation. It determines
> what version of the page to present based on the users preferred
> variants and the available variants for that page.When the best match
> is found, it
> performs an internal redirect to that document.
>
> Template.pm is installed as a fixup handler, and is controlled through
> various PerlSetVar directives.
> Basically, it determines what to include at the beginning and end of
> the web page.
> fixup sets the content handler for the request.
> Also has two subroutines, handle_rec for static pages ,and handle_cgi
> for scripts These allow the auto-inclusion of headers and footers of web
> pages.
>
> In httpd.conf
> --------------
> DirectoryIndex index.cgi index.html
>
> PerlRequire /www/perl/libs/WRI/Template.pm
> PerlRequire /www/perl/libs/WRI/Negotiation.pm
>
> PerlFixupHandler WRI::Negotiation WRI::Template::fixup
> ---------------------------------------------------
>
> These modules work 90% of the time....
>
> The problem occurs when a directory is requested, and served as a
> negotiated document
> and is passed to our content handler. (handle_cgi)
> A save-as window pops up in the browser and Apache logs a 500
> error. (The file it is trying to save is the standard 500 internal
> error message.)
> In the Template::fixup handler, the
> document is being detected as a cgi-script, not a server-parsed
> document. The 500 occurs because we are trying to execute a static
> document or directory.
>
> Template.pm
> In run_cgi:
>
> unless (open(CGI,'-|')) {
> #don't need the writing handle to the pipe anymore
> close(WR);
>
> #make STDIN be the read handle of the pipe (so we can get
> #POST information from Apache)
> open(STDIN,'<&RD');
>
> #Run in the directory of the file
> $r->chdir_file($filename);
>
> ### This returns 500 when a directory or static file is negotiated ###
>
> #Run the file
> exec($filename) || return SERVER_ERROR;
> }
>
>
> Error Log:
>
> [Wed Jan 31 13:04:52 2001] [error] Can't use string ("500") as a symbol
> ref while "strict refs" in use at /www/perl/libs/WRI/Template.pm line
> 302.
>
> [Wed Jan 31 13:04:55 2001] [error] Usage: Apache::cgi_header_out(r,
> key,
> ...) at /www/perl/libs/WRI/Template.pm line 433.
>
> These lines numbers may not be accurate since we have added and
> deleted
> debugging code..
>
>
> Template.pm - handle_cgi:
>
> #FIXME - needs error checking. If run CGI fails, $cgi will have
> # the number 500 instead of a file handle, and will fail
> later on
>
> my $cgi = run_cgi($r, $r->filename(), $r->uri());
>
> line 302:
> while (<$cgi>) {
>
> #remove the trailing body and html tags from the CGI output
> #if we are using includes
> if ($addincs) {
>
> s!</(body|html)[^>]*>!!gio;
> }
>
> #print out the line that we just read
> $r->print($_);
> }
>
> .
> .
> .
>
> do {
> #Quit if this is the end of the headers
> last if ($_ eq $/);
>
> #remove the LF or CRLF
> chomp;
>
> #Give apache the header line
> unless ($is_sub) {
> line 433: $r->cgi_header_out(split(/:\s*/,$_,2));
> }
> } while (<CGI>);
>
>
> ---------------------------------------------
>
> My understanding is that when OK or DECLINED is returned from these
> modules in the list of fixup handlers, control passes to the next,
> and when DONE is returned, the remaining are
> skipped mod_dir regains control and the next default file is requested
> (index.cgi index.html)
>
> Is this correct or is there something I am misunderstanding.
>
> Request Outline
> ---------------
> mod_dir sends directory request to Negotiation.pm
>
> /products/student/calcwiz/
> returns OK from Nego.pm
> mod_dir then sends index.cgi - index.html
> Nego.pm negotiates index.cgi and each of available variations
> then index.html ......
>
> when found most appropriate
> Negotiating:
> /products/student/calcwiz/index.en.html
>
> Template.pm
> handling request..
> /products/student/calcwiz/index.en.html
>
> Another Clue
> This error recurs when the page is reloaded 25 - 30 times
> MaxRequestsPerChild is 30
> Could this be related to our problem.
>
> I appreciate any clues anyone may have to why this problem occurs.
> Thanks in advance,
> Judy
package WRI::Template;
use strict;
use Apache::Constants qw(:common :methods);
use Apache::Log;
use Apache::File;
#PURPOSE: This package allows the auto-inclusion of headers and footers
# of web pages. It is installed as a fixup handler, and is
# controlled through various PerlSetVar directives
#
#CONFIGURATION: This module uses several PerlSetVar directives for
# configuration. They are:
# PerlSetVar AutoIncludes on
# PerlSetVar IncludeHeader filename
# PerlSetVar IncludeFooter filename
# PerlSetVar handle:handlername
# PerlSetVar NoCache 1
#
# These directives are described in the POD document at
# the end of this file
#
# Also, a -head file is used during the inclusion process
#
#BASIC OUTLINE: Normally, you would think that this type of module
# would be installed as a Content Handler. However,
# then it would have to handle all the requests _and_
# subrequests made (for the header, body, and footer).
# As a fixup handler, it can simply check to see if it
# is the main request and, if so, execute, otherwise
# it will pass control on to the modules that normally
# handle such requests.
#
# If it detects that this is the main request, and that
# includes are turned on. Then it checks
# what the content handler was guessed to be. It then
# checks the PerlSetVar handle: directives, to see if
# it is supposed to handle requests of this type.
# If so, it loads its own content handler instead of the
# one Apache thought it should use. It loads a different
# handler for CGI scripts than for static pages.
#
#FIXME - maybe whether or not includes are used should be based on the
# content type and not necessarily the handler. Just a thought
#
my $DEBUG = 0;
my $handlers = {'cgi-script' => \&handle_cgi,
'server-parsed' => \&handle_req,
'*' => \&handle_cgi};
sub fixup {
my $r = shift;
#check to see that includes are turned on, this is the main request,
#and we are supposed to handle requests of this type
unless ($r->is_main() &&
lc($r->subprocess_env('AutoIncludes')) eq 'on' &&
$r->subprocess_env('handle:'.$r->handler()) &&
!($r->notes->{'NoTemplate'})) {
return DECLINED;
}
#FIXME - I don't think this is used anywhere anymore. This is
# now accomplished by the EnvFixup module. This is used
# to tell what the main file is being generated.
$r->subprocess_env('MAIN_SCRIPT_FILENAME', $r->filename());
#Set the content handler to be this handler. We have two handlers,
#a CGI one and one for everything else. CGI's have to be handled
#differently because they can have different titles and meta tags
#depending on how they are executed, while HTML pages can simply have
#their titles read directly from a file
#Push handlers causes lots of problems in other things (most notably
#Apache::Registry scripts). That's why we use set_handlers.
# This stuff below allows us to use a 'handle-as:support-faq server-parsed'
# sort of thing (means handle the 'support-faq' handler as the equivalent to
# what the template module would do with server-parsed content.
my $k;
foreach $k (keys %{$r->subprocess_env})
{
my $handler;
next
unless ($handler) = $k =~ /^handle-as:(.+)/;
$handlers->{$handler} = $handlers->{$r->subprocess_env($k)};
}
$r->set_handlers(PerlHandler => [$handlers->{$r->handler()}
|| $handlers->{'*'}]);
$r->handler('perl-script');
#delete the environment at the end of the request (this may happen
#anyway)
$r->register_cleanup(sub { undef(%ENV); });
return OK;
}
#PURPOSE: This subroutine handles the template module for static pages
#
sub handle_req {
my $r = shift;
my $log = $r->log();
#Make sure the file exists. If not, log an error, and return NOT FOUND
if (! (-e $r->filename())) {
$log->error("Url does not exist: ".$r->uri());
return NOT_FOUND;
}
#FIXME -
#We need to disallow appending any PATH_INFO to regular
#requests because it messes up URLs a lot. However, sometimes
#valid URLs ending in a "/" get that slash added to
#$r->path_info() (not sure why). However, we need to do something
#(maybe redirect?) to keep people from putting in URLs with
#extra slashes, because it messes LOTS of things up.
#get subrequests without running them, and update the timestamp
#in case it's a HEAD request
my $subr_header =
include(
$r,
$r->subprocess_env('IncludeHeader'),
q(NO_RUN)
);
$r->update_mtime((stat $subr_header->filename())[9]);
my $subr_footer =
include(
$r,
$r->subprocess_env('IncludeFooter'),
q(NO_RUN)
);
$r->update_mtime((stat $subr_footer->filename())[9]);
#update the timestamp with the current file name
$r->update_mtime((stat $r->filename())[9]);
#Set last modified and cacheing(ETAG) information unless NoCache is set.
#NoCache is used for our pages with Server Side Includes that change a lot
unless ($r->subprocess_env('NoCache'))
{
$r->set_last_modified();
$r->set_etag();
}
#Not sure what this does... Chris added it. I think it has something
#to do with the HTTP/1.1 protocol, but not sure (like, for a
#get-if-not-modified-since or whatever that is)
if ((my $rc = $r->meets_conditions()) != OK) {
return $rc;
}
#send the header
my $charset = $r->subprocess_env('WRI_CHARSET');
$r->send_http_header('text/html'
. ($charset ? "; charset=$charset" : ''));
#we're done if they only asked for the header
return OK if ($r->header_only());
#Print the page introduction
print_head_html($r);
#Read all data that goes between the <head></head> tags from
#the -head file
my $head = join('-', $r->filename(), 'head');
if (open(FH, $head)) {
$r->send_fd(\*FH);
close(FH);
} else {
$r->print("<title>Wolfram Research, Inc.</title>\n");
}
#end the header unless configuration requests otherwise
$r->print("\n</head>\n") unless $r->subprocess_env('KeepHeadOpen');
#run the header
$subr_header->run();
#run the body
include($r, $r->uri());
#run the footer
$subr_footer->run();
#FIXME - should this be here or in the templates? Probably here
# because we tack on the beginning <html> tag
$r->print("\n</html>\n");
return OK;
}
#PURPOSE: This handles execution of CGI scripts under the template module.
# More work may have to be done to get this to work in C because
# mod_perl handles a lot of the environment handling tasks for
# us here.
sub handle_cgi {
my $r = shift;
my $buf; #buffer for reading in HTML header
my $log = $r->log();
#Return NOT FOUND if the CGI script does not exist
unless (-e $r->filename()) {
$log->error("Url does not exist: ".$r->uri());
return NOT_FOUND;
}
#Get the CGI file handle. The run_cgi procedure does a lot of the
#grunt work for handling requests (forking, sending headers,
#transferring POST data, etc)
#
#FIXME - needs error checking. If run CGI fails, $cgi will have
# the number 500 instead of a file handle, and will fail later on
my $cgi = run_cgi($r, $r->filename(), $r->uri());
#only add includes
# if the CGI execution is successful
# and the CGI is spitting out HTML
my $addincs = (($r->status >= 200) && ($r->status < 300) &&
($r->content_type() eq 'text/html'));
# Browsers suck because they don't follow the spec
# and so setting Expires: $x Date: $x does not work because
# the browser does not listen to Date: when deciding when Expires:
# occurs (cknite)
# UPDATE - Actually, I think it is only supposed to work if the browser
# identifies itself as HTTP/1.1. Maybe we could re-enable this
# for HTTP/1.1 browsers (johnnyb)
# $r->no_cache(1);
#send the header
$r->send_http_header();
#we're done if they only want the header
return OK if ($r->header_only());
#If we are using includes, print out the beginning HTML and the <head>
#information from the CGI, and include the header
if ($addincs) {
$r->print("<html>\n<head>\n");
#look through the CGI output, and print out everything between
#<head> and </head>. This will probably be the most difficult thing
#to do in C. Also, strip out the <html>, <head>, </head>, and
#<body> tags as well. These can occur anywhere
#within a line. The <body> will be the hardest because the tag
#itself can spread multiple lines.
while (<$cgi>) {
s!<html[^>]*>!!gio;
s!</?head[^>]*>!!gio;
my $body = s!<body.*?>(.*)!!io;
$buf .= $_;
if ($body) {
$r->print($buf);
$buf = $1;
last;
}
}
#Close the <head> tag unless otherwise specified by the configuration
$r->print("\n</head>\n") unless $r->subprocess_env('KeepHeadOpen');
include($r, $r->subprocess_env('IncludeHeader'));
#print the rest of the buffer read from the CGI (we do it now,
#because it should only come after the includes. This will print
#everything after the body tag that lies on the same or following
#line of the body tag
$r->print($buf) if ($buf);
}
#Print out the contents of the CGI
while (<$cgi>) {
#remove the trailing body and html tags from the CGI output
#if we are using includes
if ($addincs) {
s!</(body|html)[^>]*>!!gio;
}
#print out the line that we just read
$r->print($_);
}
#If we are using includes, put in the footer, and print the ending
#</html>
if ($addincs) {
include($r, $r->subprocess_env('IncludeFooter'));
$r->print("</html>\n");
}
#We're done!
return OK;
}
#PURPOSE: Opens a filehandle to a CGI script, also parses the CGI headers
# into the request object
#
#NOTE: In C this will have to use many functions from util_script.c
# in the apache source distribution
#
sub run_cgi {
my $r = shift;
my(
$filename, #name of the executable
$uri, #URI requested by the browser
$is_sub #not sure what this is for. It is always sent as FALSE
) = @_;
local($|);
$| = 1;
#Fix mod_perl's junk (mod_perl inappropriately sets the
#GATEWAY_INTERFACE on non-Apache::Registry scripts.
my $old_gw = $ENV{GATEWAY_INTERFACE};
$ENV{GATEWAY_INTERFACE} = 'CGI/1.1';
#Open up a pipe. This will be used to give input to the CGI
pipe(RD,WR) || return SERVER_ERROR;
#This causes a fork, with the standard output of the forked process
#going to the CGI handle. The code withing the braces ({ and })
#is what is executed by the forked process. We will read from
#that process through the CGI filehandle
unless (open(CGI,'-|')) {
#don't need the writing handle to the pipe anymore
close(WR);
#make STDIN be the read handle of the pipe (so we can get
#POST information from Apache)
open(STDIN,'<&RD');
#Run in the directory of the file
$r->chdir_file($filename);
#Run the file
exec($filename) || return SERVER_ERROR;
}
$ENV{GATEWAY_INTERFACE} = $old_gw;
#be sure to close the CGI at the end. However, we probably can just
#do this within our code without having to specifically register a
#cleanup
$r->register_cleanup(sub { close(CGI) });
#The RD filehandle was for the forked process, not us, so we can close it
close(RD);
#Give the CGI any POST data it needs
#FIXME - need to do a buffered read so that Apache processes don't get
# too big
my $read_buffer;
$r->read($read_buffer, $r->header_in('Content-Length'));
print WR $read_buffer;
#we've printed everything we are going to, so we no longer need th WR
#filehandle
close(WR);
#This reads the header data from the CGI script. It has to handle
#both CRLF linebreaks and plain LF linebreaks (because Apache handles
#both)
{
#Set linebreak to LF only
local $/ = "\n";
#get the first line
$_ = <CGI>;
#Check for CRLF linebreaks
$/ = "\r\n" if (substr($_, -2, 1) eq "\r");
#Check to see if the CGI printed a status line. If so, use it
if (m!^HTTP/\S+\s+((\d+).*)!o) {
unless ($is_sub) {
$r->status_line($1);
$r->status($2);
}
#Read in the next line for processing
$_ = <CGI>;
}
do {
#Quit if this is the end of the headers
last if ($_ eq $/);
#remove the LF or CRLF
chomp;
#Give apache the header line
unless ($is_sub) {
$r->cgi_header_out(split(/:\s*/,$_,2));
}
} while (<CGI>);
}
#Return the file handle for further processing
return \*CGI;
}
#PURPOSE: This subroutine handles all the subrequests for included pages
sub include {
my $r = shift;
my(
$uri, #URI of the request
$no_run_request #set to TRUE if the request is to only be looked
#up and not run
) = @_;
#get the subrequest object from apache
my $subr = $r->lookup_uri($uri);
# Set a note to keep the template from running on the subrequest
# (this is a feeble hack to attempt to get rid of the infinite
# template bug).
$subr->notes->{'NoTemplate'} = 1;
#The subrequest doesn't get any POST data
$subr->header_in('Content-Length' => undef);
#Run the subrequest unless $no_run_request is set to TRUE
if ($no_run_request) {
return $subr;
} else {
return $subr->run();
}
}
#PURPOSE: This prints the html header. This doesn't really need to
# be a function, and should probably just be included in the main
# code
sub print_head_html {
my $r = shift;
$r->print(<<EOF);
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
EOF
}
1;
__END__
=head1 NAME
Template.pm - Module that implements the wolfram templating system
=head1 AUTHOR
Daryn Sharp - modified extensively by Jonathan Bartlett, Chris Knight, and Max Campos
=head1 SYNOPSIS
In your httpd.conf file
PerlRequire /www/perl/libs/WRI/Template.pm
PerlFixupHandler WRI::Template
This should be the last PerlFixupHandler in the chain
In your .htaccess file
PerlSetVar AutoIncludes on
PerlSetVar IncludeHeader /includes/header.cgi
PerlSetVar IncludeFooter /includes/footer.cgi
PerlSetVar handle:server-parsed yes
PerlSetVar handle:cgi-script yes
And, if you don't want your page to be cached, you can say
PerlSetVar NoCache 1
This module listens to the environment variables
=over 4
=item *
WRI_IS_ROBOT
=back
This module handles the following tasks:
=over 4
=item *
Attaches headers and footers to both CGI scripts and static or server-parsed html pages
=item *
Reformats the page to be robot-friendly if the WRI_IS_ROBOT subprocess_env variable is
set (NOTE - requires that another module set this variable) by moving headers and
footers to the end, and getting rid of images (leaves the alt tags), extra formatting
tags, and extra spaces.
=back
=head1 Implementation
=head2 Fixup Stage
This module is both a content handler and a fixup handler. However, it is only
installed as a fixup handler. In the fixup stage, it checks to see:
=over 4
=item *
Is it a main request (you don't EVER templatize subrequests, but you do handle
non-initial requests, like from a server redirect)
=item *
Are AutoIncludes turned on?
=item *
Is this module registered to handle the given content?
=back
If all of these check out, the template module registers itself as a content handler.
It picks the function to register for this task based on whether or not the request is
for a script(by checking $r->handler).
=head2 Handling Regular Pages (handle_req)
First, we issue subrequests (without running them) to find out the last modified time,
and then issue the last_modified tag of the latest time. It then reads the files
-head file, and prints the HTML header with the contents of -head. If WRI_IS_ROBOT is
set, it then prints the main body HTML after running it through the roboticize()
function to clean it up a bit. It then runs the subrequest for the header and the
footer. Then it prints a final end html tag and returns.
=head2 Handling CGI Scripts (handle_cgi)
First, this gets the filename of the script, and it sets the header in $r and returns
a filehandle to the CGI using the run_cgi() function. It then checks to see if it
should add the included files based on what the result code was (if it was a 500
internal server error, we don't want to run the includes). Also, we don't add the
includes if the content type isn't text/html.
We then send the http header, and finish if its a HEAD request. Then we read in the
CGI and parse out the head and body tags, replacing them with what we want (we use the
CGI's <title>, though). If WRI_IS_ROBOT is set, we just print a body tag, and save
the included header for later. Otherwise, we print the header here. Then we read all
of the CGI output. If WRI_IS_ROBOT is set, we roboticize it, otherwise we just print
the data. Then, we print the footer.
=head1 Supporting Functions
=head2 run_cgi
This opens a filehandle to a CGI script, parses out the headers, and sets all of the
request values based on the headers. It then returns the CGI filehandle.
=head2 roboticize
This handles all of our robot-friendly conversions, including replacing images with
just the alt text, and other similar replacements.
package WRI::Negotiation;
use strict;
use Apache::Constants qw(:common :methods :http);
my $DEBUG = 0;
my %charsets = ('ja' => 'Shift_JIS');
sub handler
{
my $r = shift;
$DEBUG = $r->dir_config('NegoDebug');
# we need to negotiate subrequests, except for when it's one of our own
# sub-requests (denoted by markingof NegoDisable).
$r->log_error("Negotiating: " . $r->uri)
if $DEBUG;
return DECLINED
if (!$r->is_main && $r->main->notes->{'NegoDisable'});
# WRI_PREFERRED_VARIANTS is always set regardless of whether we negotiate
# the request or not.
my $preferred = GetPreferredVariants($r);
$r->subprocess_env('WRI_PREFERRED_VARIANTS' =>
join(' ', @$preferred));
$r->log_error("Preferred: " . join(' ', @$preferred))
if $DEBUG;
if ($r->dir_config('NegoDisable'))
{
$r->log_error("Declining due to NegoDisable directive")
if $DEBUG;
return DECLINED;
}
# if the file exists (and is a file), then here are the possibilities:
# 1) It's as a result of this module's internal rediretion
# (in which we want to read in the variables set by the last
# request's negotiation)
# 2) It's as a result of this module's internal subrequests (used to
# determine available variants). (should not negotiate, just return)
# 3) It was a straight out request that matched a file directly
# and the file was one that would have been negotiated
# (ie. GET /index.ja.html)
# (set PREFERRED and AVAILABLE and VARIANT but do not
# actually negotiate).
# 4) It was a straight out request that matched a file directly
# and the file was one that would NOT have been negotiated
# (ie. GET /blah/index.html where index.html is a file in the
# directory) (set PREFERRED and return).
# 5) Request is a directory. Just set PREFERRED and return.
if (-e $r->finfo())
{
if (-f $r->finfo())
{
$r->log_error("Static request.")
if $DEBUG;
my $prev;
if (($prev = $r->prev) &&
$prev->notes->{'NegoRedirect'})
{
# If this is case 1, restore the variables
# do a little environment cleanup...
$r->log_error("Restoring variables from previous request.")
if $DEBUG;
delete $prev->notes->{'NegoRedirect'};
# now retrieve our variables from the previous request.
my $var = $prev->subprocess_env('WRI_VARIANT');
$r->subprocess_env('WRI_VARIANT' => $var);
$r->subprocess_env('WRI_AVAILABLE_VARIANTS' =>
$prev->subprocess_env('WRI_AVAILABLE_VARIANTS'));
$r->header_out('Content-Language', $var);
$r->header_out('Vary', 'accept-language');
$r->header_out('Content-Location', $r->uri());
$r->subprocess_env('WRI_CHARSET' => $charsets{$var})
if $charsets{$var};
}
elsif (my ($var) =
substr($r->filename, rindex($r->filename, '/') + 1)
=~ m!\.(\w+)\.!)
{
$r->log_error("case 3 for $var")
if $DEBUG;
# else if the filename appears to be something.aa.ext
# it's case 3, and we'll need to set the AVAILABLE_VARIANTS
# variable, as well as VARIANT
$r->subprocess_env('WRI_VARIANT' => $var);
my $avail = GetAvailableVariants($r, $var);
$r->subprocess_env('WRI_AVAILABLE_VARIANTS'
=> $avail->{str});
$r->header_out('Content-Language', $var);
$r->subprocess_env('WRI_CHARSET' => $charsets{$var})
if $charsets{$var};
}
# else it's case 4 and we're done.
$r->log_error("case 4 for $var")
if $DEBUG;
return OK;
}
else
{
# the file exists, but is not a regular file. It's probably
# a directory (case 5). Don't bother!
$r->log_error("Non-file request. Aborting.")
if $DEBUG;
return OK;
}
}
else
{
# file does not exist, it's possible that this is some weird virtual
# filename, or it truely needs negotiation.
# if it's a request for something.ja.ext and it doesn't exist,
# do not negotiate. This indicates a direct request of a
# specific variant, but is a 404.
return DECLINED
if substr($r->filename,
rindex($r->filename, '/') + 1)
=~ m!\.(\w+)\.!;
# return declined if it's not in the form of filename.ext
# (we do not negotiate filenames w/o extensions).
return DECLINED
if substr($r->filename, rindex($r->filename, '/')+1)
!~ /.+\..+/;
my $available = GetAvailableVariants($r);
unless ($available->{error})
{
# it truely is a negotiated document. Pick out the most
# preferred variant.
while (my $v = shift @$preferred)
{
$r->log_error("Is $v available?")
if $DEBUG;
if ($available->{var}{$v})
{
$r->log_error("Using variant $v, preparing redirect.")
if $DEBUG;
$r->subprocess_env('WRI_VARIANT' => $v);
$r->subprocess_env('WRI_AVAILABLE_VARIANTS'
=> $available->{str});
# we cannot redirect here because internal_redirect
# can only be called in the response phase.
$r->notes->{'NegoRedirectTo'} = $available->{var}{$v}{uri};
$r->set_handlers('PerlHandler' => [\&redirect_handler]);
$r->handler('perl-script');
# similarly, some things (namely mod_dir) need to
# have an actual real filename set in order to know
# that a subrequest was successful.
$r->filename($available->{path} . $available->{var}{$v}{file});
return OK;
}
$r->log_error("$v not available, get next variant")
if $DEBUG;
}
# eventually, this should return some sort of page
# that lets the user chose between the available
# variants, for now, we'll just return declined
return OK;
}
else
{
$r->log_error($available->{error});
return OK;
}
return OK;
}
}
sub redirect_handler
{
# we're in the response phase, it's probably because the current
# url was negotiated and needs an internal_redirect that can happen
# only in this phase.
my $r = shift;
if (my $to = $r->notes->{'NegoRedirectTo'})
{
$r->log_error("Redirecting...")
if $DEBUG;
# remove the temp environment variable
delete $r->notes->{'NegoRedirectTo'};
# set a new temp environment variable to get the module
# to grab the variables from the previous request.
$r->notes->{'NegoRedirect'} = 1;
$r->internal_redirect($to);
return DONE;
}
else
{
return DECLINED;
}
}
sub GetAvailableVariants
{
# $myvar is only passed when the uri/filename already include the
# variant $myvar. This is usually not the case. Normally
# $myvar is undef.
my ($r, $myvar) = @_;
my $filename = $r->filename;
# break the filname into path and file pieces.
my %available;
$available{path} = substr($filename, 0, rindex($filename, '/') + 1);
my $file = substr($filename, length($available{path}));
my $uri = $r->uri;
$available{var} = {};
# if this is a lingual request, the we need to rewrite it so that
# it appears to not be.
if ($myvar)
{
$available{var}{$myvar}{uri} = $uri;
$available{var}{$myvar}{file} = $file;
$available{str} = "$myvar=$file";
$file =~ s!\.$myvar\.!\.!;
$uri =~ s!\.$myvar\.!\.!;
}
$available{file} = $file;
# Set this variable to keep the subrequest from performing
# negotiation (negotiation is not needed for these lookup_uris
# and could cause infinite loops).
# cycle through all of our available variants
# and figure out which ones exist.
my $newfile;
my $newuri;
my $v;
foreach $v (split ' ', $r->dir_config('NegoAvailableVariants'))
{
# we don't need to check to see if the current request
# exists.
next
if $v eq $myvar;
$newfile = $file;
$newuri = $uri;
unless ($newfile =~ s!(.+\.)(.+)$!$1$v.$2!
&& $newuri =~ s!(.*/)($file)!$1$newfile!)
{
$available{error} = "Could not look for variants for '$uri'.";
return \%available;
}
$r->notes->{'NegoDisable'} = 1;
my $subr = $r->lookup_uri($newuri);
delete $r->notes->{'NegoDisable'};
# this is the same test that mod_dir does to see if a URL
# succeeded or not.
if ($subr->status == HTTP_OK && (-e $subr->finfo()))
{
$r->log_error("Found variant $v")
if $DEBUG;
$available{var}{$v}{file} = $newfile;
$available{var}{$v}{uri} = $newuri;
$available{str} .= ($available{str} ? ' ' : '') . "$v=$newfile";
}
}
return \%available;
}
sub GetPreferredVariants
{
my $r = shift;
# now let's figure out what the order of preference is.
# and select our return variant at the same time.
my @preferred; # list of variants in preferential order
my $v;
foreach $v (split ' ', $r->dir_config('NegoMatchOrder'))
{
if ($v eq 'Browser')
{
# browser specified language preferences
push @preferred, split(',', $r->headers_in->{'Accept-language'});
}
elsif ($v =~ /Host\((.+)\)/)
{
# if it's a Host(something.com=en) block.
# NegoHostMatch preferences. NegoHostMatch
# has the following syntax:
# NegoHostMatch www.wolfram.co.uk=en www.wolfram.co.jp=ja
www.wolfram.com=en
my $host = $r->headers_in->{'Host'};
my ($hvar) = $v =~ /Host\($host=([\w-]+)\)/;
push @preferred, $hvar
if ($hvar);
}
else
{
push @preferred, $v
if $v;
}
}
return \@preferred;
}
1;
__END__
=head1 NAME
Negotiation.pm - Module to handle lingual content negotiation reasonably
=head1 AUTHOR
Max Campos
=head1 SYNOPSIS
In your .htaccess file ...
PerlRequire /www/perl/libs/WRI/Negotiation.pm
PerlFixupHandler WRI::Negotiation
PerlSetVar NegoAvailableVariants "en ja"
PerlSetVar NegoMatchOrder "Host(www.wolfram.co.jp=ja) Browser en ja"
=head1 WHAT IT DOES
=head2 Definitions
"lingual document" - A document that has a language extension. (such
as: whatever.ja.html, foo.en.cgi, blech.pr.php3).
"negotiated request" - An incoming request for non-lingual document
in which the non-lingual document does not exist but lingual version(s) do.
(ie. a request for something.html, and something.en.html,
something.ja.html exist but something.html does not)
"lingual document request" - an incoming request for a lingual
document. (ie. an incoming request for foo.ja.html).
=head2 Configuration Directives
=over 4
=item NegoDisable 1
Disable negotiation for this directory and all directories below it.
=item NegoAvailableVariants ("en ja")
list of variants to check for when determining what files are
available/etc.
=item NegoMatchOrder ("Browser HostMatch Default")
The order in which to select variants (as well as set the
WRI_PREFERRED_VARIANTS variable). It's a space separated list with each
item being one of three things:
=over 4
=item Browser
Adds browser provided Accept-Language variants to the
preference list.
=item Host(hostname.something.com=vr)
Adds variant vr to the preference
list if the client present's a Host: header that matches
hostname.something.com.
=item ja
Anything that does not match the two above specifications is added
to the variant list as is.
=back
=back
=head2 Environment Variables
=over 4
=item WRI_AVAILABLE_VARIANTS
ja=index.ja.cgi en=index.en.cgi
This is a list of all of the available variants for the current file,
and their filenames.
Presented To: Negotiated requests, lingual document requests
=item WRI_PREFERRED_VARIANTS
kabuki en pr en ja
=over 4
=item *
This is a list of variants in preferential order as determined by the
NegoMatchOrder directive (this could be any combination of host based
settings, browser presented languages, as well as the default).
=item *
This list may include languages that we have never heard of.
=item *
This list may include repeats.
=item *
Since this is not based on what is available, we can provide this to all
requests.
=back
Presented to: *all* requests.
=item WRI_VARIANT
en
=over 4
=item *
This is the language of the current request.
=item *
Essentially, this is the first element of WRI_PREFERRED_VARIANTS that is
also listed in WRI_AVAILABLE_VARIANTS
=back
Presented To: Negotiated requests, lingual document requests
=back
=head1 REQUEST PROCESSING
=head2 Process
I<* set WRI_PREFERRED_VARIANTS>
- Does the file requested exist?
Yes? Do not negotiate. (DONE)
No? Continue...
- Is it a non-lingual request?
Yes? Continue...
No? 404.
- Do variants exists?
Yes? Continue...
No? 404.
I<* set WRI_AVAILABLE_VARIANTS>
- Does an appropriate variant for the file exist?
Yes? set WRI_VARIANT and return it
No? 404.
=head2 Example Scenarios
-> Request for index.html, where only index.html, index.ja.html,
index.en.html exist.
<- index.html is returned.
B<------------------------------>
-> Request for index.html, where only index.ja.html index.en.html
exist.
<- Most appropriate document is returned (or 404 if no appropriate
B<------------------------------>
-> Request for index.ja.html where index.html index.ja.html index.en.html
exist.
<- index.ja.html is returned.
-> Request for index.ja.html where index.html index.en.html exist.
<- 404.
=cut