Re: [fossil-users] Fossil behind reverse proxy

Kyle McKay Sat, 30 Jan 2010 20:28:54 -0800

On Jan 30, 2010, at 04:00, Paul Ruizendaal wrote:
> Hi Kyle,
>
> In the default admin setup, the logo path
> needed to be fixed from '/logo' to '$baseurl/logo', but then it works
> fully.


I didn't need to do that.  Must have been fixed in later versions of  
fossil.  That would be a bug for the cgi command as well.  My new  
repositories already had that correct without needing to edit anything.

> I can confirm that it also works on Linux and Windows, not just
> Darwin. For folks not using Apache it would be good if your below
> 'how to' could mention that the reverse proxy needs to strip the
> baseurl of the uri it forwards to the Fossil server (i.e.:
> '/fossil/index' must be forwarded as '/index').

That is standard behavior for a reverse proxy as the proxy machine the  
requests are being sent to has absolutely no knowledge of where it's  
being mapped into the other machine's web space or even that it's  
being used as a proxy in the first place (unless it starts inspecting  
X-Forwarded-... and/or Via headers).
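
As a minimal sketch of the prefix stripping described above, assuming the
proxy mounts the fossil server at a fixed prefix such as '/fossil' (the
helper name strip_base is hypothetical and not part of Fossil or any
particular proxy):

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the part of `uri` that a reverse proxy would forward to the
 * backend after removing the mount prefix `base` (e.g. "/fossil").
 * Returns NULL when `uri` is not located under `base`.
 * Hypothetical helper for illustration only.
 */
const char *strip_base(const char *uri, const char *base)
{
    size_t n = strlen(base);

    if (strncmp(uri, base, n) != 0)
        return NULL;              /* different prefix */
    if (uri[n] == '\0')
        return "/";               /* "/fossil" maps to the backend root */
    if (uri[n] != '/')
        return NULL;              /* "/fossilized" is not under "/fossil" */
    return uri + n;               /* "/fossil/index" -> "/index" */
}
```

The check on the character following the prefix is there so that a
sibling path which merely starts with the same letters is not forwarded
by mistake.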

Normally in this situation, however, you would expect that content
coming from the proxy machine would have to be inspected and have any
contained links rewritten to match the other machine's web space
(mod_proxy_html can do this <http://apache.webthing.com/mod_proxy_html/>)
and indeed I had it working using mod_proxy_html when I realized it
wasn't necessary.  I prefer to avoid the extra overhead of inspecting
the content since it's not necessary if you set SCRIPT_NAME (with the
proviso that you mention below that you can no longer access the fossil
server directly if you do this).

> However, this is a hack that works by accident. It works because
> 'server' and 'cgi' share code paths and the 'server' code flow reads
> part of the CGI environment even though it shouldn't.

Yes but unless the current fossil architecture is changed it will keep  
working.  It's also fortunate that SCRIPT_NAME is used when  
constructing the login cookie -- but again, for the "cgi" command to  
keep working it needs to.

> Can you imagine the configuration
> headache if one had an unrelated SCRIPT_NAME environment variable and
> wasn't aware of this "feature"...

I was a bit surprised that SCRIPT_NAME was used even when the  
GATEWAY_INTERFACE environment variable is not set.  Probably  
SCRIPT_NAME should only be used if GATEWAY_INTERFACE is "CGI/1.0" or  
later.  But even then that would just mean you needed to set that  
variable together with SCRIPT_NAME to use the "hack".
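
A minimal sketch of that check, assuming the two values have already
been read from the environment (the function name script_name_if_cgi is
hypothetical, and Fossil's actual code is organized differently):

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the URL prefix to use. `script_name` is honored only when
 * `gateway` (the GATEWAY_INTERFACE value) looks like "CGI/<version>".
 * Either argument may be NULL, meaning the variable is not set.
 * Hypothetical helper for illustration only.
 */
const char *script_name_if_cgi(const char *gateway, const char *script_name)
{
    if (gateway == NULL || strncmp(gateway, "CGI/", 4) != 0)
        return "";                /* not a CGI environment: ignore it */
    return script_name ? script_name : "";
}
```

A caller would pass getenv("GATEWAY_INTERFACE") and
getenv("SCRIPT_NAME").  This sketch accepts any "CGI/" version string
rather than comparing version numbers.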

> Also, the hack fixes the baseUrl to one
> defined prefix. Access to a fossil server setup using this hack
> becomes unusable from the web if accessed directly as well, nor can
> multiple baseurls be mapped to a single fossil server instance.
> Whilst I'm quite happy that the hack fixes my immediate problem,

Yes, me too.

> I think a better engineered
> solution is preferable.

Undocumented behavior has a bad habit of breaking or going away -- a  
documented solution is preferred.

> How about using request headers for this? The reverse proxy could
> add two custom headers to the forwarded request (similar to
> X-Forwarded-For):
> - X-Fossil-Baseurl
> - X-Fossil-Repository
> Fossil would only look at these when in server mode.

Or in "http" mode.

> The first would
> specify the baseurl that is used to relocate all references in
> html/css output, and in redirect responses.

Using the SCRIPT_NAME hack and running in "server" mode, you do have
to make sure that Location: redirects get corrected (as you say, a
reverse proxy can be expected to do this).  This is never necessary
when running in "cgi" mode.

It might be a bit of a challenge to catch all the redirects unless you  
hack it by prepending SCRIPT_NAME to the value stuffed into  
REQUEST_URI by the source in "server" and "http" modes.

This line in cgi.c (in the cgi_handle_http_request function):

   cgi_setenv("REQUEST_URI", zToken);

would need to change to set REQUEST_URI to the contents of SCRIPT_NAME  
concatenated to zToken instead of what it does now.  That would make  
the SCRIPT_NAME hack produce correct redirects and I believe make a  
SCRIPT_NAME hack running server be directly accessible.  Something  
like this but without the memory leak or double getenv call:

   cgi_setenv("REQUEST_URI",
              mprintf("%s%s",
                      (getenv("SCRIPT_NAME") ? getenv("SCRIPT_NAME") : ""),
                      zToken));
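
One way to avoid the double lookup and make ownership of the buffer
explicit is sketched below.  It uses only the standard C library rather
than Fossil's own mprintf, and the function name prefixed_request_uri is
hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<script_name><token>" in a freshly malloc'ed buffer.
 * `script_name` may be NULL, which is treated as an empty prefix.
 * The caller owns the result and must free() it.
 * Returns NULL if the allocation fails.
 * Hypothetical helper for illustration only.
 */
char *prefixed_request_uri(const char *script_name, const char *token)
{
    const char *prefix = script_name ? script_name : "";
    size_t len = strlen(prefix) + strlen(token) + 1;
    char *uri = malloc(len);

    if (uri != NULL)
        snprintf(uri, len, "%s%s", prefix, token);
    return uri;
}
```

The caller would look up SCRIPT_NAME once, pass the result in together
with zToken, and hand the returned string to cgi_setenv.  Whether the
buffer can be freed right afterwards depends on whether cgi_setenv
copies its argument, which I have not checked.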

> This would work very well with my own (soon to be published, GPL'ed)
> reverse proxy. It would also work very well with Lighttpd, using its
> mod_magnet module. Would it be workable with Apache too? (I'm not
> familiar with Apache configuration).

The updated Apache 2 configuration to reverse proxy a fossil server  
running like this:

export SCRIPT_NAME=/fos
fossil server -P 8080 /path/to/some/fossil/repository

is this:

RewriteEngine On
RewriteRule ^/fos$ /fos/ [PT]
ProxyPreserveHost On
ProxyPass /fos/ http://machine_running_fossil_server:8080/
<Location /fos/>
        ProxyPassReverse /
        RequestHeader set X-Fossil-Baseurl /fos
</Location>

And that does pass on the new X-Fossil-Baseurl header to the server  
(which currently just ignores it).  (The added Rewrite... lines  
seamlessly handle accesses to the base fossil URL that do not have a  
trailing '/' and the added ProxyPassReverse takes care of rewriting  
the Location: paths -- these lines were missing from the earlier  
configuration.)  More RequestHeader lines could be added to pass on  
whatever additional headers are desired.

I like your suggestion of using the extra headers -- I will just have
my fossilserver.pl script grab them so it can be configured entirely
with the Apache configuration (except, obviously, for the port it
listens on).  Apache can also set environment variables when running a
cgi script, and I intend to allow the fossilserver.pl script to run
either standalone or as a cgi.  So I plan to have it accept either the
header when running directly or the HTTP_X_FOSSIL_BASEURL environment
variable when running as a cgi.  (In addition to being able to set
arbitrary environment variables, Apache automatically passes on all
http headers to cgi scripts by prefixing them with "HTTP_", converting
them to UPPERCASE and changing any "-" to "_" as required by the CGI
standard <http://hoohoo.ncsa.illinois.edu/cgi/env.html> and
<http://tools.ietf.org/html/rfc3875>.)
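
The header-to-variable name mapping can be sketched as follows (the
function name cgi_env_name is hypothetical; the three transformation
steps are the ones listed above):

```c
#include <ctype.h>
#include <string.h>

/*
 * Write the CGI variable name for HTTP header `header` into `out`
 * (capacity `outsz` bytes): "HTTP_" prefix, letters uppercased,
 * every '-' replaced by '_'.
 * Returns 0 on success, -1 if `out` is too small.
 * Hypothetical helper for illustration only.
 */
int cgi_env_name(const char *header, char *out, size_t outsz)
{
    static const char prefix[] = "HTTP_";
    size_t plen = sizeof(prefix) - 1;
    size_t hlen = strlen(header);
    size_t i;

    if (outsz < plen + hlen + 1)
        return -1;
    memcpy(out, prefix, plen);
    for (i = 0; i < hlen; i++) {
        unsigned char c = (unsigned char)header[i];
        out[plen + i] = (c == '-') ? '_' : (char)toupper(c);
    }
    out[plen + hlen] = '\0';
    return 0;
}
```

Given "X-Fossil-Baseurl" this yields "HTTP_X_FOSSIL_BASEURL", the
variable name mentioned above.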

Perhaps this is the way the "cgi" command should pick up  
"repository:" ?  That would eliminate the middleman cgi script one  
needs right now and move all the configuration information into the  
web server configuration files.  (Fossil already automatically goes  
into cgi mode if run with no command line arguments and the  
GATEWAY_INTERFACE environment variable is set to anything and  
interestingly in this case attempts to read its configuration script  
[looking for the "repository:" line] from standard input -- should be  
a simple matter to have it grab from the HTTP_X_FOSSIL_... environment  
variables instead if no arguments are passed.)

The only question would be whether or not most web servers can set the  
extra headers in their configuration files when running a local cgi  
script the way Apache can.

Kyle

> On Thu, 28 Jan 2010 11:53:36 -0800, Kyle McKay <mack...@gmail.com>  
> wrote:
>> Paul,
>>
>> I'm running a fossil server behind an Apache reverse proxy quite
>> happily.  I've been meaning to add something to the wiki cookbook
>> about this but just haven't got around to it yet.
>>
>> I'm doing this because:
>>
>> 1. I want a fossil UI to be always on and available via my web server
>> 2. I want the fossil server to run as a different user account than
>> the web server processes
>> 3. I don't want to use any suid programs (i.e. suExec)
>>
>> My apache web server is setup so that:
>>
>>   http://my_server_name/fossil
>>
>> Is reverse proxied to the fossil server process that is running as a
>> daemon on a separate port
>>
>>   http://my_server_name/anything-other-than-fossil-here
>>
>> Serves up whatever else would normally be served on my server.
>>
>> To make this work (I'm running on Darwin which is very Unix like) you
>> need to do these two things (the examples assume you have a bash
>> shell):
>>
>> 1. Start your fossil server daemon running with a shell script like
>> this
>>
>>    #!/bin/sh
>>    export SCRIPT_NAME=/fossil
>>    fossil server -P 8000 full_path_to_fossil_repository_here &
>>
>> If you want to start the fossil server in its own process group, add
>> this line:
>>
>>    set -m
>>
>> at the beginning of the script and add this line:
>>
>>    disown
>>
>> at the end and you probably want to redirect fossil input, output and
>> error to /dev/null as well so the final script to do all of this
>> would look like (adding nohup also to make it immune to SIGHUP):
>>
>>    #!/bin/bash
>>    set -m
>>    export SCRIPT_NAME=/fossil
>>    nohup fossil server -P 8000 full_path_to_fossil_repository_here \
>>        </dev/null >/dev/null 2>&1 &
>>    disown # this is a bashism
>>
>> 2. Add this configuration section to your Apache configuration
>>
>>    ProxyPass /fossil http://machine_your_fossil_server_is_running_on:8000
>>    ProxyPreserveHost On
>>    # ProxyPreserveHost is required since fossil inspects the Host value
>>    # and without it fossil-generated links will point directly to fossil
>>    # instead of the Apache server
>>
>> 3. Access your fossil server like this:
>>
>>    http://machine_apache_is_running_on/fossil
>>
>> 4. Optionally add a firewall rule to limit connections to the fossil
>> server to only those coming from the Apache server machine (be nice
>> if fossil had a loopback-only setting similar to postfix's to bind
>> its socket listener to only localhost IPv4/IPv6 interfaces).
>>
>> If you want your fossil URL to look like
>> http://some_machine/foo/bar/scm you would need to change the above
>> example lines for starting your fossil server and setting your Apache
>> configuration as follows:
>>
>>    SCRIPT_NAME=/foo/bar/scm
>>    ProxyPass /foo/bar/scm http://machine_your_fossil_server_is_running_on:8000
>>
>> Similarly you can change the port the fossil server runs on just as
>> easily.
>>
>> It turns out that since fossil already handles running from an
>> arbitrary web location as a cgi script, it quite happily will still
>> use that arbitrary location when running as a server if you provide
>> it via SCRIPT_NAME.
>>
>> I wish there was functionality something like this though:
>>
>>    fossil server -P 8000 --ext .fsl path_to_directory_containing_.fsl_repositories
>>
>> Where a single fossil server could serve up multiple fossil
>> repositories.  You would just point it to the parent directory and
>> tell it what repository extension to look for and then it would
>> insert an additional element into the URL using the base name of the
>> fossil repository minus the extension.  So if you had these
>> repositories on your system:
>>
>>   /some/directory/repository1.fsl
>>   /some/directory/repository2.fsl
>>
>> And started the fossil server like this:
>>
>>    fossil server -P 8080 --ext .fsl /some/directory
>>
>> Then you could access repository1.fsl like this:
>>
>>    http://localhost:8080/repository1
>>
>> and repository2.fsl like this:
>>
>>    http://localhost:8080/repository2
>>
>> and as a bonus you could get a list of available repositories with
>> this:
>>
>>    http://localhost:8080/
>>
>> (And, of course, still use the SCRIPT_NAME trick to change the URL
>> location if you like.)
>>
>> I believe a relatively simple Perl or Python server script could use
>> the fossil http command to implement the multiple repository server
>> relatively easily since the SCRIPT_NAME technique also works with the
>> fossil http command.  Hmmm, I might just have to write that script
>> later today.
>>
>> Kyle
>>
>> On Jan 28, 2010, at 04:00, Paul Ruizendaal wrote:
>>> It may be subtler and easier than I first thought:
>>>
>>> Fossil already uses the host information from the Host: header, not
>>> from its own IP. When in CGI mode, it already relocates all its
>>> absolute references to include the prefix of the cgi script location.
>>>
>>> When running as server Fossil does not do the above relocation but
>>> keeps everything based at root ('/'), regardless of the path in the
>>> request uri. Is there a reason that makes fossil CGI style relocation
>>> a bad idea for a fossil running in server mode?
>>>
>>> Paul
>>>
>>> ======
>>>
>>> I just tried to put Fossil (running as server) behind a reverse
>>> proxy (home grown, but similar to "Pound").
>>>
>>> That doesn't work very well, because Fossil prefixes all paths in
>>> its output with a full baseURL (as seen by Fossil). The client can't
>>> use that as the reverse proxy maps an entirely different prefix to
>>> the Fossil server instance. I think the html/css output by Fossil
>>> should use relative paths, not absolute paths.
>>>
>>> Next to the above, also the 301 Redirect responses have the wrong
>>> url, but that is as per the http RFC: it is a reasonable job for a
>>> reverse proxy to rewrite the Location: header of a 301 response.
>>>
>>> Before I attempt this rather massive patch: Richard, any remarks?
>>>
>>> Paul