Re: httpd rewrites with Lua's pattern matching
On Tue, Jun 23, 2015 at 10:25:36AM -0400, trondd wrote:
> On Sun, June 21, 2015 10:01 am, Reyk Floeter wrote:
> >
> > 	location match '^/page/(%d+)' {
> > 		block return 302 '/index.cgi?page=%1'
> > 	}
>
> So I was playing with the below config, then figured out it's not coded to
> capture on 'server match'.
>
> I want to redirect anything I get to use https and add the FQDN without
> having to care which domain they are trying to get to.
>
> A simplified config:
>
> server match '^(.*)$' {
> 	listen on em0 port 80
> 	block return 301 'https://%1.my.fqdn.com/'
> }
>
> Is there a way to do it?  Any reason to not capture on 'server match'?

It is just not done yet.  As I said, we're improving the interface.
But this doesn't affect the initial implementation itself.

Reyk
Re: httpd rewrites with Lua's pattern matching
On Tue, Jun 23, 2015 at 04:20:58PM +0200, Reyk Floeter wrote:
> On Sat, Jun 20, 2015 at 03:01:18PM +0200, Reyk Floeter wrote:
> > Comments? OK?
>
> This diff includes some fixes from semarie@.  We also have regress tests
> that will go in separately.  We'd like to continue in the tree.
>
> OK?
>
> Reyk

Some comments below, but I think they could be addressed in tree.

ok semarie@

[...]
> Index: patterns.c
> ===================================================================
> RCS file: patterns.c
> diff -N patterns.c
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ patterns.c	23 Jun 2015 14:07:16 -0000
> @@ -0,0 +1,715 @@
> +/* $OpenBSD$ */
> +
> +/*
> + * Copyright (c) 2015 Reyk Floeter <r...@openbsd.org>
> + * Copyright (C) 1994-2015 Lua.org, PUC-Rio.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining
> + * a copy of this software and associated documentation files (the
> + * "Software"), to deal in the Software without restriction, including
> + * without limitation the rights to use, copy, modify, merge, publish,
> + * distribute, sublicense, and/or sell copies of the Software, and to
> + * permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be
> + * included in all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> + * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> + * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +/*
> + * Derived from Lua 5.3.1:
> + * $Id: lstrlib.c,v 1.229 2015/05/20 17:39:23 roberto Exp $
> + * Standard library for string operations and pattern-matching
> + */
> +
[...]
> +static int
> +match_class(int c, int cl)
> +{
> +	int res;
> +	switch (tolower(cl)) {
> +	case 'a':
> +		res = isalpha(c);
> +		break;
> +	case 'c':
> +		res = iscntrl(c);
> +		break;
> +	case 'd':
> +		res = isdigit(c);
> +		break;
> +	case 'g':
> +		res = isgraph(c);
> +		break;
> +	case 'l':
> +		res = islower(c);
> +		break;
> +	case 'p':
> +		res = ispunct(c);
> +		break;
> +	case 's':
> +		res = isspace(c);
> +		break;
> +	case 'u':
> +		res = isupper(c);
> +		break;
> +	case 'w':
> +		res = isalnum(c);
> +		break;
> +	case 'x':
> +		res = isxdigit(c);
> +		break;
> +	case 'z':
> +		res = (c == 0);
> +		break;	/* deprecated option */

I think this deprecated option should be removed. It is deprecated in Lua,
but here, the code is new. The documentation doesn't mention it either.

> +	default:
> +		return (cl == c);
> +	}
> +	return (islower(cl) ? res : !res);
> +}
> +
[...]
> Index: server_http.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/httpd/server_http.c,v
> retrieving revision 1.82
> diff -u -p -u -p -r1.82 server_http.c
> --- server_http.c	22 Jun 2015 11:46:06 -0000	1.82
> +++ server_http.c	23 Jun 2015 14:07:16 -0000
> @@ -29,14 +29,16 @@
>  #include <string.h>
>  #include <unistd.h>
>  #include <limits.h>
> +#include <fnmatch.h>
>  #include <stdio.h>
>  #include <time.h>
>  #include <resolv.h>
>  #include <event.h>
> -#include <fnmatch.h>
> +#include <ctype.h>
>  
>  #include "httpd.h"
>  #include "http.h"
> +#include "patterns.h"
>  
>  static int	 server_httpmethod_cmp(const void *, const void *);
>  static int	 server_httperror_cmp(const void *, const void *);
[...]
> @@ -882,11 +887,34 @@ server_expand_http(struct client *clt, c
>  	struct http_descriptor	*desc = clt->clt_descreq;
>  	struct server_config	*srv_conf = clt->clt_srv_conf;
>  	char			 ibuf[128], *str, *path, *query;
> -	int			 ret;
> +	const char		*errstr = NULL, *p;
> +	size_t			 size;
> +	int			 n, ret;
>  
>  	if (strlcpy(buf, val, len) >= len)
>  		return (NULL);
>  
> +	/* Find previously matched substrings by index */
> +	for (p = val; clt->clt_srv_match.sm_nmatch &&
> +	    (p = strstr(p, "%")) != NULL; p++) {
> +		if (!isdigit(*(p + 1)))
> +			continue;
> +
> +		/* Copy number, leading '%' char and add trailing \0 */
> +		size = strspn(p + 1, "0123456789") + 2;
> +		if (size >= sizeof(ibuf))
> +			return (NULL);
> +		(void)strlcpy(ibuf, p, size);
> +		n = strtonum(ibuf + 1, 0,
Re: httpd rewrites with Lua's pattern matching
On Sun, June 21, 2015 10:01 am, Reyk Floeter wrote:
>
> 	location match '^/page/(%d+)' {
> 		block return 302 '/index.cgi?page=%1'
> 	}

So I was playing with the below config, then figured out it's not coded to
capture on 'server match'.

I want to redirect anything I get to use https and add the FQDN without
having to care which domain they are trying to get to.

A simplified config:

server match '^(.*)$' {
	listen on em0 port 80
	block return 301 'https://%1.my.fqdn.com/'
}

Is there a way to do it?  Any reason to not capture on 'server match'?

Tim.
Re: httpd rewrites with Lua's pattern matching
On Sat, Jun 20, 2015 at 03:01:18PM +0200, Reyk Floeter wrote:
> Comments? OK?

This diff includes some fixes from semarie@.  We also have regress tests
that will go in separately.  We'd like to continue in the tree.

OK?

Reyk

Index: Makefile
===================================================================
RCS file: /cvs/src/usr.sbin/httpd/Makefile,v
retrieving revision 1.27
diff -u -p -u -p -r1.27 Makefile
--- Makefile	23 Feb 2015 10:39:10 -0000	1.27
+++ Makefile	23 Jun 2015 14:07:14 -0000
@@ -6,6 +6,9 @@ SRCS+=	config.c control.c httpd.c log.c
 SRCS+=	server.c server_http.c server_file.c server_fcgi.c
 MAN=		httpd.8 httpd.conf.5
 
+SRCS+=	patterns.c
+MAN+=	patterns.7
+
 LDADD=		-levent -ltls -lssl -lcrypto -lutil
 DPADD=		${LIBEVENT} ${LIBTLS} ${LIBSSL} ${LIBCRYPTO} ${LIBUTIL}
 #DEBUG=		-g -DDEBUG=3 -O0
Index: httpd.conf.5
===================================================================
RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v
retrieving revision 1.61
diff -u -p -u -p -r1.61 httpd.conf.5
--- httpd.conf.5	28 May 2015 19:29:40 -0000	1.61
+++ httpd.conf.5	23 Jun 2015 14:07:14 -0000
@@ -131,14 +131,38 @@ The configured web servers.
 .Pp
 Each
 .Ic server
-must have a
-.Ar name
-and include one or more lines of the following syntax:
+section starts with a declaration of the server
+.Ar name :
+.Bl -tag -width Ds
+.It Ic server Ar name Brq ...
+Match the server name using shell globbing rules.
+This can be an explicit name,
+.Ar www.example.com ,
+or a name including wildcards,
+.Ar *.example.com .
+.It Ic server match Ar name Brq ...
+Match the server name using pattern matching,
+see
+.Xr patterns 7 .
+.El
+.Pp
+Followed by a block of options that is enclosed in curly brackets:
 .Bl -tag -width Ds
 .It Ic alias Ar name
 Specify an additional alias
 .Ar name
 for this server.
+.It Ic alias match Ar name
+Like the
+.Ic alias
+option,
+but
+.Ic match
+the
+.Ar name
+using pattern matching instead of shell globbing rules,
+see
+.Xr patterns 7 .
 .It Oo Ic no Oc Ic authenticate Oo Ar realm Oc Ic with Pa htpasswd
 Authenticate a remote user for
 .Ar realm
@@ -188,6 +212,12 @@ The configured IP address of the server.
 The configured TCP server port of the server.
 .It Ic $SERVER_NAME
 The name of the server.
+.It Ic Pf % Ar n
+The capture index
+.Ar n
+of a string that was captured by the enclosing
+.Ic location match
+option.
 .El
 .It Ic connection Ar option
 Set the specified options and limits for HTTP connections.
@@ -247,6 +277,22 @@ except
 .Ic location
 and
 .Ic tcp .
+.It Ic location match Ar path Brq ...
+Like the
+.Ic location
+option,
+but
+.Ic match
+the
+.Ar path
+using pattern matching instead of shell globbing rules,
+see
+.Xr patterns 7 .
+The pattern may contain captures that can be used in the
+.Ar uri
+of an enclosed
+.Ic block return
+option.
 .It Oo Ic no Oc Ic log Op Ar option
 Set the specified logging options.
 Logging is enabled by default using the standard
@@ -516,6 +562,7 @@ server www.example.com {
 .Ed
 .Sh SEE ALSO
 .Xr htpasswd 1 ,
+.Xr patterns 7 ,
 .Xr httpd 8 ,
 .Xr slowcgi 8
 .Sh AUTHORS
Index: httpd.h
===================================================================
RCS file: /cvs/src/usr.sbin/httpd/httpd.h,v
retrieving revision 1.83
diff -u -p -u -p -r1.83 httpd.h
--- httpd.h	20 May 2015 09:28:47 -0000	1.83
+++ httpd.h	23 Jun 2015 14:07:15 -0000
@@ -35,6 +35,8 @@
 #include <imsg.h>
 #include <tls.h>
 
+#include "patterns.h"
+
 #define CONF_FILE		"/etc/httpd.conf"
 #define HTTPD_SOCKET		"/var/run/httpd.sock"
 #define HTTPD_USER		"www"
@@ -278,6 +280,7 @@ struct client {
 	void			*clt_srv_conf;
 	u_int32_t		 clt_srv_id;
 	struct sockaddr_storage	 clt_srv_ss;
+	struct str_match	 clt_srv_match;
 
 	int			 clt_s;
 	in_port_t		 clt_port;
@@ -341,12 +344,15 @@ SPLAY_HEAD(client_tree, client);
 #define SRVFLAG_NO_AUTH		0x00020000
 #define SRVFLAG_BLOCK		0x00040000
 #define SRVFLAG_NO_BLOCK	0x00080000
+#define SRVFLAG_LOCATION_MATCH	0x00100000
+#define SRVFLAG_SERVER_MATCH	0x00200000
 
 #define SRVFLAG_BITS							\
 	"\10\01INDEX\02NO_INDEX\03AUTO_INDEX\04NO_AUTO_INDEX"		\
 	"\05ROOT\06LOCATION\07FCGI\10NO_FCGI\11LOG\12NO_LOG\13SOCKET"	\
 	"\14SYSLOG\15NO_SYSLOG\16TLS\17ACCESS_LOG\20ERROR_LOG"		\
-	"\21AUTH\22NO_AUTH\23BLOCK\24NO_BLOCK"
+	"\21AUTH\22NO_AUTH\23BLOCK\24NO_BLOCK\25LOCATION_MATCH"		\
+	"\26SERVER_MATCH"
 
 #define TCPFLAG_NODELAY		0x01
 #define TCPFLAG_NNODELAY	0x02
Index: parse.y
===================================================================
RCS file: /cvs/src/usr.sbin/httpd/parse.y,v
retrieving revision 1.67
diff -u -p -u -p -r1.67 parse.y
--- parse.y	1 Apr 2015
Re: httpd rewrites with Lua's pattern matching
On Tue, June 23, 2015 11:28 am, Reyk Floeter wrote:
> It is just not done yet.  As I said, we're improving the interface.
> But this doesn't affect the initial implementation itself.
>
> Reyk

Ok, thanks.  I think I have a 'location match' use case I can play with,
too.

Tim.
Re: httpd rewrites with Lua's pattern matching
On Sat, Jun 20, 2015 at 03:01:18PM +0200, Reyk Floeter wrote:
> there is some great interest in getting support for rewrites

What do people think of something like our tftpd(8)'s -r

     -r socket
             Issue filename rewrite requests to the specified UNIX domain
             socket.  tftpd will write lines in the format "IP OP filename",
             terminated by a newline, where IP is the client's IP address,
             and OP is one of "read" or "write".  tftpd expects replies in
             the format "filename" terminated by a newline.  All rewrite
             requests from the daemon must be answered (even if it is with
             the original filename) before the TFTP request will continue.
             By default tftpd does not use filename rewriting.

I was working on a patch to bring it to httpd but ran out of free time.
Thought I'd pass the idea by you anyway.

I think it's a sweet spot of a minimum increase in complexity and a
maximum increase in flexibility.  Then people could plug in whatever
they wanted: be it trivial string substitutions, guaranteed safe regexes
with re2[1], potentially unsafe regexes with pcre, or even database
lookups or whatever.

[1]: https://swtch.com/~rsc/regexp/regexp3.html

Reyk: Sorry for the duplicate, meant to reply to the list.
Re: httpd rewrites with Lua's pattern matching
On Tue, Jun 23, 2015 at 02:40:48PM -0400, Jean-Philippe Ouellet wrote:
> On Sat, Jun 20, 2015 at 03:01:18PM +0200, Reyk Floeter wrote:
> > there is some great interest in getting support for rewrites
>
> What do people think of something like our tftpd(8)'s -r
[...]
> I think it's a sweet spot of a minimum increase in complexity and a
> maximum increase in flexibility.  Then people could plug in whatever
> they wanted: be it trivial string substitutions, guaranteed safe regexes
> with re2[1], potentially unsafe regexes with pcre, or even database
> lookups or whatever.
>
> [1]: https://swtch.com/~rsc/regexp/regexp3.html

I have no opinion about the use in tftpd, but it would sound weird for
httpd.  The use of patterns.c was intended to have one reasonably simple
implementation, not a button for a maximum increase of flexibility.

The real question is: what do people _have to_ do, and can they also do
it with the new patterns?

> Reyk: Sorry for the duplicate, meant to reply to the list.

no problem, I have thousands of openbsd emails, this one won't hurt ;)

Reyk
Re: httpd rewrites with Lua's pattern matching
What a nice small piece of code!  I think it should be made into a
library, as there are other projects (think smtpd) that would benefit from
it too.

On 06/21/2015 05:01 PM, Reyk Floeter wrote:
[...]
Re: httpd rewrites with Lua's pattern matching
On Sat, Jun 20, 2015 at 03:01:18PM +0200, Reyk Floeter wrote:
> Hi,
>
> there is some great interest in getting support for rewrites and
> better matching in httpd.  I refused to implement this using regex, as
> regex is extremely complicated code, there have been lots of bugs,
> they allow, if not specified carefully, dangerous recursions and ReDoS,
> and I would add another potential attack surface in httpd.
>
> Thanks to tedu@'s hint at BSDCan, I stumbled across Lua's pattern
> matching implementation.  It is relatively small (less than 700 loc),
> powerful, portable C code, MIT-licensed, and doesn't suffer from some
> of regex's problems (e.g., it doesn't allow recursive captures).  I
> ported it on my flight back from Ottawa, KNF'ed it, and turned it into
> a C API without the Lua bindings.  No, this diff does not bring the
> Lua language to httpd!
>
> Here is a diff that adds pattern matching to httpd, allowing rewrites
> with redirects.  Additional use cases can be added later and, if it
> works out, we can probably move it to an existing library.
>
> 	location match '^/page/(%d+)' {
> 		block return 302 '/index.cgi?page=%1'
> 	}
>
> This diff will be two commits, one is pending to fix '?' in the uri.
>
> Comments? OK?

Here is an updated version with some fixes from Sebastien Marie
(semarie).  We're continuing to work on it, but would like to do it in
the tree.

OK?  Any concerns?

Reyk

diff --git httpd/Makefile httpd/Makefile
index 885ad42..69fdb5e 100644
--- httpd/Makefile
+++ httpd/Makefile
@@ -6,6 +6,9 @@ SRCS+=	config.c control.c httpd.c log.c logger.c proc.c
 SRCS+=	server.c server_http.c server_file.c server_fcgi.c
 MAN=		httpd.8 httpd.conf.5
 
+SRCS+=	patterns.c
+MAN+=	patterns.7
+
 LDADD=		-levent -ltls -lssl -lcrypto -lutil
 DPADD=		${LIBEVENT} ${LIBTLS} ${LIBSSL} ${LIBCRYPTO} ${LIBUTIL}
 #DEBUG=		-g -DDEBUG=3 -O0
diff --git httpd/httpd.conf.5 httpd/httpd.conf.5
index 87866d2..24d92ac 100644
--- httpd/httpd.conf.5
+++ httpd/httpd.conf.5
@@ -131,14 +131,38 @@ The configured web servers.
 .Pp
 Each
 .Ic server
-must have a
-.Ar name
-and include one or more lines of the following syntax:
+section starts with a declaration of the server
+.Ar name :
+.Bl -tag -width Ds
+.It Ic server Ar name Brq ...
+Match the server name using shell globbing rules.
+This can be an explicit name,
+.Ar www.example.com ,
+or a name including wildcards,
+.Ar *.example.com .
+.It Ic server match Ar name Brq ...
+Match the server name using pattern matching,
+see
+.Xr patterns 7 .
+.El
+.Pp
+Followed by a block of options that is enclosed in curly brackets:
 .Bl -tag -width Ds
 .It Ic alias Ar name
 Specify an additional alias
 .Ar name
 for this server.
+.It Ic alias match Ar name
+Like the
+.Ic alias
+option,
+but
+.Ic match
+the
+.Ar name
+using pattern matching instead of shell globbing rules,
+see
+.Xr patterns 7 .
 .It Oo Ic no Oc Ic authenticate Oo Ar realm Oc Ic with Pa htpasswd
 Authenticate a remote user for
 .Ar realm
@@ -188,6 +212,12 @@ The configured IP address of the server.
 The configured TCP server port of the server.
 .It Ic $SERVER_NAME
 The name of the server.
+.It Ic Pf % Ar n
+The capture index
+.Ar n
+of a string that was captured by the enclosing
+.Ic location match
+option.
 .El
 .It Ic connection Ar option
 Set the specified options and limits for HTTP connections.
@@ -247,6 +277,22 @@ except
 .Ic location
 and
 .Ic tcp .
+.It Ic location match Ar path Brq ...
+Like the
+.Ic location
+option,
+but
+.Ic match
+the
+.Ar path
+using pattern matching instead of shell globbing rules,
+see
+.Xr patterns 7 .
+The pattern may contain captures that can be used in the
+.Ar uri
+of an enclosed
+.Ic block return
+option.
 .It Oo Ic no Oc Ic log Op Ar option
 Set the specified logging options.
 Logging is enabled by default using the standard
@@ -516,6 +562,7 @@ server www.example.com {
 .Ed
 .Sh SEE ALSO
 .Xr htpasswd 1 ,
+.Xr patterns 7 ,
 .Xr httpd 8 ,
 .Xr slowcgi 8
 .Sh AUTHORS
diff --git httpd/httpd.h httpd/httpd.h
index 1431eaa..ff76281 100644
--- httpd/httpd.h
+++ httpd/httpd.h
@@ -35,6 +35,8 @@
 #include <imsg.h>
 #include <tls.h>
 
+#include "patterns.h"
+
 #define CONF_FILE		"/etc/httpd.conf"
 #define HTTPD_SOCKET		"/var/run/httpd.sock"
 #define HTTPD_USER		"www"
@@ -278,6 +280,7 @@ struct client {
 	void			*clt_srv_conf;
 	u_int32_t		 clt_srv_id;
 	struct sockaddr_storage	 clt_srv_ss;
+	struct str_match	 clt_srv_match;
 
 	int			 clt_s;
 	in_port_t		 clt_port;
@@ -341,12 +344,15 @@ SPLAY_HEAD(client_tree, client);
 #define SRVFLAG_NO_AUTH		0x00020000
 #define SRVFLAG_BLOCK		0x00040000
 #define SRVFLAG_NO_BLOCK	0x00080000
+#define SRVFLAG_LOCATION_MATCH	0x00100000
+#define SRVFLAG_SERVER_MATCH	0x00200000
 
 #define SRVFLAG_BITS