Andy Wingo <[email protected]> writes:
> Heya Neil,
Hi Andy,
I've properly read all your responses on this now, and basically agree
with them all. I've just added a few further comments on specific
points below. I've also looked at the updated docs, and think they're
great.
I noticed a few minor glitches in the doc text, and a patch for those is
attached. I've also attached patches for updating your (tekuti
mod-lisp) to the latest (web ...) API.
Thanks for working on this area; it's great to have this function in the
Guile core.
> So this default port stuff is a poor-man's scheme-specific
> normalization, to not display a port component in a serialization, if
> the port is the default for the scheme.
Thanks, I see that now.
>>> -- Function: uri-decode str [#:charset]
>>> Percent-decode the given STR, according to CHARSET.
>>
>> So the return value is a bytevector if CHARSET is #f, and a string if
>> not?
>
> Yes.
I'm still not completely sure here. What if STR contains normal
characters as well as possible %XX sequences. If I call uri-decode with
#:encoding #f, how is each normal character mapped into the resulting
bytevector?
>>> `parser'
>>> A procedure which takes a string and returns a parsed value.
>>>
>>> `validator'
>>> A predicate, returning `#t' iff the value is valid for this
>>> header.
>>
>> Maybe say something here about validator function often being very
>> similar to parsing function?
>
> They are not quite the same. A parser takes a string and produces a
> Scheme value, and a validator takes a Scheme value and returns #t iff
> the value is valid for that header. I will add an example.
Thanks, that makes better sense. I was previously thinking that both
the validator and the parser acted on the raw header value.
I think there's still a glitch in the doc text, and have proposed an
update in the attached patch.
>> A possibly important point: what is the scope of the space in which
>> these header declarations are made? My reason for asking is that this
>> infrastructure looks applicable for other HTTP-like protocols too, such
>> as SIP. But the detailed rules for a given header in SIP may be
>> different from a header with the same name in HTTP, and hence different
>> header-decl objects would be needed. Therefore, even though we claim no
>> other protocol support right now, perhaps we should anticipate that by
>> enhancing declare-header! so as to distinguish between HTTP-space and
>> other-protocol-spaces.
>
> It's a good question. HTTP is deliberately MIME-like, but specifies a
> number of important differences (see appendix 19.4 of RFC 2616).
>
> For now, the scope is limited to HTTP headers.
OK.
>> [After reading all through, I remain confused about exactly how general
>> this server infrastructure is intended to be]
>
> The ultimate intention is to allow the "web handler" stuff I mentioned
> at the end of the section, and to allow the web app author to not care
> very much what server is being used. To do this, we have to allow all
> kinds of "server" implementations -- CGI, direct HTTP to a socket,
> zeromq messages, etc. Regardless of how the request comes and the
> response goes, we need to be able to recognize and parse HTTP headers
> into their various appropriate data types -- and (web http) is really
> the middle, here. The (web server) stuff is a higher-level abstraction
> -- not a necessary abstraction, but helpful, if you can use it.
Thanks. After playing with the mod-lisp code, I think I've finally
understood this. The `web' in `(web ...)' means requests and response
with the HTTP-defined structure - even if they might be delivered to
application via something like modlisp or CGI. Whereas the `http' in
`(web server http)' means delivery directly from/to a socket in HTTP
wire format.
Which is fine. I daresay there might be a useful future extension to
something like SIP, but there's absolutely no need to try to engineer
that in now.
>>> -- Function: lookup-header-decl name
>>> Return the HEADER-DECL object registered for the given NAME.
>>>
>>> NAME may be a symbol or a string. Strings are mapped to headers in
>>> a case-insensitive fashion.
>>>
>>> -- Function: valid-header? sym val
>>> Returns a true value iff VAL is a valid Scheme value for the
>>> header with name SYM.
>>
>> Note slight inconsistency in the two above deffns: "Return" vs
>> "Returns".
>
> Which is the right one? "Return"? Will change.
I very much doubt that Guile is globally consistent on this; but it was
quite noticeable here.
>>> Now that we have a generic interface for reading and writing
>>> headers, we do just that.
>>>
>>> -- Function: read-header port
>>> Reads one HTTP header from PORT. Returns two values: the header
>>> name and the parsed Scheme value.
>>
>> As multiple values? Is that more helpful than as a cons?
>
> Yes, as multiple values. The advantage is that returning multiple values
> from a Scheme procedure does not cause any allocation.
Ah, OK.
>>> The `(web http)' module defines parsers and unparsers for all headers
>>> defined in the HTTP/1.1 standard. This section describes the parsed
>>> format of the various headers.
>>>
>>> We cannot describe the function of all of these headers, however, in
>>> sufficient detail.
>>
>> I don't get the point here.
>
> Do you mean that the reason is not apparent at this point in the
> document? I don't think the intro is worded very well, and indeed it
> appears to be a bit of buildup without knowing where you go... Maybe an
> example in the beginning would be apropos?
I meant that I didn't understand why "cannot" - rather than, say,
"don't" or "don't want to" - and the meaning of "sufficient detail" -
i.e. sufficient for what?
I think that the text now, "For full details on the meanings of all of
these headers, see the HTTP 1.1 standard, RFC 2616.", is better, and
covers these points.
> Or should we give brief descriptions of the meanings of all of these
> headers as well? That might be a good idea too.
No, I don't think that's needed.
>>> `transfer-encoding'
>>> A param list of transfer codings. `chunked' is the only known key.
>>
>> OK, why a param list rather than key-value? How are elements in the
>> second key-value list, say, different from elements in the first
>> key-value list?
>
> Well, some of these headers are quite unfortunate in their
> construction. In this case:
>
> Transfer-Encoding = "Transfer-Encoding" ":" 1#transfer-coding
>
> So really, this is a list. But:
>
> transfer-coding = "chunked" | transfer-extension
> transfer-extension = token *( ";" parameter )
> parameter = attribute "=" value
> attribute = token
> value = token | quoted-string
>
> Given that a transfer-extension is really a toeken with a number of
> parameters, the thing gets complicated. You could have:
>
> Transfer-Encoding: chunked,abcd,newthing;foo="bar, baz; qux";xyzzy
>
> which is hard to parse if you do it ad-hoc. (web http) parses it as:
>
> (transfer-encoding . ((chunked) ("abcd") ("newthing" ("foo . "bar, baz;
> quz") "xyzzy")))
>
> Still complicated, but more uniform at least. Saying that `chunked' is
> the only known key means that it's the only one that's translated to a
> symbol; i.e. `abcd' is parsed to a string. (This is to prevent attacks
> to intern a bunch of symbols; though symbols can be gc'd in guile.)
>
> Does that help? I'll see about replacing usages of "param list" as "list
> of key-value lists", as it's probably clearer, and we can save ourselves
> a definition.
Hmm. I still don't feel I completely understand this; but on the other
hand it's too fiddly to me to want to go into more now. I think I'll
wait until I actually have to process something with these structures.
>>> `www-authenticate'
>>> A string.
>>
>> Obviously there's lots of substructure there (in WWW-Authenticate) that
>> we just don't support yet. Is there a clear compatibility story for
>> if/when Guile is enhanced to parse that out?
>>
>> I guess yes; calling code will just need something like
>>
>> (if (string? val)
>> ;; An older Guile that doesn't parse authentication fully.
>> (do-application-own-parsing)
>> ;; A newer Guile that does parse authentication.
>> (use-the-parsed-authentication-object))
>
> That's a very good question. The problem is that if we change the parsed
> representation, then old code breaks. That's why I put in the effort to
> give (hopefully) good representations for most headers, to avoid that
> situation -- though you appear to have caught one laziness on my part
> here, and in Authorizaton, Proxy-Authenticate, and Proxy-Authorization.
I don't think I could ever think that you are lazy!
> So maybe the right thing to do here is just to bite the bullet, parse as
> the RFC says we should, and avoid this situation.
As long as it's a bounded problem, fine. (And I see that the modules do
now crack out authorizaton and authenticate headers.)
>> Hmm, I think the provision of this data type needs a bit more
>> motivation. It doesn't appear to offer much additional value, compared
>> with reading or writing the components of a request individually, and on
>> the other hand it appears to bake in assumptions about charsets and
>> content length that might not always be true.
>
> I probably didn't explain it very well then. A request record holds the
> data from a request -- the method, uri, headers, etc. Additionally it
> can be read or written. It does not actually bake in assumptions about
> character sets or the like. It's simply that that HTTP protocol is
> flawed in this respect, that it mixes textual and binary data. We want
> to be able to read and parse requests, responses, and their headers
> using string and char routines, and that's fine as the character set for
> HTTP messages is restricted to a subset of the lower ASCII set. But then
> the body of an HTTP message is fundamentally binary -- the
> content-length is specified in bytes, not characters.
Thanks. I think I see the usefulness of request and response objects
now, given the "overall picture" above.
> So the right way to read off a body is as a bytevector of the specified
> length (potentially with chunked transfer encoding of course, though we
> don't do that yet). Then if you want text, you decode using the
> character set specified in the request. If you are particularly lucky
> and it is a textual type and the charset is not specified, you can read
> it as a latin-1 string directly, otherwise you convert. Or you can deal
> with the binary data as a string.
>
> Setting the charset on the port is a bit of a hack, but it is the right
> thing to do if you are reading HTTP. And it doesn't matter what the
> charset is when you read the body as it's specified in bytes anyway and
> should be read in bytes (and then, possibly, decoded).
>
> Some more organized discussion should go in the manual... but what do
> you think?
The discussion in `An Important Note on Character Sets' looks good to
me.
>>> -- Function: adapt-response-version response version
>>> Adapt the given response to a different HTTP version. Returns a
>>> new HTTP response.
>>
>> Interesting, and adds more value to the idea of the response object.
>> Why not for the request object too - are you assuming that Guile will
>> usually be acting as the HTTP server? (Which I'm sure is correct, but
>> "usually" is not "always".)
>
> The thing is that the request initiates the transaction -- so it's the
> requestor that makes the version decision.
Oh yes, of course.
>>> 2. The `read' hook is called, to read a request from a new client.
>>> The `read' hook takes one argument, the server socket. It should
>>
>> It feels surprising for the infrastructure to pass the server socket to
>> the read hook. I'd expect the infrastructure to do the `accept' itself
>> and pass the client socket to the read hook.
>
> It's the opaque "server socket object". Doing it this way has a two
> advantages:
>
> 1) Works with other socket architectures (zeromq, particularly).
I'm not familiar with that, but will take a look.
> 2) Allows the server to make its own implementation of keepalive
> (or not).
>
> Particularly the latter is interesting -- the http implementation makes
> a big pollset (from (ice-9 poll), not yet documented), and polls on the
> server socket and all the keepalive sockets.
As does (tekuti mod-lisp) - so the duplication is a slight shame. But I
agree that there's no reason why the work to unduplicate that should be
in (web server).
Regards,
Neil
>From fe431700a6c3ddf3fb26fce9607b9c7866968949 Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 19:34:01 +0000
Subject: [PATCH] Fix typos in (web ...) doc
* doc/ref/web.texi (Types and the Web): "help" -> "helpful".
(HTTP): Add closing paren. Remove code that looks like a leftover.
---
doc/ref/web.texi | 12 +++---------
1 files changed, 3 insertions(+), 9 deletions(-)
diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 3c7e0cd..76aa510 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -59,8 +59,8 @@ valid dates. Error handling for a number of basic cases, like invalid
dates, occurs on the boundary in which we produce a SRFI 19 date record
from other types, like strings.
-With regards to the web, data types are help in the two broad phases of
-HTTP messages: parsing and generation.
+With regards to the web, data types are helpful in the two broad phases
+of HTTP messages: parsing and generation.
Consider a server, which has to parse a request, and produce a response.
Guile will parse the request into an HTTP request object
@@ -339,7 +339,7 @@ For example:
(string->header "FOO")
@result{} foo
-(header->string 'foo
+(header->string 'foo)
@result{} "Foo"
@end example
@@ -387,12 +387,6 @@ leaving it as a string. You could register this header with Guile's
HTTP stack like this:
@example
-(define (parse-ip str)
- (inet-aton str)
-(define (validate-ip ip)
-(define (write-ip ip port)
- (display (inet-ntoa ip) port))
-
(declare-header! "X-Client-Address"
(lambda (str)
(inet-aton str))
--
1.7.1
>From 072f3606be11e1e00542261017be531d8f4ade9b Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 16:31:30 +0000
Subject: [PATCH 1/6] Export server impl so that applications can use it
---
tekuti/mod-lisp.scm | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index 59002b7..c302f58 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -33,7 +33,8 @@
#:use-module (web http)
#:use-module (web request)
#:use-module (web response)
- #:use-module (web server))
+ #:use-module (web server)
+ #:export (mod-lisp))
;;; FIXME: ignore SIGPIPE, otherwise apache dying will kill us
--
1.7.1
>From 00b99fec62f8999f4ae151f8d71bc0f66de63a00 Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 16:33:44 +0000
Subject: [PATCH 2/6] Update body-related calls to new API
---
tekuti/mod-lisp.scm | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index c302f58..f644643 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -249,7 +249,7 @@
(let ((req (read-request/mod-lisp port)))
(values port
req
- (read-request-body/latin-1 req))))
+ (read-request-body req))))
(lambda (k . args)
(false-if-exception (close-port port)))))))))))))
@@ -259,9 +259,9 @@
(cond
((not body)) ; pass
((string? body)
- (write-response-body/latin-1 response body))
+ (write-response-body response (string->utf8 body)))
((bytevector? body)
- (write-response-body/bytevector response body))
+ (write-response-body response body))
(else
(error "Expected a string or bytevector for body" body)))
(close-port (response-port response))))
--
1.7.1
>From 32b9f7252de2d73681a6df47fe19fcb361e9d3a1 Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 16:34:10 +0000
Subject: [PATCH 3/6] Avoid using lookup-header-decl, which isn't exported
---
tekuti/mod-lisp.scm | 11 +++++------
1 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index f644643..3a8b254 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -136,12 +136,11 @@
(begin
(display name port) (newline port)
(display val port) (newline port))
- (let ((decl (lookup-header-decl name)))
- (if (not decl)
- (error "Unknown header" name)
- (begin
- (display (header-decl-name decl) port) (newline port)
- ((header-decl-writer decl) val port) (newline port))))))
+ (if (not (known-header? name))
+ (error "Unknown header" name)
+ (begin
+ (display (header->string name) port) (newline port)
+ ((header-writer name) val port) (newline port)))))
(define (write-response-line/mod-lisp code phrase port)
(write-header/mod-lisp "Status"
--
1.7.1
>From beb163df99803dcebe8b7fcf0fb4481a1a0f7f8b Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 16:34:40 +0000
Subject: [PATCH 4/6] Update to new parse-header signature, which only returns one value
---
tekuti/mod-lisp.scm | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index 3a8b254..f3e36a4 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -112,9 +112,10 @@
(acons (car sym-and-parser)
((cdr sym-and-parser) v)
meta))
- (call-with-values (lambda () (parse-header k v))
- (lambda (k v)
- (lp (acons k v headers) meta)))))))))
+ (lp (acons (string->header k)
+ (parse-header (string->header k) v)
+ headers)
+ meta)))))))
(define (read-request/mod-lisp port)
;; See the note in (web request) regarding chars, bytes, and strings
--
1.7.1
>From f94c51e3e73e96b9f729af38a7ee98b193eaf9c5 Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 17:14:26 +0000
Subject: [PATCH 5/6] Update to new build-request signature
---
tekuti/mod-lisp.scm | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index f3e36a4..00a2921 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -123,9 +123,8 @@
(set-port-encoding! port "ISO-8859-1")
(call-with-values (lambda () (read-headers/mod-lisp port))
(lambda (headers meta)
- (build-request
+ (build-request (assq-ref meta 'url)
#:method (assq-ref meta 'method)
- #:uri (assq-ref meta 'url)
#:version (assq-ref meta 'server-protocol)
#:headers headers
#:meta meta
--
1.7.1
>From 7408a37c93caf6880302814058d2d445cd121891 Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 17:15:15 +0000
Subject: [PATCH 6/6] Don't validate headers, as we get an apparently invalid Accept header
from apache2/mod-lisp
---
tekuti/mod-lisp.scm | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index 00a2921..18a5649 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -128,7 +128,8 @@
#:version (assq-ref meta 'server-protocol)
#:headers headers
#:meta meta
- #:port port))))
+ #:port port
+ #:validate-headers? #f))))
(define (write-header/mod-lisp name val port)
(if (string? name)
--
1.7.1