Re: documentation for (web ...)

Neil Jerram Fri, 21 Jan 2011 15:06:47 -0800

Andy Wingo <[email protected]> writes:

> Heya Neil,


Hi Andy,

I've properly read all your responses on this now, and basically agree
with them all.  I've just added a few further comments on specific
points below.  I've also looked at the updated docs, and think they're
great.

I noticed a few minor glitches in the doc text, and a patch for those is
attached.  I've also attached patches for updating your (tekuti
mod-lisp) to the latest (web ...) API.

Thanks for working on this area; it's great to have this function in the
Guile core.

> So this default port stuff is a poor-man's scheme-specific
> normalization, to not display a port component in a serialization, if
> the port is the default for the scheme.

Thanks, I see that now.

>>>  -- Function: uri-decode str [#:charset]
>>>      Percent-decode the given STR, according to CHARSET.
>>
>> So the return value is a bytevector if CHARSET is #f, and a string if
>> not?
>
> Yes.

I'm still not completely sure here.  What if STR contains normal
characters as well as possible %XX sequences.  If I call uri-decode with
#:encoding #f, how is each normal character mapped into the resulting
bytevector?

>>>     `parser'
>>>           A procedure which takes a string and returns a parsed value.
>>> 
>>>     `validator'
>>>           A predicate, returning `#t' iff the value is valid for this
>>>           header.
>>
>> Maybe say something here about validator function often being very
>> similar to parsing function?
>
> They are not quite the same. A parser takes a string and produces a
> Scheme value, and a validator takes a Scheme value and returns #t iff
> the value is valid for that header. I will add an example.

Thanks, that makes better sense.  I was previously thinking that both
the validator and the parser acted on the raw header value.

I think there's still a glitch in the doc text, and have proposed an
update in the attached patch.

>> A possibly important point: what is the scope of the space in which
>> these header declarations are made?  My reason for asking is that this
>> infrastructure looks applicable for other HTTP-like protocols too, such
>> as SIP.  But the detailed rules for a given header in SIP may be
>> different from a header with the same name in HTTP, and hence different
>> header-decl objects would be needed.  Therefore, even though we claim no
>> other protocol support right now, perhaps we should anticipate that by
>> enhancing declare-header! so as to distinguish between HTTP-space and
>> other-protocol-spaces.
>
> It's a good question. HTTP is deliberately MIME-like, but specifies a
> number of important differences (see appendix 19.4 of RFC 2616).
>
> For now, the scope is limited to HTTP headers.

OK.

>> [After reading all through, I remain confused about exactly how general
>> this server infrastructure is intended to be]
>
> The ultimate intention is to allow the "web handler" stuff I mentioned
> at the end of the section, and to allow the web app author to not care
> very much what server is being used. To do this, we have to allow all
> kinds of "server" implementations -- CGI, direct HTTP to a socket,
> zeromq messages, etc. Regardless of how the request comes and the
> response goes, we need to be able to recognize and parse HTTP headers
> into their various appropriate data types -- and (web http) is really
> the middle, here. The (web server) stuff is a higher-level abstraction
> -- not a necessary abstraction, but helpful, if you can use it.

Thanks.  After playing with the mod-lisp code, I think I've finally
understood this.  The `web' in `(web ...)' means requests and response
with the HTTP-defined structure - even if they might be delivered to
application via something like modlisp or CGI.  Whereas the `http' in
`(web server http)' means delivery directly from/to a socket in HTTP
wire format.

Which is fine.  I daresay there might be a useful future extension to
something like SIP, but there's absolutely no need to try to engineer
that in now.

>>>  -- Function: lookup-header-decl name
>>>      Return the HEADER-DECL object registered for the given NAME.
>>> 
>>>      NAME may be a symbol or a string. Strings are mapped to headers in
>>>      a case-insensitive fashion.
>>> 
>>>  -- Function: valid-header? sym val
>>>      Returns a true value iff VAL is a valid Scheme value for the
>>>      header with name SYM.
>>
>> Note slight inconsistency in the two above deffns: "Return" vs
>> "Returns".
>
> Which is the right one? "Return"? Will change.

I very much doubt that Guile is globally consistent on this; but it was
quite noticeable here.

>>>    Now that we have a generic interface for reading and writing
>>> headers, we do just that.
>>> 
>>>  -- Function: read-header port
>>>      Reads one HTTP header from PORT. Returns two values: the header
>>>      name and the parsed Scheme value.
>>
>> As multiple values?  Is that more helpful than as a cons?
>
> Yes, as multiple values. The advantage is that returning multiple values
> from a Scheme procedure does not cause any allocation.

Ah, OK.

>>> The `(web http)' module defines parsers and unparsers for all headers
>>> defined in the HTTP/1.1 standard.  This section describes the parsed
>>> format of the various headers.
>>> 
>>>    We cannot describe the function of all of these headers, however, in
>>> sufficient detail.
>>
>> I don't get the point here.
>
> Do you mean that the reason is not apparent at this point in the
> document? I don't think the intro is worded very well, and indeed it
> appears to be a bit of buildup without knowing where you go... Maybe an
> example in the beginning would be apropos?

I meant that I didn't understand why "cannot" - rather than, say,
"don't" or "don't want to" - and the meaning of "sufficient detail" -
i.e. sufficient for what?

I think that the text now, "For full details on the meanings of all of
these headers, see the HTTP 1.1 standard, RFC 2616.", is better, and
covers these points.

> Or should we give brief descriptions of the meanings of all of these
> headers as well? That might be a good idea too.

No, I don't think that's needed.

>>> `transfer-encoding'
>>>      A param list of transfer codings.  `chunked' is the only known key.
>>
>> OK, why a param list rather than key-value?  How are elements in the
>> second key-value list, say, different from elements in the first
>> key-value list?
>
> Well, some of these headers are quite unfortunate in their
> construction.  In this case:
>
>      Transfer-Encoding       = "Transfer-Encoding" ":" 1#transfer-coding
>
> So really, this is a list. But:
>
>        transfer-coding         = "chunked" | transfer-extension
>        transfer-extension      = token *( ";" parameter )
>        parameter               = attribute "=" value
>        attribute               = token
>        value                   = token | quoted-string
>
> Given that a transfer-extension is really a toeken with a number of
> parameters, the thing gets complicated. You could have:
>
>   Transfer-Encoding: chunked,abcd,newthing;foo="bar, baz; qux";xyzzy
>
> which is hard to parse if you do it ad-hoc.  (web http) parses it as:
>
>    (transfer-encoding . ((chunked) ("abcd") ("newthing" ("foo . "bar, baz; 
> quz") "xyzzy")))
>
> Still complicated, but more uniform at least. Saying that `chunked' is
> the only known key means that it's the only one that's translated to a
> symbol; i.e. `abcd' is parsed to a string.  (This is to prevent attacks
> to intern a bunch of symbols; though symbols can be gc'd in guile.)
>
> Does that help? I'll see about replacing usages of "param list" as "list
> of key-value lists", as it's probably clearer, and we can save ourselves
> a definition.

Hmm.  I still don't feel I completely understand this; but on the other
hand it's too fiddly to me to want to go into more now.  I think I'll
wait until I actually have to process something with these structures.

>>> `www-authenticate'
>>>      A string.
>>
>> Obviously there's lots of substructure there (in WWW-Authenticate) that
>> we just don't support yet.  Is there a clear compatibility story for
>> if/when Guile is enhanced to parse that out?
>>
>> I guess yes; calling code will just need something like
>>
>>   (if (string? val)
>>       ;; An older Guile that doesn't parse authentication fully.
>>       (do-application-own-parsing)
>>       ;; A newer Guile that does parse authentication.
>>       (use-the-parsed-authentication-object))
>
> That's a very good question. The problem is that if we change the parsed
> representation, then old code breaks. That's why I put in the effort to
> give (hopefully) good representations for most headers, to avoid that
> situation -- though you appear to have caught one laziness on my part
> here, and in Authorizaton, Proxy-Authenticate, and Proxy-Authorization.

I don't think I could ever think that you are lazy!

> So maybe the right thing to do here is just to bite the bullet, parse as
> the RFC says we should, and avoid this situation.

As long as it's a bounded problem, fine.  (And I see that the modules do
now crack out authorizaton and authenticate headers.)

>> Hmm, I think the provision of this data type needs a bit more
>> motivation.  It doesn't appear to offer much additional value, compared
>> with reading or writing the components of a request individually, and on
>> the other hand it appears to bake in assumptions about charsets and
>> content length that might not always be true.
>
> I probably didn't explain it very well then. A request record holds the
> data from a request -- the method, uri, headers, etc. Additionally it
> can be read or written. It does not actually bake in assumptions about
> character sets or the like. It's simply that that HTTP protocol is
> flawed in this respect, that it mixes textual and binary data. We want
> to be able to read and parse requests, responses, and their headers
> using string and char routines, and that's fine as the character set for
> HTTP messages is restricted to a subset of the lower ASCII set. But then
> the body of an HTTP message is fundamentally binary -- the
> content-length is specified in bytes, not characters.

Thanks.  I think I see the usefulness of request and response objects
now, given the "overall picture" above.

> So the right way to read off a body is as a bytevector of the specified
> length (potentially with chunked transfer encoding of course, though we
> don't do that yet). Then if you want text, you decode using the
> character set specified in the request. If you are particularly lucky
> and it is a textual type and the charset is not specified, you can read
> it as a latin-1 string directly, otherwise you convert. Or you can deal
> with the binary data as a string.
>
> Setting the charset on the port is a bit of a hack, but it is the right
> thing to do if you are reading HTTP. And it doesn't matter what the
> charset is when you read the body as it's specified in bytes anyway and
> should be read in bytes (and then, possibly, decoded).
>
> Some more organized discussion should go in the manual... but what do
> you think?

The discussion in `An Important Note on Character Sets' looks good to
me.

>>>  -- Function: adapt-response-version response version
>>>      Adapt the given response to a different HTTP version. Returns a
>>>      new HTTP response.
>>
>> Interesting, and adds more value to the idea of the response object.
>> Why not for the request object too - are you assuming that Guile will
>> usually be acting as the HTTP server?  (Which I'm sure is correct, but
>> "usually" is not "always".)
>
> The thing is that the request initiates the transaction -- so it's the
> requestor that makes the version decision.

Oh yes, of course.

>>>   2. The `read' hook is called, to read a request from a new client.
>>>      The `read' hook takes one argument, the server socket.  It should
>>
>> It feels surprising for the infrastructure to pass the server socket to
>> the read hook.  I'd expect the infrastructure to do the `accept' itself
>> and pass the client socket to the read hook.
>
> It's the opaque "server socket object". Doing it this way has a two
> advantages:
>
>   1) Works with other socket architectures (zeromq, particularly).

I'm not familiar with that, but will take a look.

>   2) Allows the server to make its own implementation of keepalive
>   (or not).
>
> Particularly the latter is interesting -- the http implementation makes
> a big pollset (from (ice-9 poll), not yet documented), and polls on the
> server socket and all the keepalive sockets.

As does (tekuti mod-lisp) - so the duplication is a slight shame.  But I
agree that there's no reason why the work to unduplicate that should be
in (web server).

Regards,
        Neil

>From fe431700a6c3ddf3fb26fce9607b9c7866968949 Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 19:34:01 +0000
Subject: [PATCH] Fix typos in (web ...) doc

* doc/ref/web.texi (Types and the Web): "help" -> "helpful".
  (HTTP): Add closing paren.  Remove code that looks like a leftover.
---
 doc/ref/web.texi |   12 +++---------
 1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 3c7e0cd..76aa510 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -59,8 +59,8 @@ valid dates.  Error handling for a number of basic cases, like invalid
 dates, occurs on the boundary in which we produce a SRFI 19 date record
 from other types, like strings.
 
-With regards to the web, data types are help in the two broad phases of
-HTTP messages: parsing and generation.
+With regards to the web, data types are helpful in the two broad phases
+of HTTP messages: parsing and generation.
 
 Consider a server, which has to parse a request, and produce a response.
 Guile will parse the request into an HTTP request object
@@ -339,7 +339,7 @@ For example:
 
 (string->header "FOO")
 @result{} foo
-(header->string 'foo
+(header->string 'foo)
 @result{} "Foo"
 @end example
 
@@ -387,12 +387,6 @@ leaving it as a string.  You could register this header with Guile's
 HTTP stack like this:
 
 @example
-(define (parse-ip str)
-  (inet-aton str)
-(define (validate-ip ip)
-(define (write-ip ip port)
-  (display (inet-ntoa ip) port))
-
 (declare-header! "X-Client-Address"
   (lambda (str)
     (inet-aton str))
-- 
1.7.1

>From 072f3606be11e1e00542261017be531d8f4ade9b Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 16:31:30 +0000
Subject: [PATCH 1/6] Export server impl so that applications can use it

---
 tekuti/mod-lisp.scm |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index 59002b7..c302f58 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -33,7 +33,8 @@
   #:use-module (web http)
   #:use-module (web request)
   #:use-module (web response)
-  #:use-module (web server))
+  #:use-module (web server)
+  #:export (mod-lisp))
 
 
 ;;; FIXME: ignore SIGPIPE, otherwise apache dying will kill us
-- 
1.7.1

>From 00b99fec62f8999f4ae151f8d71bc0f66de63a00 Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 16:33:44 +0000
Subject: [PATCH 2/6] Update body-related calls to new API

---
 tekuti/mod-lisp.scm |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index c302f58..f644643 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -249,7 +249,7 @@
                  (let ((req (read-request/mod-lisp port)))
                    (values port
                            req
-                           (read-request-body/latin-1 req))))
+                           (read-request-body req))))
                (lambda (k . args)
                  (false-if-exception (close-port port)))))))))))))
 
@@ -259,9 +259,9 @@
     (cond
      ((not body))                       ; pass
      ((string? body)
-      (write-response-body/latin-1 response body))
+      (write-response-body response (string->utf8 body)))
      ((bytevector? body)
-      (write-response-body/bytevector response body))
+      (write-response-body response body))
      (else
       (error "Expected a string or bytevector for body" body)))
     (close-port (response-port response))))
-- 
1.7.1

>From 32b9f7252de2d73681a6df47fe19fcb361e9d3a1 Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 16:34:10 +0000
Subject: [PATCH 3/6] Avoid using lookup-header-decl, which isn't exported

---
 tekuti/mod-lisp.scm |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index f644643..3a8b254 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -136,12 +136,11 @@
       (begin
         (display name port) (newline port)
         (display val port) (newline port))
-      (let ((decl (lookup-header-decl name)))
-        (if (not decl)
-            (error "Unknown header" name)
-            (begin
-              (display (header-decl-name decl) port) (newline port)
-              ((header-decl-writer decl) val port) (newline port))))))
+      (if (not (known-header? name))
+	  (error "Unknown header" name)
+	  (begin
+	    (display (header->string name) port) (newline port)
+	    ((header-writer name) val port) (newline port)))))
 
 (define (write-response-line/mod-lisp code phrase port)
   (write-header/mod-lisp "Status"
-- 
1.7.1

>From beb163df99803dcebe8b7fcf0fb4481a1a0f7f8b Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 16:34:40 +0000
Subject: [PATCH 4/6] Update to new parse-header signature, which only returns one value

---
 tekuti/mod-lisp.scm |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index 3a8b254..f3e36a4 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -112,9 +112,10 @@
                     (acons (car sym-and-parser)
                            ((cdr sym-and-parser) v)
                            meta))
-                (call-with-values (lambda () (parse-header k v))
-                  (lambda (k v)
-                    (lp (acons k v headers) meta)))))))))
+		(lp (acons (string->header k)
+			   (parse-header (string->header k) v)
+			   headers)
+		    meta)))))))
 
 (define (read-request/mod-lisp port)
   ;; See the note in (web request) regarding chars, bytes, and strings
-- 
1.7.1

>From f94c51e3e73e96b9f729af38a7ee98b193eaf9c5 Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 17:14:26 +0000
Subject: [PATCH 5/6] Update to new build-request signature

---
 tekuti/mod-lisp.scm |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index f3e36a4..00a2921 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -123,9 +123,8 @@
   (set-port-encoding! port "ISO-8859-1")
   (call-with-values (lambda () (read-headers/mod-lisp port))
     (lambda (headers meta)
-      (build-request
+      (build-request (assq-ref meta 'url)
        #:method (assq-ref meta 'method)
-       #:uri (assq-ref meta 'url)
        #:version (assq-ref meta 'server-protocol)
        #:headers headers
        #:meta meta
-- 
1.7.1

>From 7408a37c93caf6880302814058d2d445cd121891 Mon Sep 17 00:00:00 2001
From: Neil Jerram <[email protected]>
Date: Fri, 21 Jan 2011 17:15:15 +0000
Subject: [PATCH 6/6] Don't validate headers, as we get an apparently invalid Accept header
 from apache2/mod-lisp

---
 tekuti/mod-lisp.scm |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index 00a2921..18a5649 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -128,7 +128,8 @@
        #:version (assq-ref meta 'server-protocol)
        #:headers headers
        #:meta meta
-       #:port port))))
+       #:port port
+       #:validate-headers? #f))))
 
 (define (write-header/mod-lisp name val port)
   (if (string? name)
-- 
1.7.1

Re: documentation for (web ...)

Reply via email to