At 12:03 15/05/05 +0200, Peter Simons wrote:

Graham Klyne writes:

 >> The longer I think about this whole thing, the more I am
 >> convinced that using URIs is the answer.

 > FWIW, the revised URI parsing code [2][3] in the latest
 > libraries includes support for IPv6 literals, as
 > specified by RFC 3986 [1].

Thanks for the pointer, Graham. I knew that the URI code had
been written a while ago, but never realized how extensive
the changes were! Great job.

Thanks!  (It's essentially a complete rewrite.)

Now the only problem is that the module doesn't expose the
functions we would need; such as Network.URI.host, for
instance. Would it be possible to factor those parsers out
into a

  Text.ParserCombinators.Parsec.Rfc3986

This seems a reasonable idea.

module? Maybe we could even merge those parsers with the
ones I have here:

  http://cryp.to/hsemail/docs/index.html

RFC grammars are often very similar after all.

I think it could be useful to have a collection of RFC parsers along these lines. I'm not entirely sure what you mean by "merge" -- I think the RFC distinctions should be maintained.

One might also consider that my unit test code (see ../tests directory) contains some framework functions that might be used to create a test case library.

One thought: in some cases, my URI parser code depends on the URI data types that I declare (for the return values), so it might not separate as cleanly as one might like -- I think it would be confusing if the data type declarations were separated from the URI module. I suppose the parser combinators might return tuples that are assembled by the URI code.
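
A minimal sketch of the tuple idea, to make it concrete. Every name here (hostPort, Authority, authHost, authPort) is hypothetical and chosen for illustration; none of them is the current Network.URI code, and the host syntax is deliberately simplified:

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical generic combinator, as it might live in a separate
-- RFC 3986 parser module: it returns a plain (host, port) tuple and
-- knows nothing about the URI data types.
-- Simplification: the host is just letters, digits, '-' and '.'.
hostPort :: GenParser Char st (String, String)
hostPort = do
    h <- many1 (alphaNum <|> oneOf "-.")
    p <- option "" (char ':' >> many digit)
    return (h, p)

-- Hypothetical record that would stay in the URI module, next to
-- the rest of the URI data type declarations.
data Authority = Authority
    { authHost :: String
    , authPort :: String
    } deriving (Eq, Show)

-- The URI module assembles its own type from the generic tuple.
authority :: GenParser Char st Authority
authority = fmap (uncurry Authority) hostPort
```

With this split, something like parse authority "" "example.org:8080" would be expected to yield an Authority whose host is "example.org" and whose port is "8080", while the combinator module stays free of any URI-specific types.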

Also, if separating the combinators, one might want to make the monadic parser type more general. Hmmm.... I did something like this for another bit of code somewhere, but I forget where. I think I made the parser polymorphic in the state value, which was not referenced. This way, the parsers can be referenced by other, more specific combinators that do use the state value. Currently, the state type is ().
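
Roughly what I have in mind, as a sketch only. The names countedOctet and dottedOctets are made up for this example, decOctet is simplified (no 0..255 range check), and the counter state is just one arbitrary choice of state type:

```haskell
import Data.List (intercalate)
import Text.ParserCombinators.Parsec

-- Polymorphic in the user state 'st', which it never references.
-- Simplification: any run of digits is accepted as an octet.
decOctet :: GenParser Char st String
decOctet = many1 digit

-- A more specific (hypothetical) parser that does use state: it
-- counts how many octets have been parsed so far.
countedOctet :: GenParser Char Int String
countedOctet = do
    o <- decOctet
    updateState (+ 1)
    return o

-- Dotted sequence of octets, returned with the final count.
dottedOctets :: GenParser Char Int (String, Int)
dottedOctets = do
    os <- countedOctet `sepBy1` char '.'
    n  <- getState
    return (intercalate "." os, n)
```

The same decOctet can then be run with state () through parse, or with an Int state through runParser dottedOctets 0, without any change to its definition.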

P.S.: In the definition

  host = ipLiteral <|> try ipv4address <|> regName

it looks as if the 'try' modifier should be given for the
first alternative; not for the second. I may be wrong
though.

My initial (lame) answer is that it passes all the available test cases. More seriously, if you think there's something that breaks the current code then a test case should be created.

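Looking at the production (copy below), the first case doesn't need a 'try' because if the initial character is a '[' then no other parse is possible. But for the ipv4address production backtracking may be needed, because a registered name can start with the same digits and dots and ipv4address will have consumed them before it fails; consider: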

   111.222.333.mydomain.org
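
A cut-down sketch that shows the difference. The parsers below are simplified stand-ins, not the Network.URI definitions: decOctet has no range check, regName accepts only unreserved characters, ipLiteral is left out, and hostWithTry, hostWithoutTry and parseHost are names invented for this example:

```haskell
import Text.ParserCombinators.Parsec

-- Simplified stand-in: any run of digits, no 0..255 check.
decOctet :: Parser String
decOctet = many1 digit

ipv4address :: Parser String
ipv4address = do
    a1 <- decOctet; _ <- char '.'
    a2 <- decOctet; _ <- char '.'
    a3 <- decOctet; _ <- char '.'
    a4 <- decOctet
    return (a1 ++ "." ++ a2 ++ "." ++ a3 ++ "." ++ a4)

-- Simplified stand-in: unreserved characters only.
regName :: Parser String
regName = many1 (alphaNum <|> oneOf "-._~")

-- With 'try', a failed ipv4address gives back the input it consumed,
-- so regName can start again from the beginning of the host.
hostWithTry :: Parser String
hostWithTry = try ipv4address <|> regName

-- Without 'try', ipv4address fails after consuming "111.222.333."
-- and (<|>) never attempts regName.
hostWithoutTry :: Parser String
hostWithoutTry = ipv4address <|> regName

-- Run a host parser and require that it consumes the whole input.
parseHost :: Parser String -> String -> Either ParseError String
parseHost p = parse (p <* eof) "host"
```

Under these assumptions, parseHost hostWithTry accepts the name above as a registered name, whereas parseHost hostWithoutTry rejects it with a parse error at the 'm'.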

#g
--

*****
Selected host productions:

[[
host :: URIParser String
host = ipLiteral <|> try ipv4address <|> regName

ipLiteral :: URIParser String
ipLiteral =
    do  { char '['
        ; ua <- ( ipv6address <|> ipvFuture )
        ; char ']'
        ; return $ "[" ++ ua ++ "]"
        }
    <?> "IP address literal"

 :

ipv4address :: URIParser String
ipv4address =
    do  { a1 <- decOctet ; char '.'
        ; a2 <- decOctet ; char '.'
        ; a3 <- decOctet ; char '.'
        ; a4 <- decOctet
        ; return $ a1++"."++a2++"."++a3++"."++a4
        }
]]


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact

