On Sun, 13 Oct 2002 17:19:58 +0200 (EET) Nerijus Baliunas <[EMAIL PROTECTED]> wrote:

> Hi,

> "!" shouldn't be part of url:

>> Go take the poll at http://www.clanlib.org!

You are right.

RFC1738 says:

httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]
hostport       = host [ ":" port ]
host           = hostname | hostnumber
hostname       = *[ domainlabel "." ] toplabel
domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel       = alpha | alpha *[ alphadigit | "-" ] alphadigit
alphadigit     = alpha | digit
hostnumber     = digits "." digits "." digits "." digits
port           = digits


alpha          = lowalpha | hialpha
digit          = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"
lowalpha       = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                 "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                 "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                 "y" | "z"
hialpha        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

Unfortunately, we do not have a real URL parser: we only look for
"http://", and then accept any character that could be part of a URL,
without taking into account that some of them (like '!') can only
appear in specific parts of the URL.
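
To illustrate the problem, here is a rough Python sketch of that kind of
position-blind scan. It is not Mahogany's actual code; the names
URL_CHARS, NAIVE_URL and find_urls_naive are made up for the example,
and the character set is only an approximation of what RFC 1738 allows
somewhere in an http URL.

```python
import re

# Every character RFC 1738 allows *somewhere* in an http URL (approximation).
# The pattern does not care where in the URL a character appears.
URL_CHARS = r"A-Za-z0-9$\-_.+!*'(),;:@&=%/?"
NAIVE_URL = re.compile(r"http://[" + URL_CHARS + r"]+")


def find_urls_naive(text):
    """Return every match of the position-blind pattern in text."""
    return NAIVE_URL.findall(text)


# The trailing '!' is swallowed even though it cannot be part of a hostname.
print(find_urls_naive("Go take the poll at http://www.clanlib.org!"))
```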

Does anyone know of existing URL-parsing code somewhere?

Note that a real parser would solve this particular problem, but
  http://www.foo.bar/baz!
is a valid URL, as it would be with ',' or '.' at the end.
This one is also valid:
  http://www.foo.bar/!


hpath          = hsegment *[ "/" hsegment ]
hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

uchar          = unreserved | escape
unreserved     = alpha | digit | safe | extra

extra          = "!" | "*" | "'" | "(" | ")" | ","
safe           = "$" | "-" | "_" | "." | "+"


hex            = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                 "a" | "b" | "c" | "d" | "e" | "f"
escape         = "%" hex hex
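
The grammar quoted above can be transcribed almost mechanically into a
regular expression. The following Python sketch does that; it is only an
illustration, not a complete URL parser, and all names in it (HTTPURL,
find_urls, and so on) are invented for the example. The "search" rule is
not quoted above; RFC 1738 defines it with the same character set as
hsegment, which is what the sketch assumes.

```python
import re

# Direct transcription of the RFC 1738 rules quoted above.
ALNUM = r"[A-Za-z0-9]"
DOMAINLABEL = ALNUM + r"(?:[A-Za-z0-9-]*" + ALNUM + r")?"
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*" + ALNUM + r")?"
HOSTNAME = r"(?:" + DOMAINLABEL + r"\.)*" + TOPLABEL
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
HOST = r"(?:" + HOSTNAME + r"|" + HOSTNUMBER + r")"
HOSTPORT = HOST + r"(?::[0-9]+)?"

# uchar = unreserved | escape
UCHAR = r"(?:[A-Za-z0-9$\-_.+!*'(),]|%[0-9A-Fa-f]{2})"
HSEGMENT = r"(?:" + UCHAR + r"|[;:@&=])*"
HPATH = HSEGMENT + r"(?:/" + HSEGMENT + r")*"
SEARCH = HSEGMENT  # assumption: same character set as hsegment in RFC 1738

HTTPURL = re.compile(
    r"http://" + HOSTPORT + r"(?:/" + HPATH + r"(?:\?" + SEARCH + r")?)?"
)


def find_urls(text):
    """Return the http URLs in text that match the grammar above."""
    return [m.group(0) for m in HTTPURL.finditer(text)]


# '!' is rejected after a bare hostname but accepted inside a path.
print(find_urls("Go take the poll at http://www.clanlib.org!"))
print(find_urls("See http://www.foo.bar/baz! and http://www.foo.bar/!"))
```

With a bare hostname the '!' is left out, but once a path is present the
trailing '!' is still matched, because the grammar allows it there.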


We can do our best to parse URLs correctly, but I guess that the only
safe way depends on the sender: put < and > around the URL.
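
When the sender does delimit the URL this way, extraction becomes
unambiguous. A minimal Python sketch, assuming the URL contains no
whitespace and no '<' or '>' itself (BRACKETED_URL and
find_bracketed_urls are hypothetical names, not existing code):

```python
import re

# Everything between '<' and '>' is taken as the URL, so trailing
# punctuation outside the brackets can never be mistaken for part of it.
BRACKETED_URL = re.compile(r"<(http://[^<>\s]+)>")


def find_bracketed_urls(text):
    """Return the http URLs that the sender enclosed in < and >."""
    return BRACKETED_URL.findall(text)


print(find_bracketed_urls("Go take the poll at <http://www.clanlib.org>!"))
print(find_bracketed_urls("Or read <http://www.foo.bar/baz!>."))
```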

-- 
Xavier Nodet
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety." - Benjamin Franklin, 1759.


_______________________________________________
Mahogany-Developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/mahogany-developers