Yesterday, Andy Gocke wrote: > [...] This is especially relevant for functions like string->number > because the most obvious implementation checks validity during > parsing -- checking the validity and parsing basically duplicate the > function.
And that makes most of my point. The thing is that `string->url' is basically *just* a parser -- it does very little after matching the regexp. I therefore view the commit as adding a contract to, say, `read-xml', where the contract runs the function to see that the input is valid. An even more extreme example would be `get-pure-port': if you really want a complete specification of the domain in a contract, then the contract should make sure that the server is reachable, and that it returns a valid page. Combine this with parsing the page, and how this is not really a great way to run code (ATM!) should be clear. Besides the issue of doing a bunch of work twice, the contract would still be broken since having a valid server and/or a page now doesn't mean that it's going to be valid on the next attempt. To make this practical, you'd need some way to expose values that are computed as part of the contract. ("Reify" feels wrong to me in this context...) That's why I added the above "ATM". There is an obvious appeal in doing this -- having all error handling in specific pieces of code and "floating" them upwards sounds tempting *if* there's some way to do it right. I suspect that such an exposure of the contract results is just one small step in getting this. I'm also not sure that it's doable in a way that actually leads to a practical benefit. This is similar to me doubting the theoretical utility in running a parser twice: on one level you get your guaranteed, nicely total function, but on the level of providing that guarantee, you get the original problem. (And in terms that I'm used to, this is switching the same work to your well-formedness goal, and that buys nothing in terms of getting things done.) IMO, this problem is fundamental enough that it shows up in many contexts. One of them is already visible in the `string->url' example. The new documentation reads: | url-regexp : regexp? | | This is a regular expression based on the one in Appendix B of | RFC 3986 for recognizing urls. This is the precise regexp: | | ^(?:([^:/?#]*):)?(?://(?:([^/?#@]*)@)?([^/?#:]*)?(?::([0-9]*))?) | ?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$ (Pre-disclaimer: the following is not said in a negative way.) At least in my view, this documentation is is useless. It's true that it's precise, but as a user of this code, I get nothing out of it. I can't even *use* that regexp (the one quoted in the docs) since it looks like something that can easily change, so I better use the `url-regexp' binding and not the quoted regexp. But the deeper reason that this is not useful to me is that it essentially spells out the parser code -- and documenting a function using its own code is (IMO) often a sign that the abstraction is questionable. But there are a few additional problems with this change that I see: * Beyond quoting it in the documentation, exposing the regexp means that it becomes part of the interface. This means that I now cannot re-implement the code in any way other than matching a regexp. * It is still partial. For example, this -> (string->url "1:/") ; Invalid URL string; bad scheme "1": "1:/" is still not a contract error. (And I can't see an obvious way to add it to the regexp, maybe with some lookahead tricks.) Another example is the host part, which is not even checked, but this is just sloppiness (= deferring it to network errors that will happen with malformed hosts). And BTW, doing that means that the contract becomes platform dependent: -> (file-url-path-convention-type 'unix) -> (url-host (string->url "file://x:&x/baz")) "x" -> (file-url-path-convention-type 'windows) -> (url-host (string->url "file://x:&x/baz")) "" * More importantly, and possibly related to the first bullet, it stands in the way of improving this code. There is a major problem in the design of the code -- it parses all urls as `http'. A proper way to deal with it is to choose a specific parser based on the schema. For example, as it looks now, I can't change it to properly treat "mailto:..." urls. That's not theoretical -- I planned on doing that extension, and now it is impossible to do it in a nice way. -- ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life! _________________________ Racket Developers list: http://lists.racket-lang.org/dev