Re: [racket-dev] consistency in names and signatures

2012-03-27 Thread Neil Van Dyke

FWIW...

* I have no strong opinion on whether it would be worthwhile, if done in 
a backward-compatible way.


* If done in a *non*-backward-compatible way, it might be a headache.  I 
know of systems in production with millions of lines of PLT/Racket code, 
and -- although PLT/Racket have been pretty good about backward 
compatibility -- it seems that my big clients feel every little 
non-backward-compatible change to a PLT/Racket version significantly.  I 
make a little money every time a platform change inflicts pain, since I 
have to fix it, but it's a net loss for me when goodwill for the 
platform is eroded.  (And perhaps eroded goodwill for me, who is 
implicitly endorsing the platform, and who has sometimes been asked 
directly to explain *why* such-and-such change happened.  I would 
rather be paid to invent and build new stuff, not be responding to the 
platform breaking.)


* I am sympathetic to the idea of being more explicit about types in 
identifiers.  In nontrivial code, I do sometimes end an identifier as 
"-or-false" or "-or-f", and sometimes I have "/error" or "/exn" variants 
of procedures.  It helps me keep track of whether the value can be #f.  
I usually avoid being this explicit in identifiers in APIs, because it's 
a little ugly-looking, it has not been idiomatic Racket thus far, and it 
hasn't seemed necessary.


* If we're going to have exception-raising and #f-producing variants of 
a procedure, how about accommodating both the little language and big 
language people by having *three* variants: "/exn" and "/f" (or 
"/false") for the big language people who want to be explicit, and 
no-suffix for the littler language people who don't need or want all 
that clutter.


* Would this new world of naming conventions be a good time to replace 
the somewhat clunky-looking "->" naming convention with ">" or something 
else?  "number>string"? "number-to-string"?  "number-as-string"?  (No 
non-ASCII, unless I can get an APL keyboard for my ThinkPad.)


* Maybe we should consider otherwise simplifying some of these 
identifiers.  To use an example, "bytes->string/utf-8" is already a 
mouthful for a pretty common thing, even before we start adding suffixes 
onto it.  ("bytes->string/utf-8" might be too easy an example, since 
UTF-8 encoding would be an appropriate default for a "bytes->string" 
nowadays, and consistent with Racket's current behavior when writing a 
string to a bytes port.)


* Will there be more consistency in how "/" in an identifier should be 
read?  It seems that "X/Y" sometimes reads as "X with behavior Y", 
sometimes as "X with a Y argument", sometimes as "X or Y", and sometimes 
as something else.
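
A minimal sketch of the three-variant idea from the bullet on "/exn" and "/f" above. All names are hypothetical and nothing here exists in the Racket libraries; which behavior the no-suffix name gets is an assumption (here it aliases the #f-producing variant).

```racket
#lang racket/base

;; Hypothetical names throughout; none of these are real library bindings.

;; digits->natural/f : String -> (or/c exact-nonnegative-integer? #f)
;; Produces #f when the string is not a run of decimal digits.
(define (digits->natural/f s)
  (and (regexp-match? #rx"^[0-9]+$" s)
       (string->number s)))

;; digits->natural/exn : String -> exact-nonnegative-integer?
;; Raises exn:fail:contract instead of producing #f.
(define (digits->natural/exn s)
  (or (digits->natural/f s)
      (raise-argument-error 'digits->natural/exn
                            "a string of decimal digits"
                            s)))

;; The no-suffix name, assumed here to alias the #f-producing variant.
(define digits->natural digits->natural/f)
```

Callers who want the result type spelled out pick a suffixed name; everyone else uses the short one and reads the documentation for its behavior.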


Neil V.

--
http://www.neilvandyke.org/

_
 Racket Developers list:
 http://lists.racket-lang.org/dev


Re: [racket-dev] consistency in names and signatures

2012-03-27 Thread Matthias Felleisen

On Mar 27, 2012, at 8:40 PM, Neil Van Dyke wrote:

> we should consider otherwise simplifying some of these identifiers.


That's very high on my list of goals for another, related experiment I have in 
mind. I know that this would remove the title of 'world champion in longest 
identifier names' from our community, but I am willing to sacrifice this for 
having more concise programs :-) 


Re: [racket-dev] consistency in names and signatures

2012-03-28 Thread Andy Gocke
Putting aside the 8 (yeah, really) ways to report errors in Haskell, this
is the option provided by the Maybe type (data Maybe a = Just a | Nothing).
While I see many benefits to this approach, I think contracts may provide a
new way out. In most typed languages the major constraint seems to be that
types are checked at compile time and limited in complexity. Contracts,
however, have practically unlimited power. The trick is that one would have
to incorporate some way to reify the calculation checked by the contract.
If you could get a hold of the value produced by contract checking then no
duplicate computation would have to occur inside the function. This is
especially relevant for functions like string->number because the most
obvious implementation checks validity during parsing -- checking the
validity and parsing basically duplicate the function.

Also, if one sees a sort of "inner" function post-contract-check and an
outer function including the contract check then one could even term the
inner function as total and the outer function as partial.

The advantages of this approach, as far as I can see, are that it removes
large amounts of failure checking and it encourages large amounts of
precondition code to move into the contract. Not only is this good for
documentation and interfaces, but it helps with my current project (random
testing through contracts).
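
A rough sketch of the inner/outer split described above, assuming a checker that returns the parsed value rather than just a boolean. This is not how Racket's contract system works today; the names `check-port-string`, `port->label/inner`, and `port->label` are hypothetical and only illustrate keeping the result of the check.

```racket
#lang racket/base

;; check-port-string : String -> (or/c exact-integer? #f)
;; Hypothetical checker: validates AND produces the parsed value, so the
;; work done during checking is not thrown away.
(define (check-port-string s)
  (define n (string->number s))
  (and (exact-integer? n) (<= 0 n 65535) n))

;; Inner function: total on its domain (already-parsed port numbers).
(define (port->label/inner n)
  (if (< n 1024) 'privileged 'unprivileged))

;; Outer function: partial on strings. It performs the check once and
;; hands the reified result to the inner function.
(define (port->label s)
  (define n (check-port-string s))
  (unless n
    (raise-argument-error 'port->label "a string denoting a port number" s))
  (port->label/inner n))
```

The string is parsed exactly once: the checker's result is what the inner function consumes.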

2c

Cheers,
Andy

On Tue, Mar 27, 2012 at 5:23 PM, Matthias Felleisen wrote:

>
> Bug report 12652 reminded me of a topic that I brought up a while back,
> that I tried to incorporate into the Style Guide, and that I forgot to
> re-introduce here.
>
> Background: a lot of people think that consistency in naming,
> signature/contract, and functionality (for methods and functions) is a key
> element to successful software projects. If you saw Yaron Minsky's talk at
> POPL or if you are in a department where he delivered his "OCaml is great
> for trading" talk, you know what I mean. He formulates this point well, and
> he gives good examples.
>
> Topic: In our world, we have functions such as
>
>  string->number
>  string->path
>  string->url
>  bytes->string/utf-8
>
> The naming consistency is good, but they aren't really consistent at the
> signature or functionality level:
>
>  string->number produces #f when called on "hello world" or "\0"
>  string->path fails on "\0"
>  string->url succeeds on "\0" and produces a url
>
> I consider this less than desirable. I understand arguments for #f and
> exceptional behavior in an ML-style world. In a Racket/Lisp style world, I
> see the behavior of string->number as ideal. I get two behaviors in one
> function:
>
>  (1) parsing in the spirit of formal languages (is this 'string' accepted
> by this 'machine')
>  (2) translation in the case of success.
>
> One advantage of such total functions is of course that they are
> performant. The signatures/contracts are simple and their functionality is
> easy to figure out. A disadvantage is that they deepen our dependence on
> occurrence typing, but so what.
>
> I could also understand providing two versions of the function:
>
>  string->path : String -> Path u False
>  string->path/exn: String -> Path | effect: exn:fail:contract?
>
> Q: Would it be worth our while to comb through the libraries and make the
> world consistent, even breaking backwards compatibility? I would be willing
> to run such a project.
>
> -- Matthias
>
>
>
>


Re: [racket-dev] consistency in names and signatures

2012-03-28 Thread Eli Barzilay
Yesterday, Andy Gocke wrote:
> [...]  This is especially relevant for functions like string->number
> because the most obvious implementation checks validity during
> parsing -- checking the validity and parsing basically duplicate the
> function.

And that makes most of my point.

The thing is that `string->url' is basically *just* a parser -- it
does very little after matching the regexp.  I therefore view the
commit as adding a contract to, say, `read-xml', where the contract
runs the function to see that the input is valid.

An even more extreme example would be `get-pure-port': if you really
want a complete specification of the domain in a contract, then the
contract should make sure that the server is reachable, and that it
returns a valid page.  Combine this with parsing the page, and how
this is not really a great way to run code (ATM!) should be clear.
Besides the issue of doing a bunch of work twice, the contract would
still be broken since having a valid server and/or a page now doesn't
mean that it's going to be valid on the next attempt.  To make this
practical, you'd need some way to expose values that are computed
as part of the contract.  ("Reify" feels wrong to me in this
context...)

That's why I added the above "ATM".  There is an obvious appeal in
doing this -- having all error handling in specific pieces of code and
"floating" them upwards sounds tempting *if* there's some way to do it
right.  I suspect that such an exposure of the contract results is
just one small step in getting this.  I'm also not sure that it's
doable in a way that actually leads to a practical benefit.  This is
similar to me doubting the theoretical utility in running a parser
twice: on one level you get your guaranteed, nicely total function,
but on the level of providing that guarantee, you get the original
problem.  (And in terms that I'm used to, this is switching the same
work to your well-formedness goal, and that buys nothing in terms of
getting things done.)

IMO, this problem is fundamental enough that it shows up in many
contexts.  One of them is already visible in the `string->url'
example.  The new documentation reads:

  | url-regexp : regexp?
  |
  |   This is a regular expression based on the one in Appendix B of
  |   RFC 3986 for recognizing urls.  This is the precise regexp:
  |
  |   ^(?:([^:/?#]*):)?(?://(?:([^/?#@]*)@)?([^/?#:]*)?(?::([0-9]*))?)
  |   ?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$

(Pre-disclaimer: the following is not said in a negative way.)

At least in my view, this documentation is useless.  It's true that
it's precise, but as a user of this code, I get nothing out of it.  I
can't even *use* that regexp (the one quoted in the docs) since it
looks like something that can easily change, so I better use the
`url-regexp' binding and not the quoted regexp.

But the deeper reason that this is not useful to me is that it
essentially spells out the parser code -- and documenting a function
using its own code is (IMO) often a sign that the abstraction is
questionable.


But there are a few additional problems with this change that I see:

* Beyond quoting it in the documentation, exposing the regexp means
  that it becomes part of the interface.  This means that I now cannot
  re-implement the code in any way other than matching a regexp.

* It is still partial.  For example, this

    -> (string->url "1:/")
    ; Invalid URL string; bad scheme "1": "1:/"

  is still not a contract error.  (And I can't see an obvious way to
  add it to the regexp, maybe with some lookahead tricks.)

  Another example is the host part, which is not even checked, but
  this is just sloppiness (= deferring it to network errors that will
  happen with malformed hosts).  And BTW, doing that means that the
  contract becomes platform dependent:

    -> (file-url-path-convention-type 'unix)
    -> (url-host (string->url "file://x:&x/baz"))
    "x"
    -> (file-url-path-convention-type 'windows)
    -> (url-host (string->url "file://x:&x/baz"))
    ""

* More importantly, and possibly related to the first bullet, it
  stands in the way of improving this code.  There is a major problem
  in the design of the code -- it parses all urls as `http'.  A proper
  way to deal with it is to choose a specific parser based on the
  scheme.  For example, as it looks now, I can't change it to properly
  treat "mailto:..." urls.

  That's not theoretical -- I planned on doing that extension, and now
  it is impossible to do it in a nice way.
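
A minimal sketch of choosing a parser by scheme, as described in the last bullet. This is not the `net/url` implementation; the parsers, the table, and `parse-url-ish` are hypothetical stand-ins that return tagged lists instead of url structs.

```racket
#lang racket/base

;; Hypothetical per-scheme parsers; each receives the text after "scheme:".
(define (parse-mailto rest) (list 'mailto rest))
(define (parse-generic scheme rest) (list 'generic scheme rest))

;; Table from (lowercase) scheme name to its dedicated parser.
(define scheme-parsers
  (hash "mailto" parse-mailto))

;; parse-url-ish : String -> list
;; Splits off the scheme, then dispatches to a scheme-specific parser,
;; falling back to the generic one when none is registered.
(define (parse-url-ish s)
  (define m (regexp-match #rx"^([a-zA-Z][a-zA-Z0-9+.-]*):(.*)$" s))
  (cond
    [(not m) (parse-generic #f s)]
    [else
     (define scheme (string-downcase (cadr m)))
     (define rest (caddr m))
     (define parser (hash-ref scheme-parsers scheme #f))
     (if parser (parser rest) (parse-generic scheme rest))]))
```

With this shape, supporting a new scheme means adding one entry to the table, which is hard to do once a single exported regexp is part of the interface.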

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


Re: [racket-dev] consistency in names and signatures

2012-03-28 Thread Matthias Felleisen


This post is minimally related to the topic of this thread. 
If you continue discussing this idea, please move it elsewhere. 

(I happen to agree with Eli's perspective here, which is why I 
started this particular thread. Racket makes it particularly 
convenient to avoid contracts here and Typed Racket makes it even
possible to type check such things AND to assign meaning to #f
results beyond the function body. BUT this is barely related to
my question.) 



On Mar 28, 2012, at 9:34 AM, Eli Barzilay wrote:

> [...]

Re: [racket-dev] consistency in names and signatures

2012-03-28 Thread Jay McCarthy
On 3/27/12, Matthias Felleisen wrote:
> Q: Would it be worth our while to comb through the libraries and make the
> world consistent, even breaking backwards compatibility? I would be willing
> to run such a project.

I think it would be good. I think we should do it as a new #lang to
preserve backwards compatibility.

My preference is to almost always throw exceptions and almost never
return false unless the function name has some convention to it. (I use
a * to indicate a function that propagates #f, a lot.)

I see a solution for string->url: the function should just throw a
contract violation exn internally without specifying the contract
outside. You could document it as having some valid-url-string?
predicate that calls string->url, but the actual contract applied to
string->url would just be string?
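
Roughly, that arrangement could look like the sketch below. To keep it self-contained it uses a hypothetical `parse-endpoint` as a stand-in for string->url, with a matching `valid-endpoint-string?`; neither name exists in any library.

```racket
#lang racket/base
(require racket/contract)

;; parse-endpoint: hypothetical stand-in for a parser such as string->url.
;; Its contract is just string?; malformed input is rejected internally
;; with a contract-violation exception.
(define/contract (parse-endpoint s)
  (-> string? (cons/c string? exact-nonnegative-integer?))
  (define m (regexp-match #rx"^([a-z.]+):([0-9]+)$" s))
  (unless m
    (raise-argument-error 'parse-endpoint "valid-endpoint-string?" s))
  (cons (cadr m) (string->number (caddr m))))

;; The documented predicate, defined by running the parser itself.
(define (valid-endpoint-string? s)
  (with-handlers ([exn:fail:contract? (lambda (e) #f)])
    (parse-endpoint s)
    #t))
```

The predicate exists for documentation and for callers who want to ask first; the parser never runs it, so nothing is parsed twice on the normal path.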

Jay

-- 
Jay McCarthy 
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay

"The glory of God is Intelligence" - D&C 93


Re: [racket-dev] consistency in names and signatures

2012-03-28 Thread Andy Gocke
On Wed, Mar 28, 2012 at 1:24 PM, Jay McCarthy wrote:

> I see a solution to the string->url to be that the function should
> just throw a contract violation exn internally without specifying the
> contract outside. You could document it as having some
> valid-url-string? that called string->url, but the actual contract
> applied to string->url would just be string?
>

I don't want to focus too much on string->url and I think this discussion
has moved past my level of knowledge, but I would like to describe my
thesis a little bit to give people a sense of where I'm coming from.

Right now the purpose of my thesis is to automatically load modules, grab
contracts, generate values associated with those contracts, and try to
break those functions with generated and stored values. This is based off
of Casey Klein's OOPSLA paper. If the contract for string->url contains a
regex in the domain one could easily see writing a random generator for
regular expressions and thus automatically testing a wide variety of
regex-filtered contracts.

If the contract is simply string? that doesn't mean I can't test
string->url, but it does mean that I can't do it automatically and would
need to create a specialized generator for all such cases. I'm not sure if
this affects the decision of what's best for Racket as a whole, but it may
be food for thought.

Andy


Re: [racket-dev] consistency in names and signatures

2012-03-28 Thread David T. Pierson
On Tue, Mar 27, 2012 at 06:23:16PM -0400, Matthias Felleisen wrote:
> The naming consistency is good, but they aren't really consistent at
> the signature or functionality level: 
> 
>  string->number produces #f when called on "hello world" or "\0"
>  string->path fails on "\0"
>  string->url succeeds on "\0" and produces a url 

...

> Q: Would it be worth our while to comb through the libraries and make
> the world consistent, even breaking backwards compatibility? I would
> be willing to run such a project. 

I think you are asking whether an effort should be made to change
existing function names so they better indicate the function signature.

Besides the obvious backward-compatibility concerns, such an effort
seems likely to be at cross-purposes with the separately stated goal of
reducing the length of identifiers and making programs more concise.
(Presumably if equally concise names that better reflected function
signatures were available, they would have been used in the first
place.)

Though I haven't heard the talks you referenced, I agree that naming is
very important.  As you mention, the naming consistency in
Racket is already good.  (The quality of the function names is probably
one of the things that drew me to Scheme.)

The examples you mention point out problems with inputs that might be
considered "edge cases".  One might argue that these are the cases where
it would be helpful for the name to give more information.  On the other
hand, such information might be harmful in that it reduces the
abstraction level and increases the information that a developer reading
the code must process.  Basically, at some point, more information in
the name is worse.

Without suggestions of proposed name changes, I find it difficult to
imagine names that are so much better in the area of expressing function
signatures that they make up for these drawbacks.  (non-compatibility,
less abstract, longer, etc.)

In addition, what alternative project would be postponed in order to run
this one?

David



Re: [racket-dev] consistency in names and signatures

2012-03-29 Thread David T. Pierson
On Thu, Mar 29, 2012 at 12:44:35AM -0400, David T. Pierson wrote:
> (Presumably if equally concise names that better reflected function
> signatures were available, they would have been used in the first
> place.)

Sorry for the double post.  I should have added "equally lucid" along
with "equally concise".

Perhaps what I should have asked was simply whether there exist names
that better indicate function signatures but are still good in all or
most other important aspects, and whether it is worth breaking
compatibility for such.

David


Re: [racket-dev] consistency in names and signatures

2012-04-04 Thread ozzloy-racket-dev
along the consistency in function naming vein:
file-name-from-path versus filename-extension.  is "filename" 1 word or 2?
 i prefer 1.

even more tangential, why isn't file-name-from-path "path->filename"
instead?  or even "basename"?

On Thu, Mar 29, 2012 at 08:07, David T. Pierson wrote:

> On Thu, Mar 29, 2012 at 12:44:35AM -0400, David T. Pierson wrote:
> > (Presumably if equally concise names that better reflected function
> > signatures were available, they would have been used in the first
> > place.)
>
> Sorry for the double post.  I should have added "equally lucid" along
> with "equally concise".
>
> Perhaps what I should have asked was simply whether there exist names
> that better indicate function signatures but are still good in all or
> most other important aspects, and whether it is worth breaking
> compatibility for such.
>
> David