Hello Rust devs.

I have been looking at Rust, and I'm sure I don't fully understand either the
language or its design goals yet, but I am disturbed by multi-argument functions.
Please correct any misconceptions in the following argument! I doubt this will
change anything, but I feel I should present the case anyhow.

Functions in Rust have a type like

        fn (int,int)->int

instead of

        tup(int,int)->int

which is what they should have. The reason appears to be to support aliases,
which I understand are a special kind of pointer guaranteed to point down the
stack. This achieves efficiency when passing large objects, while at the same
time ensuring the pointers are weak, and so need no memory management other
than the normal stack operations, which is again an efficiency issue. Forgive
me if the description is inaccurate!
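To make the distinction concrete, here is a sketch in current Rust syntax (the
names are hypothetical): fn(i32, i32) -> i32 and fn((i32, i32)) -> i32 are
different types, and only the latter has a tuple domain.

    fn add2(x: i32, y: i32) -> i32 { x + y }         // domain: two arguments
    fn add_pair((x, y): (i32, i32)) -> i32 { x + y } // domain: one tuple

    fn apply_pair(f: fn((i32, i32)) -> i32, p: (i32, i32)) -> i32 {
        f(p)
    }

    fn main() {
        println!("{}", apply_pair(add_pair, (1, 2))); // fine: prints 3
        // apply_pair(add2, (1, 2)); // type error: fn(i32, i32) != fn((i32, i32))
    }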

OK, so the problem with multi-argument functions without a domain type is that
they wreak utter havoc on the type system. Instead of a simple combinatorial
system, which can be made polymorphic by "simply chucking in type variables",
we get a complex spaghetti which ends up being entirely untenable.

To see this you only have to look at C++! In particular, look at the weak
attempt to get around this in C++11 by allowing templates to process
variable-length argument lists recursively.

To see the extreme impact of this choice, consider how you'd implement a
composition operator. In Felix:

fun compose[A,B,C] (f:A->B, g:B->C) => fun (x:A) => g ( f (x));

It's similar in ML, OCaml, Haskell, and indeed in C++ and Rust, for one-argument
functions... and therein lies the problem.
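For one-argument functions the definition does carry over; a minimal sketch in
current Rust syntax (compose is a hypothetical helper, not a standard item):

    fn compose<A, B, C>(
        f: impl Fn(A) -> B,
        g: impl Fn(B) -> C,
    ) -> impl Fn(A) -> C {
        move |x| g(f(x))
    }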

Certainly, in Rust, if you have a function returning a tuple type you can ALSO
define a composition with a two-argument function, by unpacking the tuple
returned by the first function and applying the second function to the
components. And again for three-argument functions, and so on.

So you need a whole family of functions, just to define composition.
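For instance, the two-argument member of that family might look like this
(a sketch; compose2 is a hypothetical name, and each further arity needs yet
another definition):

    fn compose2<A1, A2, B, C>(
        f: impl Fn(A1, A2) -> B,
        g: impl Fn(B) -> C,
    ) -> impl Fn(A1, A2) -> C {
        move |x, y| g(f(x, y))
    }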

This problem is pervasive: it doesn't just apply to composition, it applies to
all higher-order functions. You can in fact see the effect in the C++ Standard
Library, where there are combinators for one- and two-argument functions, and
then the library just gives up.

Note that this problem isn't merely a linear expansion in the size of a
polymorphic library: the effects combine, so the explosion is exponential.
Roughly, it makes polymorphism completely untenable.
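By contrast, with tuple-typed domains the single compose sketched above already
covers the multi-argument case, because a "two-argument" function is just a
function on a pair (a usage sketch, assuming the hypothetical compose from
before):

    let add  = |(x, y): (i32, i32)| x + y;  // domain is the tuple type (i32, i32)
    let show = |n: i32| n.to_string();
    let add_then_show = compose(add, show); // (i32, i32) -> String
    assert_eq!(add_then_show((1, 2)), "3");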

Polymorphism is the central vehicle for providing safe reusable components.
No language should be released without coherent polymorphism. Go doesn't have
it, so it's useless. Java is rubbish because it was released without it. C++
at least was standardised with polymorphism (even if it isn't wonderful, it is
the basis of the standard library).

In my view, polymorphism is no longer a nice toy for academics. At least
first-order polymorphism is mandatory. Multi-argument functions have to be
thrown out.

So the problem is: how to achieve efficiency of a kind similar to what Rust
aliases provide. Note that these are similar to C++ references with a
prohibition on address taking.

It's quite clear that the parameter-passing performance is easily obtained with
pointers, and this fits into the model of functions with tuple arguments. But
it isn't safe to just pass pointers to the stack into a function, for fear
they'll be captured and stored somewhere that outlives the argument's stack
frame.
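To make the hazard concrete, a sketch in current Rust terms (Keeper and capture
are hypothetical names; this is exactly the escape that has to be ruled out):

    // If a callee can store a borrowed pointer in a longer-lived structure,
    // the pointer can outlive the stack frame it points into.
    struct Keeper<'a> {
        slot: Option<&'a i32>,
    }

    fn capture<'a>(k: &mut Keeper<'a>, p: &'a i32) {
        k.slot = Some(p); // the pointer escapes the call
    }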

First, let me say that it is possible to detect such escapes with a good
conservative approximation. I know this because my Felix compiler actually does
it: if the compiler can't prove a value is safe on the stack, it uses the heap
instead.

However, in Rust the philosophy is not to automatically optimise things this
way, because the performance is then unclear. Instead, the philosophy is to
annotate the code so the performance can be seen, and then have the compiler
ensure the annotations hold (and error out otherwise).

In my discussion on IRC, Graydon appeared to believe this had to be done with a
type annotation, and that therefore one would have to introduce aliases into
the general type system, which is not desired.

I have thought about this a bit, and I now tend to think this is not entirely
the case. Let us assume instead:

(a) we have tuple-typed function domains
(b) we allow pointers in tuples, and therefore pointers passed to functions
(c) a function can use the existing alias annotation

Now what we do is this: the alias annotation works exactly as it does now. It
can only be used in function declarations and has the same meaning.

However, the TYPE is "pointer". So the caller can, and must, pass a pointer,
according to the type of the function. The callee guarantees that the pointer
isn't captured, but this cannot be deduced from the type.

So when you call an unknown function you have no assurance. But if the function
is known, you can look at its definition and you get the assurance.
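Modelled in current Rust, with a raw pointer standing in for the proposed
parameter type (length_sq is a hypothetical example):

    // The parameter's TYPE is just "pointer", so the signature alone
    // promises nothing; the no-capture guarantee lives at the definition
    // site, where a known caller can inspect it.
    fn length_sq(p: *const (f64, f64, f64)) -> f64 {
        // promise (not expressed in the type): p is read here, never stored
        let (x, y, z) = unsafe { *p };
        x * x + y * y + z * z
    }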

This is weaker than the current Rust rules, because it forces "heapification"
of arguments to unknown functions accepting a pointer. But it has the major
advantage of making general parametric polymorphism possible. Without some
such change it simply isn't possible, and in my view that makes Rust a dead
horse even before the race starts.

I should note that this problem only applies at ABI boundaries; in Rust I think
that's what you export/import from crates. Internal (within-crate) calls can be
optimised heavily by inlining away many calls.

Now, people should expect overheads using intensional polymorphism. Clearly it
excludes some type-based optimisations. Clearly separate compilation and
linkage by dumb linkers excludes optimisations. But then there are other
choices: C++ makes templates visible to all callers and uses extensional
polymorphism, and so it generates fast code, although it tends to bloat a bit
as a consequence. Felix does this too, but it doesn't bloat (it unifies code
which is identical except for pointer types).
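For comparison, a sketch of the extensional style in Rust terms (largest is a
hypothetical example): the compiler monomorphises, emitting separate machine
code per instantiation, much as C++ templates do.

    fn largest<T: PartialOrd>(a: T, b: T) -> T {
        if a > b { a } else { b }
    }

    // largest::<i32> and largest::<f64> become two distinct compiled copies.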

OCaml forces the order of compilation in order to pass information from
compilation units forward, allowing inlining and other optimisations in at
least one direction: an extra restriction, and a smarter (more complex)
compilation/linkage process.

In Rust, you have expanded values, but you can box them explicitly. So it seems
to make sense to me to allow both intensionally polymorphic functions and
extensional ones (an explicit choice). This means passing more information
between crates to support higher performance if the programmer chooses.
Roughly, you can either compile a polymorphic function that fits into the ABI,
or just pass the text around if you want to trade compile-time performance for
run-time performance. [Of course I don't mean passing the Unicode text around
literally!]

In particular, passing the text around allows the safety aliases give without
sacrificing performance; otherwise you lose the performance.

So I recommend considering this new tradeoff. You throw away some performance
in some circumstances (safety is not lost). In return you get proper
polymorphism you can't otherwise get, and that is essential to reusability,
and therefore to the quality of the software written in the language.

When I started extending Felix (well over 10 years ago) to support
polymorphism, I made the very difficult decision that the only way to do it was
to assert that *everything* was polymorphic... it's just that some functions
happen to have 0 type parameters :] Doing this forced the whole language and
type system to work polymorphically as an "axiom", not something added on
later.

I think this is essential. If you haven't got polymorphism working, don't
release the language at all. You may actually get users, and then you're stuck
if you didn't design the type system right.

IMHO the type system at the moment isn't right. In fact, it isn't a type system
at all. All modern type systems are combinatorial. Rust's isn't, so it doesn't
even count as having a type system. The ability to make arbitrary combinations
IS parametric polymorphism.

In an "ideal" language you can prototype by separately compiling the pieces
and playing around. This is fast compilation but not so fast generated code.
Then when you're happy you expand the size of the units being compiled,
finally up to the whole program. That gives you the best performance at run 
time,
at the cost of a very long compile.

In the case of the design I presented above, this idea shows how you can have
your cake and eat it too. If you want to get rid of "heapified" values
introduced into prototypes because information was lost... just compile larger
units, or the whole program. This won't get rid of the "unknown function" in
complex situations where you'd need data-flow analysis to even make an educated
guess, but it does allow significant improvement.

I don't have any final answer to this problem, but throwing away combinatorial
typing, and thus parametric polymorphism, for a minor optimisation and safety
assurance doesn't make sense to me. There must be a better compromise!

--
john skaller
[email protected]



