Re: [Felix-language] [felix] Array sorts

john skaller Sun, 09 Dec 2012 19:35:00 -0800

On 10/12/2012, at 10:57 AM, srean wrote:

> 
> So far D seems to me a great set of compromises for my use case.


D has some nice improvements over C++, but not enough
to both, especially losing the advantages C++ does offer:
standardisation, large community, etc.

Pity D stuck with the stupid OO paradigm.

>  I will readily believe it as I have no understanding of these things at all.

The oft cited example of dependent typing (Wikipedia) is value of 
type R^N, aka finite sequences of reals. Here the N is not known at compile 
time.

An example of the typing would be a value x in R^N.
Suppose you map it to the same values doubled. 
Or you zip it with a copy of itself.

In both cases it is clear that both values have the same length N,
even though we don't know what N is.

That's dependent typing. Static assurance these operations
cannot fail, even though the actual length isn't known.
With those assurances you also get free optimisation:
there's no need for a run time check.

> But I was hoping that shape change could be done in a statically checkable 
> way.

It can and is. The question is which changes should be allowed?
At the moment Felix allows certain changes without warnings,
others get warnings (and some just fail). 

For example you can do:

        var x : (int ^ 3) ^ 2= ( (1,2,3), (4,5,6));
        var y = (x :>> int ^ ( 3 * 2));

which changes an array of two arrays of 3 ints into an
array of 3 * 2 ints (i.e. a matrix). But should we also allow

        var z = (x:>> int ^ (2 * 3));

I mean that fits. So does a 2 x 2 matrix. It's all just re-indexing.
however only the first coercion is structure preserving.

Felix uses another (nasty) coercion to get polyadic arrays:

        3 * 2 --> int

which maps the index tuple to an integer, which is what
allows the array to be considered linear. The type 3 * 2
is called a compact linear type (by me) because the mapping
is compact, in fact:

        i + j * 3

is the formula (obvious!) Which ranges from 0 to 3 * 2 - 1 = 5.

BTW: Just so you *understand* in a programmatic way
the PROFOUND difference between

        (int ^ 3) ^ 2
and
        int ^ (3 * 2)

You index the first form with 

        (a . i) . j

There are two indicies. When looping, two loops.
But the second form is indexed like this:

        a . (i,j)

There is only ONE index. in particular:

        var index = i,j;
        a . index

will work. You may think the isomorphism here is trivial and that
is 100% exactly correct and precisely why it is so profound.
Isomorphisms are always trivial, because they preserve structure.
We have not changed out data, but we can now access it with
a single index instead of two so we have reduced the problem
to a single loop and a way of going through the values of the
index -- which is no longer a problem with arrays, and doesn't
involve the polymorphism of the the array data type.

The problem is reduced to

        How do we increment a value of type 3 * 2?
        How do we stop at the end?

and the answer is: if we *represent* 3 * 2 by "int"
and use the packing formula given above we can just
go from 0 through 5, and the int representation and the
compact linear type representation are the same.

<aside>
Note: identical. At the representation level. Isomorphic
at the program level. These two things lead to extreme
power when they line up with each other. The art
of compiler design is in identifying important isomorphisms
and ensuring that the isomorphic data types are have identical
representations, or at least ones that can be cheaply converted.

By far THE most important isomorphism ever, is the representation
of values in FPLs by pointers. This ONLY works in a purely
functional language, it is called boxing. If everything is immutable
a value and a pointer to it cannot be distinguished. however
using pointers for all values means everything is the same size.
So you can write polymorphic functions that work with any type
and compile down to machine code.
</aside>


Using a coercion is very general. There are other ways,
for example:

        undim 

could be used to do this, and would only apply to an array
of arrays, producing an array with a 2 component multi-index.
To do it again you'd say 

        undim (undim a)

you could also have

        addim

which adds a dimension (size 1).

It's not entirely obvious especially when you consider
this rule:

        int ^ 3 * int ^2 ==> int ^ (3 + 2)

Felix can do that too. It looks confusing but 
the first type is trivial:

        ( (1,2,3), (4,5) )

Its just a pair of arrays. The second type is a linear array
with an multi-index. Only this index goes like:

        which array do you want?
        first array: 
                element i --> position i
        second array:
                element i --> position 3 + i

felix can do that too. Its a really nice way to join two
arrays. You need to understand sum types to follow
completely. Of course 2, 3 are sum types.

You already know sum types: they're just enumerations.
And again: the type

        2

is well known. It is usually just called

        bool

There is a coercion:

        2 + 3 => 5

This again is a compact linear type thing. Not that 2 + 3 is NOT
equal to 5. Just as 2 * 3 is not equal to 6: they're isomorphic
but not equal.

But Felix does a trick. It uses the same representation for these
things. So the coercion is just a type cast (no run time penalty).

> Indeed, but I will not brush aside indexing/slicing so easily.

I'm not brushing it aside. I'm saying you can already do it
semantically. The issue is to find a suitably general notation.

Subranges of integers 0:10 or whatever are obvious candidates.

However adding a stride is very problematic. There are two reasons
I feel this:

(a) just  a gut feeling but its hacky. Its adding a bit more power,
but not enough to bother. A more general comprehension
makes more sense.

(b) Strides in arrays throw out contiguous storage.


> Thats one clear advantage that numpy has over roll your own C++. Indexing 
> expressions, together with Broadcasting in a single line can remove the the 
> accidental complexity and verbosity of several degrees of nested looping. 
> Loops are tedious to type, easy to get wrong and hard to comprehend by just 
> eyeballing them from 4 feet.

Yes, but remember the problem doesn't exist to the same
extent with HOFs (higher order functions): folds, maps, filters,
etc etc.


>  Example
> 
> user_movie_ratings[  users[i],  movies[j]    ] = 4 
> 
> users[i] is an array of indices for users of class i
> movie[i] is an array of indices for movies of class j

        for user in users[i] do
        for movie in moves[j] do
                rating . movie .  user = 4
        done done

is certainly more verbose, however the loops are implicit. What's more,
this works for ANY streamable data type users[i]. It doesn't have to be
an array. It could be a list.

The implicit indexing form has some risks by comparison.

I don't have time to explain in detail here, but roughly:
there's a reason you write

        map sin (1.0, 0.5. 3.2)

instead of just

        sin (1.0, 0.5, 3.2)

with an implicit map: what happens if you overloaded sin to
handle a 3 element vector?

I am finding that too much sugar is a liability .. Felix has a lot I threw
together into the kitchen sink, and the result has some surprises.
I'm probably going to remove some of it.

>  
> Specific functions, eg indexing triangular matrices dynamically,
> will have to await use cases (because there's too many and we
> need at least one use case to test with).
> 
> Yeah this can wait. It halves the memory requirement, but access patterns get 
> shot, so its not always a win.

You can easily implement triangular matrix values. The problem
is that element wise mutation becomes nonsense.
The get and set matrix methods over (i,j) mandate independence.
I.e. these are projection functions. But set (i,j) affects get (j,i).
So we have broken the core invariant.

Of course you can fix this:

        set_pair (i,j,v) ==> set (i,j, v); set(j,i,v)

and now we're back in business, but we no longer
have any higher level setters.

BTW: triangular matrices are the perfect example of why OO is a fraud.
Clearly they're a subtype mathematically, but maths is purely 
constructive. As soon as you get into mutation, you no longer
have a subtype. In general, subtyping and mutation cannot
co-exist. OO intrinsically requires both.

> 
> explicit or "inferred to be sound" coercion is I positively desirable. I dont 
> want to accidentally reshape to something I did not want.

Sure, but exactly what do you want?

No, don't answer, its a rhetorical question :)

You want all sorts of weird shit. 
I know! You're a programmer :)

The question becomes: how do you SPECIFY what you want?

There is more than one kind of shape changing, adaption,
blah blah and the hard bit is not implementing it (ignoring
performance for the moment), but just devising nice syntax!

We have a limited number of punctuation marks and a lot of
related but distinct concepts to model. If we have too many
symbols or patterns people will get lost and if not enough
code is verbose and unreadable and people get lost.

the happy medium is dang hard to find!


--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

Re: [Felix-language] [felix] Array sorts

Reply via email to