Alastair Reid wrote:
Is [marshaling functions] something absolutely impossible in Haskell and by what reason? Just because of strong typing (forgive
my stupidity ;)? Or are there some deeper theoretical limitations?



The big theoretical issue is whether it would provide an Eq or Show instance for -> by the backdoor. Careful API design could avoid the
worst of this.

What's the problem with a Show instance for ->? (Or, put another way: what makes the current Show instance in the Prelude harmless compared to a fuller implementation?)

I'm not sure whether having an Eq instance for -> would be so bad.
General equality satisfies two constraints:
1. For all functions (including field access functions and whatnot),
"equal" inputs will return "equal" outputs.
2. True generality is the finest partitioning that satisfies (1).

(1) is a recursion (and a massive one), (2) specifies we want the least
fixed point of the recursion. Also, it's (2) that makes equality
undecidable.

(1) is essential and cannot be changed.

Would changing (2) be dangerous? What are the dangers?
For example, we could define -> equality to be source code equality,
modulo a well-defined set of semantics-preserving transformations (such
as translation to core language and renaming local variables - nothing
fancy, I'd consider f x = x to be different from f = \x x).

A semi-theoretical issue is to do with sharing of values, libraries and types. A simple function like '\ x -> x' is fairly easy to write
out because it doesn't refer to any other objects but what about:


\ x -> head [ y | (x',y) <- table, x == x' ]

This refers to an object 'table' that will have to be accessible from
the receiver. Choices are to access the object over the network, to
copy the object or to move the object and access it remotely from the sender.

The default semantics for marshalling such an object would be to copy 'table' along with the function. Essentially it's the responsibility of the programmer to make sure that he isn't inadvertently exporting the entire state of the computation. (Actually sometimes that's exactly what's wanted: dump out the entire state of the computation, say, as a checkpoint to safeguard against potential crashes.)

Remote access requires a network connection and suddenly adding two numbers can raise a TCP/IP exception.

Right, but, in principle, adding two numbers can generate a stack overflow or an out-of-memory error. There's nothing new here, except for failure probabilities.

Copying has consequences for mutable datatypes (a useful but non-standard extension of Haskell), for foreign datatypes (Ptr, etc.)
and for interfaces to foreign functions.

Indeed. These cannot be marshalled in any useful sense. (Which makes me more than a bit suspicious about the Fd instances of Show and Read, BTW.)

Copying also has consequences for space and time usage: each copy uses more space

That can be controlled by the programmer (provided that sharing is preserved over marshalling).


and laziness is lost.

Not at all. Simply marshall the thunks instead of the evaluated values. In other words, instances of marshalling functions must be non-strict.

Actually I didn't find strictness annotations on the Show instances in
the library docs. Is this an omission, intentionally left
implementation-dependent, or are they indeed lazy?

If an object is copied:

1) Is it evaluated first?

It shouldn't. Haskell is a lazy language, and forcing evaluation would make any objects that contain an infinite list deep down in their bowels potentially non-marshallable.

2) Can it contain mutable objects? 3) Can it contain Foreign pointers?

Not by my book, at least not by default. If would probably be possible to create instances that handle specific cases, e.g. if the Foreign pointer is just an implementation vehicle for an FtpFile object, then there might be specific marshalling code that marshals just the URL and rebuilds the connection on the receiving side. This also exemplifies the standard problem: Most foreign functions can fail, even unpredictably, so it's probably better to send just the specifications for building an equivalent FFI object across the network (which is exactly what the IO monad is good for: it contains action specifications, not the actions themselves).

4) Can it contain functions?

Of course - this is what got this thread started in the first place :-)


5) Is sharing preserved within the object? e.g., does let x=1; y=(x,x,x) in (y,y,y) get sent as 3 objects or as 13 objects?

Semantically, this is irrelevant. However, solutions to the next point will usually preserve sharing, so this is less of an issue.

6) Are cyclic objects ok? e.g., can 'let x=1:x in x' be sent?

They should be. Actually, Show and Read should be able to handle this. ... unfortunately, a quick test in GHCi revealed that "show" goes into an endless loop when fed with the above expression. Obviously, I'm expecting too much here. Maybe some of my Mozart experience is showing through here - the Mozart/Oz Inspector tool will show circular values with no problems, any references that form a cycle are printed as a marker. I.e. the above would look something like A=1 : A when printed, 'let x = 1 : 2 : x' would print as A=1 : 2 : A etc.

If it wasn't for mutable objects, foreign types and foreign functions, copying objects would be a straightforward but tedious issue.

Agreed.


Some of these issues can be avoided if objects are evaluated first because then the type can be used to determine whether mutable objects or foreign types are reachable.

Sometimes the evaluated value will be shorter than its unevaluated version, sometimes it's the reverse. Let the programmer decide what makes more sense. Since the programmer can always force evaluation as needed but cannot "unevaluate" something once it was executed, the system shouldn't force evaluation. Particularly since evaluation may not always terminate (transmitting a value doesn't mean it will be evaluated on the other side after all).

\ x -> lookup x table

As well as referring to 'table', this also refers to a standard library function. We can either copy 'lookup' over or we can use the
function of the same name on the receiver.

We can send a hash code of the function and have the receiver request the full function if it finds that the versions doesn't match. Actually, hash codes would be a better identification than qualified function names. Primitive functions would need name equality - there's no good way to compare them in any other useful way.

Copying the function over can get expensive because lots of Haskell libraries depend on other Haskell libraries and especially if we end
up copying it over multiple times.

A function need never be transferred more than once. (Keep them in a hash table with weak references, so that they can be garbage collected if they aren't in use anymore.)

Using the function with the same name runs the risk that the name is
 the same but the function itself is different.

Indeed.


Avoiding the overhead of sharing functions more than once is probably
easy enough. Create a signature for each function by hashing all the code in the function and in every function and object it depends on. Don't transmit functions which the receiver already has. If storing the function to a file, have the store operation omit functions listed in a table of signatures. e.g., you might provide signatures for all the standard libraries in ghc 6.02 on windows98. (Since most GHC operations are the same on all platforms and in all recent releases, this ought to match many functions in ghc 5.04.)


foo :: T -> T

If the receiving program contains a type 'T' and the sending program contains a type 'T', we might have to check that they are the same type so we have to choose an appropriate equality test for types. Is
structural equality enough or too strong or do we also want to
insist that any data hiding property the original program had is
preserved?

Types can be compared just like functions. I believe that the issues are just the same as with equality, but I may be wrong here.

A small corner of this question concerns portability of type representations between architectures with different word sizes and different endianness.

Representation is not a serious problem (though getting it under control can be some work).
What's more of a problem are implementation-dependent types, such as that integer type that's guaranteed to have at least such-and-so many binary digits. The shift functions assume a one's-complement representation of integers, which is the case on most modern processors but doesn't hold once things are ported to an MVS mainframe. (Oh, and Unicode function names are going to be very problematic if you want to compile them on a 7-bit ASCII or EBCDIC machine, but the only consequence that I can see for marshalling is that functions should never be identified by name during marshalling, they should be identified via hash code.)


Regards,
Jo

_______________________________________________
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Reply via email to