Re: Staying alive
Michael Weber wrote: [...] otherwise there would be a `marshalList' and `marshalListN' with same type signature, which I also regarded as confusing. That was a problem with your approach, not mine... :-) [...] But how else can you marshall an arbitrary length list? For Int lists, there is no zeroElem... [...] A usual approach for this in many C-based APIs is that two arguments are passed: a pointer and an integral of some kind, stating the number of elements pointed to, see e.g. conformant arrays and the size_of attribute in DCE-IDL (can't remember how MIDL calls this). Cheers, Sven -- Sven PanneTel.: +49/89/2178-2235 LMU, Institut fuer Informatik FAX : +49/89/2178-2211 LFE Programmier- und Modellierungssprachen Oettingenstr. 67 mailto:[EMAIL PROTECTED]D-80538 Muenchen http://www.informatik.uni-muenchen.de/~Sven.Panne
Re: Staying alive
On Wed, Oct 06, 1999 at 12:04:03 +0200, Sven Panne wrote: Michael Weber wrote: marshalListN :: Int - [a] - IO Addr I don't see a reason for the duplication of length information, e.g. what should `marshalListN 1000 [1,2,3]' mean? And `marshalListN 2 [1,2,3]' is equivalent to `marshalList (take 2 [1,2,3])'. Yes... I thought about this, but otherwise there would be a `marshalList' and `marshalListN' with same type signature, which I also regarded as confusing. where `marshalList' stores the list length at the beginning of the buffer, like: [ marshalList code deleted ] This seems to be a rather special case, e.g. why should the length be marshaled as an Int? Some ancient Pascals on PCs used a byte length prefix for strings. And what about alignment? There could/must be some padding if the alignment constraints for the list element type are stronger than the ones for Int. Yes, I know. I cursed on this stupid Pascal string limit in school :-) But how else can you marshall an arbitrary length list? For Int lists, there is no zeroElem... BTW: This is what is done in XDR (eXternal Data Representation standard, see RFC1832). The length is at the beginning, coded as big-endian 32Bit unsigned int (we could use `writeInt32OffAddr . ccall_to_htonl' for this...) It seems that both spellings are possible and both "feel right", depending on the speaker and the dictionary. :-) That's what I expected... :-) Cheers, Michael
Re: Staying alive
Michael Weber wrote: Maybe it's worth to know how many objects a list has (without explicit zeroElem), so I'd suggest to extend the class to: marshalList ::[a] - IO Addr We agree here. marshalListN :: Int - [a] - IO Addr I don't see a reason for the duplication of length information, e.g. what should `marshalListN 1000 [1,2,3]' mean? And `marshalListN 2 [1,2,3]' is equivalent to `marshalList (take 2 [1,2,3])'. unmarshalList ::Addr - IO [a] How many elements are there starting at the given Addr? Hmmm unmarshalListN:: Int - Addr - IO [a] This is my definition of unmarshalListN. where `marshalList' stores the list length at the beginning of the buffer, like: [ marshalList code deleted ] This seems to be a rather special case, e.g. why should the length be marshaled as an Int? Some ancient Pascals on PCs used a byte length prefix for strings. And what about alignment? There could/must be some padding if the alignment constraints for the list element type are stronger than the ones for Int. Therefore this function does not belong to a general purpose library IMHO. I'll post a new proposal for a "better" Addr module later... Cheers, Sven P.S.: I've received a few personal mails regarding the spelling of marshaling, but there is no real agreement on this. It seems that both spellings are possible and both "feel right", depending on the speaker and the dictionary. :-) -- Sven PanneTel.: +49/89/2178-2235 LMU, Institut fuer Informatik FAX : +49/89/2178-2211 LFE Programmier- und Modellierungssprachen Oettingenstr. 67 mailto:[EMAIL PROTECTED]D-80538 Muenchen http://www.informatik.uni-muenchen.de/~Sven.Panne
Re: Staying alive
[My apologies: some day - I promise - I'll learn to choose the right mailing list... 8-] On Mon, Oct 04, 1999 at 13:45:43 +0200, Sven Panne wrote: [micro-marshalling] I know Manuel's code already and the one H/Direct produces. Has anybody else some FFI-related code and/or suggestions? This could be a wonderful topic for the wish list, but it would be nice to see the problems and needs for APIs different from GTK+ and OpenGL first. Maybe it's worth to know how many objects a list has (without explicit zeroElem), so I'd suggest to extend the class to: marshalList ::[a] - IO Addr marshalListN :: Int - [a] - IO Addr unmarshalList ::Addr - IO [a] unmarshalListN:: Int - Addr - IO [a] where `marshalList' stores the list length at the beginning of the buffer, like: marshalList xs = do let len = length xs lenSize = sizeOf len buf - malloc (lenSize + len * sizeOf (head xs)) writeIntOffAddr buf 0 len zipWithM_ (writeOffAddr (buf `plusAddr` lenSize)) [ 0 .. ] xs return buf and `instance Marshalable Int {...}' P.S. for the native speakers: Which spelling is correct, "marshaling" or "marshalling"? Ispell says "marshaling", but this looks a bit odd to me. dictd knows both: -- From The Free On-line Dictionary of Computing (20Jul99) [foldoc]: marshalling communications (US -ll- or -l-) The process of packing one or more items of data into a {message buffer}, prior to transmitting that message buffer over a communication channel. The packing process not only collects together values which may be stored in non-consecutive memory locations but also converts data of different types into a standard representation agreed with the recipient of the message. marshaling spelling Alternative US spelling of "{marshalling}". (1998-03-16) -- I'd vote for the "double-L" variant... :-) Cheers, Michael -- "Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats." -- Howard Aiken
Re: Staying alive
"Manuel M. T. Chakravarty" wrote: I wouldn't call autoconf a hack :-) OK, MEGA-hack. :-) But it's m4 which is the real culprit... [...] Anyway, such a class seems to be a nice way for expressing the information about marshalable data types. The class is actually more about "micro"-marshaling, e.g. there are no methods which actually do some form of memory management. In my current OpenGL-binding I actually use the following larger class definition: - class Eq a = Marshalable a where sizeOf:: a - Int zeroElem :: a indexOffAddr :: Addr - Int -a readOffAddr :: Addr - Int - IO a writeOffAddr :: Addr - Int - a - IO () marshal :: a - IO Addr marshalList :: [a] - IO Addr marshalListZero :: [a] - IO Addr unmarshal ::Addr - IO a unmarshalList :: Int - Addr - IO [a] unmarshalListZero ::Addr - IO [a] marshal x = marshalList [x] marshalList xs = do buf - malloc (length xs * sizeOf (head xs)) zipWithM_ (writeOffAddr buf) [ 0 .. ] xs return buf marshalListZero xs = do let numElements = length xs buf - malloc ((numElements+1) * sizeOf (head xs)) zipWithM_ (writeOffAddr buf) [ 0 .. ] xs writeOffAddr buf numElements (zeroElem `asTypeOf` head xs) return buf unmarshal buf = liftM head $ unmarshalList 1 buf unmarshalList numElements buf = mapM (readOffAddr buf) [ 0 .. numElements-1 ] unmarshalListZero buf = loop 0 [] where loop idx accu = do x - readOffAddr buf idx if x == zeroElem then return $ reverse accu else loop (idx+1) (x:accu) - This captures usual marshaling/unmarshaling patterns for * a single value, * a list of values, * and a list of values, terminated by a special value (zeroElem). Two typical instances are: - instance Marshalable Char where sizeOf = const @ac_cv_sizeof_char@ zeroElem = '\0' indexOffAddr = indexCharOffAddr readOffAddr = readCharOffAddr writeOffAddr = writeCharOffAddr instance Marshalable Addr where sizeOf = const @ac_cv_sizeof_void_p@ zeroElem = nullAddr indexOffAddr = indexAddrOffAddr readOffAddr = readAddrOffAddr writeOffAddr = writeAddrOffAddr - The @ac_cv_sizeof_...@ are substituted by the configure script. Given the above class definition one can write convenient higher order functions, e.g. for in/inout parameter passing (which actually involve some kind of alloca): - inParamWith :: (a - IO Addr) - (Addr - IO b) - a - IO b inParamWith marsh act x = do buf - marsh x ret - act buf free buf return ret inParam :: Marshalable a = (Addr - IO b) - a - IO b inParam = inParamWith marshal inOutParamWith :: (a - IO Addr) - (Addr - IO a) - (Addr - IO ()) - a - IO a inOutParamWith marsh unmarsh act x = do buf - marsh x act buf ret - unmarsh buf free buf return ret inOutParam :: Marshalable a = (Addr - IO ()) - a - IO a inOutParam = inOutParamWith marshal unmarshal - I know Manuel's code already and the one H/Direct produces. Has anybody else some FFI-related code and/or suggestions? This could be a wonderful topic for the wish list, but it would be nice to see the problems and needs for APIs different from GTK+ and OpenGL first. Cheers, Sven P.S. for the native speakers: Which spelling is correct, "marshaling" or "marshalling"? Ispell says "marshaling", but this looks a bit odd to me. -- Sven PanneTel.: +49/89/2178-2235 LMU, Institut fuer Informatik FAX : +49/89/2178-2211 LFE Programmier- und Modellierungssprachen Oettingenstr. 67 mailto:[EMAIL PROTECTED]D-80538 Muenchen http://www.informatik.uni-muenchen.de/~Sven.Panne
Re: Staying alive
Sven Panne [EMAIL PROTECTED] wrote, Simon Marlow wrote: Manuel M. T. Chakravarty wrote: [...] we should really enforce a stack discipline and use a wrapper function: alloca :: Int - (Addr - IO a) - IO a Nice idea. I'm not sure about the implementation though: did you have in mind using C's malloc/free? Perhaps I'm totally missing the point here, but alloca seems to be dead easy to me: [delete version using malloc/free] It's getting tough if you want to avoid malloc/free though. Which is exactly the reason, I didn't mention malloc and free. First of all, my thinking is more along the lines of how can we make this already promising FFI design into something that can eventually get into the Haskell standard. So, all GHC-specific hacks don't help us anything. Second, we need a design that is sufficiently straight-forward to implement and still allows an efficient implementation of the FFI across different systems. So, the question is, given the general problem of having to deal with (in)out parameters in C, how can we handle them nicely in the FFI? The byte array stuff doesn't cut it for reasons already discussed. The idea behind the `alloca' proposal is that it (1) allows the compiler (or interpreter) to allocate a chunk of stable memory in whatever way is most efficient in that implementation and (2) it makes it clear when the memory area can be freed again. Any better ideas? But IMHO the main problem is the handling of the correct offsets for e.g. inout parameters in the buffer. Alloca doesn't help here, and if you do it by hand, you end up with Haskell code which look suspiciously like the one H/Direct generates... Hmmm, that's truly a problem if coding on the FFI manually. So, my thinking was too tool-centric. So, another problem is that we want to allow tools to use their knowlege of the size of primitives datatypes etc to optimise the marshaling process, but still, we want to allow a sufficiently convenient manual use of the FFI. So, how about providing functions that advance the `mem' pointer for you or even have `writeXXXOffAddr' return the next address after the datum just written? I.e, something like foo = alloca N $ \mem1 - do mem1 - writeWord32OffAddr mem1 0 ... mem2 - writeInt16OffAddr mem2 0 ... ... bar mem1 mem2 optionally take stuff out of `memX' again foreign import ... bar :: Addr - Addr ... - IO () This is, however, maybe not so helpful because of alignment constraints that are harder to satisfy manually than in generated code. So how about foo = alloca N $ \mem1 - do writeWord16OffAddr mem 0 ... let mem2 = mem `plusAddr` 4 mem2 - writeInt32OffAddr mem2 0 ... ... bar mem1 mem2 optionally take stuff out of `memX' again foreign import ... bar :: Addr - Addr ... - IO () A tool can pack everything as dense as the architecture- dependent alignment constraints permit and a human can play save and, e.g., use multiples of 4 on a 32-bit architecture (the exact number should of course be hidden in a config file to allow easy migration to other architectures). I think, this is not more difficult than having some form of `malloc' call for every in(out) parameter passed (which is how I do it currently in Gtk+HS). BTW, I recently noticed that `writeInt32OffAddr a k' and friends access `a + k * 4' and not `a + k', as I always thought. I think that this is rather unfortunate, as for marshaling, I usually want to give a byte offset and not multiples of the value currently accessed (as the marshaled data is usually not uniform). Furthermore, I can always get the current behviour by explicitly multiplying the index with the number of 4 byte elements, but it is not possible the other way around. Finally, it would be nice to have floating point numbers with explicit sizes - in the above `alloca' example, how would a human programmer know, how much memory to allocate for a float? I guess, they are not provided, because it would be a pain to implement the float operations for sizes that are not `natively' supported on the platform. However, the current situation, where you have to hope that `Float' in Haskell and `float' in C match is also not very nice and definitely not appropriate for standardisation - unless Haskell guarantees that `Float' and `float' as well as `Double' and `double' always use the same representation (and `long double' is completely missing so far). So, what is less pain? Guarantee the use of the standard float sizes in Haskell or `Float's of explicit size? Any other ideas? In summary, WishList requests: * `alloca' or something better :-) * writeXXXOffAddr indexed with bytes offsets * floats with explicit sizes or a any other form of guarantee about their size support for `long double' Cheers, Manuel
Staying alive
In a recent discussion with Manuel Chakravarty the following question arose. Given the following code: foo = do ba - newMutableByteArray ... ... bar ba -- ba not used here anymore foreign import ... bar :: MutableByteArray ... - IO () Let's further assume that bar makes callbacks to Haskell-land where a GC occurs. Now the subtle question: Is it *guaranteed* that ba is considered alive until bar returns or not? Either way, this should be clarified in the FFI documentation. Background: A lot of APIs (e.g. GTK+ and OpenGL :-) have functions expecting a pointer to some data. One possible way to use them from Haskell is * malloc area * fill area with data * call API function with pointer to area * explicitly free malloced area Using (Mutable)ByteArrays would simplify thing here, especially the last step would not be needed. Cheers, Sven -- Sven PanneTel.: +49/89/2178-2235 LMU, Institut fuer Informatik FAX : +49/89/2178-2211 LFE Programmier- und Modellierungssprachen Oettingenstr. 67 mailto:[EMAIL PROTECTED]D-80538 Muenchen http://www.informatik.uni-muenchen.de/~Sven.Panne