On 28-Mar-2000, Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:
> Tue, 28 Mar 2000 20:37:24 +1000, Fergus Henderson <[EMAIL PROTECTED]> wrote:
> 
> > The ANSI/ISO C standard does not guarantee that `wchar_t' will be
> > Unicode, or even that it will be large enough to hold Unicode.
> > I believe that the Unicode consortium recommends against using
> > `wchar_t' for Unicode if you want portable code.
> 
> It also does not guarantee that char is ASCII, nevertheless Haskell
> assumes that it is.

Well, it is reasonable to make some assumptions which go
beyond what ANSI/ISO C guarantees.  I just want to be sure
that you understand what assumptions you are making, so
that if you do make such assumptions they are deliberate
trade-offs rather than accidental non-portabilities.

Having `char' be ASCII is very widespread, so assuming that
is probably a good trade-off; I don't see much likelihood
of people wanting to port Haskell to EBCDIC environments.

Having `wchar_t' be Unicode is not so widespread;
indeed on some Unix systems what `wchar_t' represents
depends on the current locale.
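
As a rough illustration of how a program could find out what it is
getting, here is a minimal sketch in C. It assumes a C99-or-later
implementation, where the macro `__STDC_ISO_10646__` is defined only
if `wchar_t` values are ISO 10646 (Unicode) code points. The function
names are hypothetical, chosen for this example only.

```c
#include <assert.h>
#include <wchar.h>   /* WCHAR_MAX */

/* Returns 1 if the implementation promises that wchar_t values are
   ISO 10646 (Unicode) code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if wchar_t is wide enough to hold every Unicode code
   point (up to 0x10FFFF), 0 otherwise. */
int wchar_holds_all_unicode(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Hypothetical guard: makes the "wchar_t is Unicode" assumption an
   explicit, checked one. It aborts if the assumption does not hold. */
void require_unicode_wchar(void)
{
    assert(wchar_is_iso10646() && wchar_holds_all_unicode());
}
```

Code that relies on `wchar_t' being Unicode could call
require_unicode_wchar() once at start-up, so that the assumption fails
loudly rather than silently on a platform where it is false.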

> And allows int to be 128 bits, where Haskell does not have such type.

Haskell allows Int to be 128 bits.

But more importantly, the Haskell implementor who is implementing a
Haskell<->C FFI can presumably add additional types to their Haskell
implementation.  However, they won't have the same freedom to add
additional types to the C implementation.

> The Haskell FFI will not be portable to every
> ANSI C anyway.
> 
> We should probably explicitly specify additional assumptions about
> the C implementation.

Yes, that would be a very good idea.
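
One way such assumptions could be written down so that they are checked
rather than just documented is with compile-time assertions. The sketch
below assumes a C11 compiler (for `static_assert`); the particular
assumptions listed and the helper is_ascii_upper() are illustrative
only, not a proposed list for the FFI.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, INT_MAX */

/* Assumption: a char is exactly 8 bits wide. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit char");

/* Assumption: the execution character set is ASCII-compatible. */
static_assert('A' == 65 && 'a' == 97 && '0' == 48,
              "this code assumes an ASCII execution character set");

/* Assumption: int has at least 32 bits. */
static_assert(INT_MAX >= 2147483647, "this code assumes int >= 32 bits");

/* Relies on the ASCII assumption above: the range test is only
   correct when 'A'..'Z' are contiguous, which is not true in EBCDIC. */
int is_ascii_upper(char c)
{
    return c >= 'A' && c <= 'Z';
}
```

On an implementation where any of these assumptions fails, the
translation unit does not compile, so the non-portability is reported
at build time instead of showing up as wrong behaviour later.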

> It's probably not very important to what type Char maps to. It maps
> to HsChar, which can be char or wchar_t or int or whatever, as they
> are mostly compatible in C. It's easier to explicitly specify fewer
> assumptions about exact correspondence of types, and provide conversion
> functions instead (for Haskell) and rely on implicit conversions
> between integral types (for C) - than to try to find whether Char
> should be mapped to char or wchar_t.

Leaving the exact representation of HsChar unspecified seems
quite reasonable to me.

Note that the exact representation of HsInt certainly needs to be unspecified.
So making HsChar's representation unspecified would be no great loss.

> OTOH CChar and CWChar could be any types which:
> - map to C types which have the same physical layout as char and
>   wchar_t respectively,
...
> So for example CChar could be Int8 and CWChar could be Int32,
...

> We could be more strict and make CChar and CWChar some more distinct
> types, not synonyms to other integral types, newtypes say, and
> guarantee that they map to char and wchar_t exactly.

I think that is a very good idea.

CChar and CWChar should be distinct types in Haskell, not type synonyms,
otherwise it would be easy to accidentally write non-portable code that makes
assumptions about how they are represented.

> It would not
> give much to guarantee exact correspondence on the C side. It would
> give only one thing if I understand it: foreign exported functions
> using CChar etc. will have proper C signatures in the *_stub.h file.
> Will have exactly predictable C types, and not some compatible types.
> Which makes sense if somebody wants to use C function pointers
> with them.

In the Mercury implementation, there are quite a lot of times where
we take the address of a function exported from Mercury to C.
I imagine the same might well be common for code interfacing Haskell
with C.  So I think it is worth getting that right.
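
To make the function-pointer point concrete, here is a small sketch in
C (C11, for `_Generic`). The names char_pred, hs_is_vowel and
count_matching are hypothetical; hs_is_vowel merely stands in for a
function that a Haskell implementation might export, and is written in
plain C here so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type that some C library expects. */
typedef int (*char_pred)(char);

/* Stand-in for a foreign-exported function whose generated prototype
   uses exactly `char'. */
int hs_is_vowel(char c)
{
    return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
}

/* Counts the characters of s for which the predicate p returns nonzero. */
int count_matching(const char *s, char_pred p)
{
    int n = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        if (p(s[i]))
            n++;
    return n;
}

/* char, signed char and unsigned char have the same size, yet they are
   three distinct types; _Generic makes the distinction visible. */
#define CHAR_KIND(x) \
    _Generic((x), char: 0, signed char: 1, unsigned char: 2, default: 3)
```

Passing hs_is_vowel to count_matching is fine because its type matches
char_pred exactly. Had the generated prototype been
`int hs_is_vowel(signed char)', the same call would be a pointer type
mismatch that a conforming compiler must diagnose, even though the two
types have the same physical layout.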

Another case where getting the exact types might be important is
if you are interfacing Haskell code with C++.  In C++, you might
use overloading, and if the types generated by the Haskell FFI
vary depending on platform or the Haskell implementation, this
could cause your code to break when you try to port it.
This situation is probably a lot rarer, but it's another point
to consider.

> So the question is: must CChar, CInt, CLong etc. map (in foreign
> exported C type signatures) exactly to char, int, long etc., or it's
> enough that they point to some physically compatible types (which
> work even with mismatched function signatures)?

I recommend the former.

-- 
Fergus Henderson <[EMAIL PROTECTED]>  |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"
PGP: finger [EMAIL PROTECTED]        |     -- the last words of T. S. Garp.
