Re: [Mono-list] Gcc summit...interesting stuff

2003-11-27 Thread Chris Lattner
> > Ah, ok.  I thought the unverifiable CIL was basically just machine code.
> > I didn't know it used the stack machine: cool!
>
> You have to distinguish between _unverifiable_ code, and _unmanaged_ code.
> The former uses the stack machine, the latter is just machine code.

Ah, ok, I see.

> > Ok.  There are _inherently_ difficult parts though.  For example, you
> > can't really translate '#ifdef BIG_ENDIAN' style code into a portable
> > representation, no matter what it is.
>
> That's true.  But code which uses #ifdef BIG_ENDIAN is not
> standard-conforming C code.

How is it not?  It may not be the best design, but there is a ton of code
that contains its own byte-swapping routines, which do different things on
hosts of different endianness.

> > The hardest part is probably handling all of the libc functions that
> > everyone expects: signals, stdio, etc.
>
> Right.  For most of that, you can implement it using PInvoke to invoke
> the underlying (run-time) platform's libc.  However, because there are
> a lot of macros that the C standard specifies are (compile-time) constant
> expressions, you would have to wrap a lot of the functionality.
> That is, you'd need to define your own set of C header files that define
> the constants in a platform-independent way, and then have the implementation
> of the C functions work by PInvoking your own C wrapper functions which
> convert these constants to the appropriate platform-specific values and
> then invoke the wrapped libc function.

Yup, exactly.  The problem, though, is doing it in such a way that running
the code managed actually gives you an advantage, which means that it should
interoperate fairly well with the existing runtime and stuff.  *shrug*

Also, if you want the resulting CLI code to be portable to other systems,
then you will have to provide ALL of the header files.  On Solaris, for
example, 'stdin' is a #define for __iob[0], which obviously doesn't work
too well if you run the binary on glibc.  :)
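
One way around macros like this, sketched under assumptions: the portable
headers would expose stdin through a function call, and a small wrapper
compiled natively on each platform would resolve it against whatever libc is
present at run time.  The names portable_file, portable_stdin_handle and
portable_fgetc are hypothetical, invented for this illustration.

```c
#include <stdio.h>

/* Host-side wrapper, compiled natively for each platform.  Portable code
   only ever sees an opaque handle, never __iob[0] or a glibc symbol. */
typedef void *portable_file;   /* hypothetical opaque stream handle */

/* Returns the host libc's stdin, whatever that macro expands to here. */
portable_file portable_stdin_handle(void) {
    return (portable_file)stdin;
}

/* Forwards to the host libc's fgetc on the wrapped stream. */
int portable_fgetc(portable_file stream) {
    return fgetc((FILE *)stream);
}
```

A portable stdio.h could then define stdin as a call to
portable_stdin_handle() instead of naming a libc-internal object.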

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

___
Mono-list maillist  -  [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list


Re: [Mono-list] Gcc summit...interesting stuff

2003-11-26 Thread Fergus Henderson
On 26-Nov-2003, Chris Lattner <[EMAIL PROTECTED]> wrote:
> Fergus Henderson <[EMAIL PROTECTED]> wrote:
>> Chris Lattner <[EMAIL PROTECTED]> wrote:
> > > Ok.  There are _inherently_ difficult parts though.  For example, you
> > > can't really translate '#ifdef BIG_ENDIAN' style code into a portable
> > > representation, no matter what it is.
> >
> > That's true.  But code which uses #ifdef BIG_ENDIAN is not
> > standard-conforming C code.
> 
> How is it not?

It makes an assumption which is not guaranteed by the standard:
that the endianness will be fixed at compile time.
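
For illustration, byte order can be queried at run time instead of being
baked in by the preprocessor.  This is only a minimal sketch; the name
host_is_big_endian is made up here and does not come from any library.

```c
/* Hypothetical helper: reports the byte order of the machine that is
   actually executing the code, decided at run time rather than by #ifdef. */
int host_is_big_endian(void) {
    unsigned int probe = 1;
    /* Inspect the lowest-addressed byte of the integer's representation. */
    const unsigned char *first_byte = (const unsigned char *)&probe;
    /* If that byte is 0, the low-order byte is not stored first. */
    return *first_byte == 0;
}
```

Code compiled once to a portable representation can branch on this result
when it runs, which '#ifdef BIG_ENDIAN' cannot do.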

> It may not be the best design, but there is a ton of code
> that contains its own byte-swapping routines, which do different things on
> hosts of different endianness.

I agree that there is a ton of crap^H^H^H^Hnon-portable C code out there.
But I don't think we need to bend over backwards to support such code.
It's easy to write byte swapping routines without using #ifdef BIG_ENDIAN.
For example, here's an implementation of the BSD/Posix htons() and ntohs()
functions:

#include <stdint.h>

// convert host byte order to big-endian
uint16_t htons(uint16_t host_int) {
    uint16_t big_endian_int;
    uint8_t *p = (uint8_t *)&big_endian_int;
    p[0] = (host_int & 0xff00) >> 8;  // most significant byte goes first
    p[1] = host_int & 0xff;
    return big_endian_int;
}

// convert big-endian to host byte order
uint16_t ntohs(uint16_t big_endian_int) {
    uint8_t *p = (uint8_t *)&big_endian_int;
    return (p[0] << 8) + p[1];
}

Mind you, IMHO it's better to just use byte arrays rather than integer
types once you have converted to non-host byte order.  Then you can use
uint_least8_t and uint_least16_t rather than relying on the existence of
uint8_t and uint16_t:

typedef struct { uint_least8_t bytes[2]; } network_uint16;

network_uint16 my_htons(uint_least16_t host_int) {
    network_uint16 big_endian_int;
    big_endian_int.bytes[0] = (host_int & 0xff00) >> 8;
    big_endian_int.bytes[1] = host_int & 0x00ff;
    return big_endian_int;
}

uint_least16_t my_ntohs(network_uint16 big_endian_int) {
    return (big_endian_int.bytes[0] << 8) + big_endian_int.bytes[1];
}
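
The same shift-and-mask approach extends to wider integers and to plain
byte buffers.  The following is a sketch only; put_be32 and get_be32 are
hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Store the low 32 bits of value into buf in big-endian (network) order.
   Only shifts and masks are used, so host byte order never matters. */
void put_be32(unsigned char buf[4], uint_least32_t value) {
    buf[0] = (unsigned char)((value >> 24) & 0xff);
    buf[1] = (unsigned char)((value >> 16) & 0xff);
    buf[2] = (unsigned char)((value >> 8) & 0xff);
    buf[3] = (unsigned char)(value & 0xff);
}

/* Read a big-endian 32-bit value back out of buf. */
uint_least32_t get_be32(const unsigned char buf[4]) {
    return ((uint_least32_t)buf[0] << 24)
         | ((uint_least32_t)buf[1] << 16)
         | ((uint_least32_t)buf[2] << 8)
         |  (uint_least32_t)buf[3];
}
```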

> The problem, though, is doing it in such a way that running the code
> managed actually gives you an advantage, which means that it should
> interoperate fairly well with the existing runtime and stuff.  *shrug*

Well, there's already an advantage even if you can't easily interoperate
with other CLI code: the generated binaries can run on any architecture.
You don't need to recompile for each different OS or architecture.

> Also, if you want the resulting CLI code to be portable to other systems,
> then you will have to provide ALL of the header files.

Certainly.

-- 
Fergus Henderson <[EMAIL PROTECTED]>  |  "I have always known that the pursuit
The University of Melbourne |  of excellence is a lethal habit"
WWW:   | -- the last words of T. S. Garp.


Re: [Mono-list] Gcc summit...interesting stuff

2003-11-26 Thread Fergus Henderson
On 26-Nov-2003, Chris Lattner <[EMAIL PROTECTED]> wrote:
> > > While possible, it would be _very_ difficult.  LLVM code is more
> > > expressive/low-level than CIL code: for example array bounds checks are
> > > not implicit, and there is no object model.  I'm not sure exactly how you
> > > would map general C programs onto the managed runtime at least, much less
> > > general LLVM programs.
> >
> > LLVM should map to *unverifiable* CIL without too much difficulty, I think.
> > Well, actually you'd map to a subset of that: you wouldn't use the
> > object model instructions at all.
> 
> Ah, ok.  I thought the unverifiable CIL was basically just machine code.
> I didn't know it used the stack machine: cool!

You have to distinguish between _unverifiable_ code, and _unmanaged_ code.
The former uses the stack machine, the latter is just machine code.

> > It's mostly fairly straightforward to map general C programs onto
> > unverifiable CIL.  Casting a pointer to int or vice versa is easy, just
> > push as one type and pop as another.  Pointer arithmetic is just integer
> > addition.  The C heap is unmanaged memory which can be allocated either
> > as a global array or using OS-specific code.
> 
> Ok.  There are _inherently_ difficult parts though.  For example, you
> can't really translate '#ifdef BIG_ENDIAN' style code into a portable
> representation, no matter what it is.

That's true.  But code which uses #ifdef BIG_ENDIAN is not
standard-conforming C code.

> The hardest part is probably handling all of the libc functions that
> everyone expects: signals, stdio, etc.

Right.  For most of that, you can implement it using PInvoke to invoke
the underlying (run-time) platform's libc.  However, because there are
a lot of macros that the C standard specifies are (compile-time) constant
expressions, you would have to wrap a lot of the functionality.
That is, you'd need to define your own set of C header files that define
the constants in a platform-independent way, and then have the implementation
of the C functions work by PInvoking your own C wrapper functions which
convert these constants to the appropriate platform-specific values and
then invoke the wrapped libc function.
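
As a rough sketch of such a wrapper, here is what the native side might
look like for open().  Everything named PORTABLE_* or portable_* is
hypothetical and invented for this example; it assumes a POSIX host that
provides <fcntl.h>, and it passes the permission bits through unchanged,
which is a simplification.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical platform-independent constants, as the portable headers
   would define them.  The values are arbitrary but fixed everywhere. */
enum {
    PORTABLE_O_RDONLY  = 0x00,
    PORTABLE_O_WRONLY  = 0x01,
    PORTABLE_O_RDWR    = 0x02,
    PORTABLE_O_ACCMODE = 0x03,
    PORTABLE_O_CREAT   = 0x10,
    PORTABLE_O_TRUNC   = 0x20,
    PORTABLE_O_APPEND  = 0x40
};

/* Convert portable flags to this host's O_* values; -1 if invalid. */
int portable_to_host_open_flags(int pflags) {
    int host_flags;
    switch (pflags & PORTABLE_O_ACCMODE) {
    case PORTABLE_O_RDONLY: host_flags = O_RDONLY; break;
    case PORTABLE_O_WRONLY: host_flags = O_WRONLY; break;
    case PORTABLE_O_RDWR:   host_flags = O_RDWR;   break;
    default:                return -1;
    }
    if (pflags & PORTABLE_O_CREAT)  host_flags |= O_CREAT;
    if (pflags & PORTABLE_O_TRUNC)  host_flags |= O_TRUNC;
    if (pflags & PORTABLE_O_APPEND) host_flags |= O_APPEND;
    return host_flags;
}

/* Native wrapper that the managed code would reach via PInvoke. */
int portable_open(const char *path, int pflags, int mode) {
    int host_flags = portable_to_host_open_flags(pflags);
    if (host_flags < 0) {
        errno = EINVAL;
        return -1;
    }
    return open(path, host_flags, (mode_t)mode);
}
```

The translation happens on the machine that runs the program, so the
compiled CIL never contains a platform-specific O_* value.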

-- 
Fergus Henderson <[EMAIL PROTECTED]>  |  "I have always known that the pursuit
The University of Melbourne |  of excellence is a lethal habit"
WWW:   | -- the last words of T. S. Garp.


Re: [Mono-list] Gcc summit...interesting stuff

2003-11-26 Thread Chris Lattner
> > While possible, it would be _very_ difficult.  LLVM code is more
> > expressive/low-level than CIL code: for example array bounds checks are
> > not implicit, and there is no object model.  I'm not sure exactly how you
> > would map general C programs onto the managed runtime at least, much less
> > general LLVM programs.
>
> LLVM should map to *unverifiable* CIL without too much difficulty, I think.
> Well, actually you'd map to a subset of that: you wouldn't use the
> object model instructions at all.

Ah, ok.  I thought the unverifiable CIL was basically just machine code.
I didn't know it used the stack machine: cool!

> It's mostly fairly straightforward to map general C programs onto
> unverifiable CIL.  Casting a pointer to int or vice versa is easy, just
> push as one type and pop as another.  Pointer arithmetic is just integer
> addition.  The C heap is unmanaged memory which can be allocated either
> as a global array or using OS-specific code.

Ok.  There are _inherently_ difficult parts though.  For example, you
can't really translate '#ifdef BIG_ENDIAN' style code into a portable
representation, no matter what it is.

> There are some tricky parts,
> such as volatile and setjmp/longjmp, but these are not insurmountable
> hurdles -- they can be handled, it just requires a little more cleverness.

The hardest part is probably handling all of the libc functions that
everyone expects: signals, stdio, etc.  Running a subset of C programs
probably wouldn't be that hard.  Also, it's not volatile itself that is
the problem: it's the reasons that volatile exists which you probably
wouldn't be able to support (mmap'd IO, etc).

Also, you might be interested to know that LLVM already maps
setjmp/longjmp into exception handling constructs, so I expect sjlj to not
be a big problem in a CIL mapping...
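
To show why that mapping is natural, here is the usual way setjmp/longjmp
gets used in C as a poor man's try/catch.  This is a source-level sketch
only and says nothing about how LLVM lowers it internally; all of the
function names are made up for the example.

```c
#include <setjmp.h>

static jmp_buf handler;   /* plays the role of the active "catch" block */

/* "throw": unwind straight back to wherever setjmp(handler) was called. */
static void fail_with(int code) {
    longjmp(handler, code);
}

static int risky_divide(int a, int b) {
    if (b == 0)
        fail_with(1);     /* never returns */
    return a / b;
}

/* Returns a / b, or fallback if risky_divide "threw". */
int safe_divide(int a, int b, int fallback) {
    if (setjmp(handler) == 0) {
        /* "try" body: runs on the initial, direct return from setjmp */
        return risky_divide(a, b);
    } else {
        /* "catch" body: runs when longjmp transfers control back here */
        return fallback;
    }
}
```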

> > The best way to do this would be to make a _new_ C/C++ compiler like
> > Microsoft did, which adds language restrictions for managed mode.
>
> Microsoft's C++ compiler, and lcc, and the C compiler in Portable.NET
> can all compile almost every C construct to unverifiable IL.  I don't
> know if any of them handle volatile properly, and AFAIK none of them
> handle setjmp/longjmp.  But that's just lack of development resources.

LLVM preserves volatile correctly and maps SJLJ into exceptions.  It also
supports the full set of GCC extensions, and uses the G++ "3.4" parser.
If anyone would like to try out LLVM, please download it (or you can use
the webpage: http://llvm.cs.uiuc.edu/demo ).  Of course, I would be happy
to answer any questions...

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/



Re: [Mono-list] Gcc summit...interesting stuff

2003-11-26 Thread Fergus Henderson
On 25-Nov-2003, Chris Lattner <[EMAIL PROTECTED]> wrote:
> On Tue, 25 Nov 2003 [EMAIL PROTECTED] wrote:
> > Well, I guess the most important question would be: How hard could it be
> > to make it [LLVM] target IL code?
> 
> Here's the response I sent to Miguel:
> 
> While possible, it would be _very_ difficult.  LLVM code is more
> expressive/low-level than CIL code: for example array bounds checks are
> not implicit, and there is no object model.  I'm not sure exactly how you
> would map general C programs onto the managed runtime at least, much less
> general LLVM programs.

LLVM should map to *unverifiable* CIL without too much difficulty, I think.
Well, actually you'd map to a subset of that: you wouldn't use the
object model instructions at all.

It's mostly fairly straightforward to map general C programs onto
unverifiable CIL.  Casting a pointer to int or vice versa is easy, just
push as one type and pop as another.  Pointer arithmetic is just integer
addition.  The C heap is unmanaged memory which can be allocated either
as a global array or using OS-specific code.  There are some tricky parts,
such as volatile and setjmp/longjmp, but these are not insurmountable
hurdles -- they can be handled, it just requires a little more cleverness.
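
A small C sketch of the two ideas above: a "heap" that is just one global
array, and pointer arithmetic done as integer addition.  The names
toy_malloc and advance_int_ptr are hypothetical, the allocator never frees
anything, and the integer round trip assumes a flat address space where
uintptr_t arithmetic matches pointer arithmetic.  It uses C11 alignas.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* The C heap as a single global array of unmanaged bytes. */
#define HEAP_SIZE 4096
static alignas(max_align_t) unsigned char heap[HEAP_SIZE];
static size_t heap_used = 0;

/* Hypothetical bump allocator: hands out successive aligned slices. */
void *toy_malloc(size_t size) {
    const size_t align = alignof(max_align_t);
    size_t rounded = (size + align - 1) / align * align;
    /* Reject arithmetic overflow and requests that do not fit. */
    if (rounded < size || rounded > HEAP_SIZE - heap_used)
        return NULL;
    void *p = &heap[heap_used];
    heap_used += rounded;
    return p;
}

/* Pointer arithmetic as integer addition: equivalent to p + n. */
int *advance_int_ptr(int *p, size_t n) {
    uintptr_t addr = (uintptr_t)p;   /* reinterpret the pointer as an integer */
    addr += n * sizeof(int);         /* plain integer addition */
    return (int *)addr;              /* reinterpret the integer as a pointer */
}
```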

> The best way to do this would be to make a _new_ C/C++ compiler like
> Microsoft did, which adds language restrictions for managed mode.

Microsoft's C++ compiler, and lcc, and the C compiler in Portable.NET
can all compile almost every C construct to unverifiable IL.  I don't
know if any of them handle volatile properly, and AFAIK none of them
handle setjmp/longjmp.  But that's just lack of development resources.

-- 
Fergus Henderson <[EMAIL PROTECTED]>  |  "I have always known that the pursuit
The University of Melbourne |  of excellence is a lethal habit"
WWW:   | -- the last words of T. S. Garp.


RE: [Mono-list] Gcc summit...interesting stuff

2003-11-26 Thread Chris Lattner
On Tue, 25 Nov 2003 [EMAIL PROTECTED] wrote:

>
> > If you have any questions, please feel free to contact me, though response
> > might be slow due to the holidays.
> >
> > -Chris
>
> Well, I guess the most important question would be: How hard could it be
> to make it target IL code?

Here's the response I sent to Miguel:

While possible, it would be _very_ difficult.  LLVM code is more
expressive/low-level than CIL code: for example array bounds checks are
not implicit, and there is no object model.  I'm not sure exactly how you
would map general C programs onto the managed runtime at least, much less
general LLVM programs.  The best way to do this would be to make a _new_
C/C++ compiler like Microsoft did, which adds language restrictions for
managed mode.

On the other hand, we've been talking about implementing a CIL front-end
for LLVM, which is possible because LLVM is lower-level than CIL.  This
would also have the advantage that all of the IPO and other features of
LLVM could be directly applied to CIL code.

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/



RE: [Mono-list] Gcc summit...interesting stuff

2003-11-25 Thread pbaena

> If you have any questions, please feel free to contact me, though response
> might be slow due to the holidays.
> 
> -Chris

Well, I guess the most important question would be: How hard could it be
to make it target IL code?

Thanks,
Pablo


RE: [Mono-list] Gcc summit...interesting stuff

2003-11-25 Thread Chris Lattner

Pablo Baena wrote:
> Have you seen this paper
> http://www.linux.org.uk/~ajh/gcc/gccsummit-2003-proceedings.pdf where
> someone implemented a low level virtual machine http://llvm.cs.uiuc.edu/
> for gcc?? Doesn't it give you ideas? *wink* *wink* Interesting. I don't
> know if it could be extended to support g++.

Absolutely.  In fact, we already support g++.  Work is underway on objc,
Caml, Java, a Forth-like frontend, etc.  The open-source 1.0 release of
LLVM is available btw:
http://mail.cs.uiuc.edu/pipermail/llvm-announce/2003-October/02.html

We are also tentatively planning for a 1.1 release in the next couple of
weeks.  Here's the most recent "status update" from the project:
http://mail.cs.uiuc.edu/pipermail/llvm-announce/2003-November/03.html

If you have any questions, please feel free to contact me, though response
might be slow due to the holidays.

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/



[Mono-list] Gcc summit...interesting stuff

2003-06-19 Thread Pablo Baena
Have you seen this paper
http://www.linux.org.uk/~ajh/gcc/gccsummit-2003-proceedings.pdf where
someone implemented a low level virtual machine 
http://llvm.cs.uiuc.edu/ for gcc?? Doesn't it give you ideas? *wink*
*wink*

Interesting. I don't know if it could be extended to support g++.

-- 
Now it is human nature that however a human being is,
he is inclined to think that is the right way to be.
Hans Reiser
