Re: Labels as values and threaded-code interpretation

2013-06-03 Thread Jacob Carlborg

On 2013-06-02 19:44, Walter Bright wrote:


The curious question is why it never gets into the newer C and C++
Standards.


It's like inline assembly. I think basically every compiler supports it 
but it's not in the standard. At least there's a compiler for each 
platform supporting it.


--
/Jacob Carlborg


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Paulo Pinto

Am 02.06.2013 06:49, schrieb Walter Bright:

On 6/1/2013 7:35 PM, Alex Rønne Petersen wrote:

On 01-06-2013 09:59, bearophile wrote:

Recently the Python C interpreter was modified and speed up thanks to
this non-standard feature. CPython source code has two versions, one
with computed gotos and one without, to compile it even if your C
compiler doesn't support them or their GNU-C syntax.


I don't think there's any question as to the usefulness (and
essentialness) of
this feature. I'm very close to just writing most of the interpreter
in C over a
triviality like this.


To be pedantic, C and C++ don't have that feature. Some compilers add it
as an extension.


I always have fun in public forums making people aware that what they 
think is C or C++ code is actually compiler defined behaviour.


Not much people seem to care to read language standards. :)

--
Paulo



Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Brad Roberts

On 6/1/13 9:49 PM, Walter Bright wrote:

On 6/1/2013 7:35 PM, Alex Rønne Petersen wrote:

On 01-06-2013 09:59, bearophile wrote:

Recently the Python C interpreter was modified and speed up thanks to
this non-standard feature. CPython source code has two versions, one
with computed gotos and one without, to compile it even if your C
compiler doesn't support them or their GNU-C syntax.


I don't think there's any question as to the usefulness (and essentialness) of
this feature. I'm very close to just writing most of the interpreter in C over a
triviality like this.


To be pedantic, C and C++ don't have that feature. Some compilers add it as an 
extension.

Also, such a construct could not be made @safe. The trouble is you could pass 
those addresses
anywhere, and goto them from anywhere.


While you're technically correct, every major compiler in the unix world support it with the same 
syntax.  It's entered into defacto standard status.


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Paulo Pinto

Am 02.06.2013 09:25, schrieb Brad Roberts:

On 6/1/13 9:49 PM, Walter Bright wrote:

On 6/1/2013 7:35 PM, Alex Rønne Petersen wrote:

On 01-06-2013 09:59, bearophile wrote:

Recently the Python C interpreter was modified and speed up thanks to
this non-standard feature. CPython source code has two versions, one
with computed gotos and one without, to compile it even if your C
compiler doesn't support them or their GNU-C syntax.


I don't think there's any question as to the usefulness (and
essentialness) of
this feature. I'm very close to just writing most of the interpreter
in C over a
triviality like this.


To be pedantic, C and C++ don't have that feature. Some compilers add
it as an extension.

Also, such a construct could not be made @safe. The trouble is you
could pass those addresses
anywhere, and goto them from anywhere.


While you're technically correct, every major compiler in the unix world
support it with the same syntax.  It's entered into defacto standard
status.


If your world is only UNIX then yeah, there are still lots of non UNIX 
os out there if you look outside the desktop computers.


Of course, one could always question if they matter at all.


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Brad Roberts

On 6/2/13 12:33 AM, Paulo Pinto wrote:

Am 02.06.2013 09:25, schrieb Brad Roberts:

On 6/1/13 9:49 PM, Walter Bright wrote:

On 6/1/2013 7:35 PM, Alex Rønne Petersen wrote:

On 01-06-2013 09:59, bearophile wrote:

Recently the Python C interpreter was modified and speed up thanks to
this non-standard feature. CPython source code has two versions, one
with computed gotos and one without, to compile it even if your C
compiler doesn't support them or their GNU-C syntax.


I don't think there's any question as to the usefulness (and
essentialness) of
this feature. I'm very close to just writing most of the interpreter
in C over a
triviality like this.


To be pedantic, C and C++ don't have that feature. Some compilers add
it as an extension.

Also, such a construct could not be made @safe. The trouble is you
could pass those addresses
anywhere, and goto them from anywhere.


While you're technically correct, every major compiler in the unix world
support it with the same syntax.  It's entered into defacto standard
status.


If your world is only UNIX then yeah, there are still lots of non UNIX os out 
there if you look
outside the desktop computers.

Of course, one could always question if they matter at all.


I agree.  I said unix primarily to mean most except for msvc.

Many if not most of those non-unix platforms use gcc, which is included in the set of compilers that 
does support it.  The point is, which you didn't change, is that's it's a defacto standard even if 
not technically in the c or c++ language specs.





Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Dmitry Olshansky

01-Jun-2013 20:13, Timon Gehr пишет:

On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote:

Hi,

I'm sure this has been brought up before, but I feel I need to bring it
up again (because I'm going to be writing a threaded-code interpreter):
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

This is an incredibly important extension. The final switch statement is
not a replacement because it doesn't allow the programmer to store a
label address directly into a code stream, which is what's essential to
write a threaded-code interpreter.



I'd also like to see this.



Same here.

Though I believe a way to force tail-call can support the same use case 
also helping functional programming.


Say:

goto Call-Expression; //forced tail call

instead of:

return Call-Expression;

--
Dmitry Olshansky


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread nazriel
On Saturday, 1 June 2013 at 05:29:28 UTC, Alex Rønne Petersen 
wrote:

Hi,

I'm sure this has been brought up before, but I feel I need to 
bring it up again (because I'm going to be writing a 
threaded-code interpreter): 
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html


This is an incredibly important extension. The final switch 
statement is not a replacement because it doesn't allow the 
programmer to store a label address directly into a code 
stream, which is what's essential to write a threaded-code 
interpreter.


The Erlang folks went through hell just to use this feature; 
see the 5th Q at: 
http://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html#Frequently-Asked-Questions




It would also solve the problem with IASM and being unable to 
address the labels in iasm


I noticed the problem when I tried to address the db'ed bytes.
http://dpaste.dzfl.pl/36bbd7d3

Commented out //mov dword ptr RAX, meh; ??? illustrates the 
problem.


Jumping to labels in IASM blocks seems to work ( 
http://dpaste.dzfl.pl/d9e47f70 ).


I guess we could have two chickens backed on the same fire ;)


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Manu
On 2 June 2013 17:25, Brad Roberts bra...@puremagic.com wrote:

 On 6/1/13 9:49 PM, Walter Bright wrote:

 On 6/1/2013 7:35 PM, Alex Rønne Petersen wrote:

 On 01-06-2013 09:59, bearophile wrote:

 Recently the Python C interpreter was modified and speed up thanks to
 this non-standard feature. CPython source code has two versions, one
 with computed gotos and one without, to compile it even if your C
 compiler doesn't support them or their GNU-C syntax.


 I don't think there's any question as to the usefulness (and
 essentialness) of
 this feature. I'm very close to just writing most of the interpreter in
 C over a
 triviality like this.


 To be pedantic, C and C++ don't have that feature. Some compilers add it
 as an extension.

 Also, such a construct could not be made @safe. The trouble is you could
 pass those addresses
 anywhere, and goto them from anywhere.


 While you're technically correct, every major compiler in the unix world
 support it with the same syntax.  It's entered into defacto standard status.


MSVC doesn't support it, which is really annoying.


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Alex Rønne Petersen

On 02-06-2013 08:44, Paulo Pinto wrote:

Am 02.06.2013 06:49, schrieb Walter Bright:

On 6/1/2013 7:35 PM, Alex Rønne Petersen wrote:

On 01-06-2013 09:59, bearophile wrote:

Recently the Python C interpreter was modified and speed up thanks to
this non-standard feature. CPython source code has two versions, one
with computed gotos and one without, to compile it even if your C
compiler doesn't support them or their GNU-C syntax.


I don't think there's any question as to the usefulness (and
essentialness) of
this feature. I'm very close to just writing most of the interpreter
in C over a
triviality like this.


To be pedantic, C and C++ don't have that feature. Some compilers add it
as an extension.


I always have fun in public forums making people aware that what they
think is C or C++ code is actually compiler defined behaviour.

Not much people seem to care to read language standards. :)

--
Paulo



I am perfectly aware that it's a GNU C extension.

--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Alex Rønne Petersen

On 02-06-2013 10:52, Dmitry Olshansky wrote:

01-Jun-2013 20:13, Timon Gehr пишет:

On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote:

Hi,

I'm sure this has been brought up before, but I feel I need to bring it
up again (because I'm going to be writing a threaded-code interpreter):
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

This is an incredibly important extension. The final switch statement is
not a replacement because it doesn't allow the programmer to store a
label address directly into a code stream, which is what's essential to
write a threaded-code interpreter.



I'd also like to see this.



Same here.

Though I believe a way to force tail-call can support the same use case
also helping functional programming.

Say:

goto Call-Expression; //forced tail call

instead of:

return Call-Expression;



I'm not sure that can support threaded-code interpretation.

--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Dmitry Olshansky

02-Jun-2013 20:48, Alex Rønne Petersen пишет:

On 02-06-2013 10:52, Dmitry Olshansky wrote:

01-Jun-2013 20:13, Timon Gehr пишет:

On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote:

Hi,

I'm sure this has been brought up before, but I feel I need to bring it
up again (because I'm going to be writing a threaded-code interpreter):
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

This is an incredibly important extension. The final switch
statement is
not a replacement because it doesn't allow the programmer to store a
label address directly into a code stream, which is what's essential to
write a threaded-code interpreter.



I'd also like to see this.



Same here.

Though I believe a way to force tail-call can support the same use case
also helping functional programming.

Say:

goto Call-Expression; //forced tail call

instead of:

return Call-Expression;



I'm not sure that can support threaded-code interpretation.


Why not:

alias OpCode = function void();

OpCode[] opcodes = [ opcode_1, ... ];

int pc;
...

void opcode_1()
{
... //pick operands do whatever
pc = pc + num_operands; //skip over arguments
OpCode next = cast(OpCode)bytecode[pc];
goto next(); //this is baked-in threaded dispatch
}

void opcode_2(){ ... }

//say bytecode contains operands and addresses of fucntions
void execute(size_t[] bytecode, int pc)
{
OpCode start  = cast(OpCode)bytecode[pc];
pc++;
goto start();
}

One can get away without casting if data is in a separate array.
Then this solution is perfectly safe in this limited form. Call would 
only point to a function hence no problem with jumping who knows where 
and less problems for optimizer.


--
Dmitry Olshansky


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Walter Bright

On 6/2/2013 12:49 AM, Brad Roberts wrote:

Many if not most of those non-unix platforms use gcc, which is included in the
set of compilers that does support it.  The point is, which you didn't change,
is that's it's a defacto standard even if not technically in the c or c++
language specs.


The curious question is why it never gets into the newer C and C++ Standards.



Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Alex Rønne Petersen

On 02-06-2013 19:43, Dmitry Olshansky wrote:

02-Jun-2013 20:48, Alex Rønne Petersen пишет:

On 02-06-2013 10:52, Dmitry Olshansky wrote:

01-Jun-2013 20:13, Timon Gehr пишет:

On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote:

Hi,

I'm sure this has been brought up before, but I feel I need to
bring it
up again (because I'm going to be writing a threaded-code
interpreter):
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

This is an incredibly important extension. The final switch
statement is
not a replacement because it doesn't allow the programmer to store a
label address directly into a code stream, which is what's
essential to
write a threaded-code interpreter.



I'd also like to see this.



Same here.

Though I believe a way to force tail-call can support the same use case
also helping functional programming.

Say:

goto Call-Expression; //forced tail call

instead of:

return Call-Expression;



I'm not sure that can support threaded-code interpretation.


Why not:

alias OpCode = function void();

OpCode[] opcodes = [ opcode_1, ... ];

int pc;
...

void opcode_1()
{
 ... //pick operands do whatever
 pc = pc + num_operands; //skip over arguments
 OpCode next = cast(OpCode)bytecode[pc];
 goto next(); //this is baked-in threaded dispatch
}

void opcode_2(){ ... }

//say bytecode contains operands and addresses of fucntions
void execute(size_t[] bytecode, int pc)
{
 OpCode start  = cast(OpCode)bytecode[pc];
 pc++;
 goto start();
}

One can get away without casting if data is in a separate array.
Then this solution is perfectly safe in this limited form. Call would
only point to a function hence no problem with jumping who knows where
and less problems for optimizer.



The problem here is assuming the interpreter state can be global. Once 
you make it non-global (and thus have to pass it in the goto start(...) 
call) you get all the overhead of a regular function call.


--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Alex Rønne Petersen

On 02-06-2013 19:44, Walter Bright wrote:

On 6/2/2013 12:49 AM, Brad Roberts wrote:

Many if not most of those non-unix platforms use gcc, which is
included in the
set of compilers that does support it.  The point is, which you didn't
change,
is that's it's a defacto standard even if not technically in the c or c++
language specs.


The curious question is why it never gets into the newer C and C++
Standards.



That one's easy: It makes 'too many' assumptions about the capabilities 
of the target machine. Sort of like assuming signed integers overflow.


In practice, though, any relevant target can support computed gotos.

--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Brad Roberts

On 6/2/13 10:44 AM, Walter Bright wrote:

On 6/2/2013 12:49 AM, Brad Roberts wrote:

Many if not most of those non-unix platforms use gcc, which is included in the
set of compilers that does support it.  The point is, which you didn't change,
is that's it's a defacto standard even if not technically in the c or c++
language specs.


The curious question is why it never gets into the newer C and C++ Standards.


I can only surmise since I've never been a part of those processes, but:

1) since the support is already there for the people that want it, why go 
through the hassle?

2) going through the hassle just to get it more broadly supported is a lot of 
hassle.

3) people are generally fundamentally allergic to hassle.

Which also lines up with why people are asking for it to be added to d, because it's not already 
available, so it's worth the hassle.


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Dmitry Olshansky

02-Jun-2013 21:49, Alex Rønne Petersen пишет:

On 02-06-2013 19:43, Dmitry Olshansky wrote:

02-Jun-2013 20:48, Alex Rønne Petersen пишет:

On 02-06-2013 10:52, Dmitry Olshansky wrote:

01-Jun-2013 20:13, Timon Gehr пишет:

On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote:

Hi,

I'm sure this has been brought up before, but I feel I need to
bring it
up again (because I'm going to be writing a threaded-code
interpreter):
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

This is an incredibly important extension. The final switch
statement is
not a replacement because it doesn't allow the programmer to store a
label address directly into a code stream, which is what's
essential to
write a threaded-code interpreter.



I'd also like to see this.



Same here.

Though I believe a way to force tail-call can support the same use case
also helping functional programming.

Say:

goto Call-Expression; //forced tail call

instead of:

return Call-Expression;



I'm not sure that can support threaded-code interpretation.


Why not:

alias OpCode = function void();

OpCode[] opcodes = [ opcode_1, ... ];

int pc;
...

void opcode_1()
{
 ... //pick operands do whatever
 pc = pc + num_operands; //skip over arguments
 OpCode next = cast(OpCode)bytecode[pc];
 goto next(); //this is baked-in threaded dispatch
}

void opcode_2(){ ... }

//say bytecode contains operands and addresses of fucntions
void execute(size_t[] bytecode, int pc)
{
 OpCode start  = cast(OpCode)bytecode[pc];
 pc++;
 goto start();
}

One can get away without casting if data is in a separate array.
Then this solution is perfectly safe in this limited form. Call would
only point to a function hence no problem with jumping who knows where
and less problems for optimizer.



The problem here is assuming the interpreter state can be global. Once
you make it non-global (and thus have to pass it in the goto start(...)
call) you get all the overhead of a regular function call.



One pointer to a struct MyInterpreterState. Think of it as 'this' 
pointer :)
Anyhow cramming the whole interpreter in one huge function too has 
downsides (and hopelessly inefficent register allocation is one).
And you still access most locals via stack pointer. And then there got 
to be something beyond locals - 'context', that is still passed from one 
bytecode execution to another.


BTW Globals are actually quite expansive to access anyway, this was a 
mock example.


--
Dmitry Olshansky


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Alex Rønne Petersen

On 02-06-2013 19:58, Dmitry Olshansky wrote:

02-Jun-2013 21:49, Alex Rønne Petersen пишет:

On 02-06-2013 19:43, Dmitry Olshansky wrote:

02-Jun-2013 20:48, Alex Rønne Petersen пишет:

On 02-06-2013 10:52, Dmitry Olshansky wrote:

01-Jun-2013 20:13, Timon Gehr пишет:

On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote:

Hi,

I'm sure this has been brought up before, but I feel I need to
bring it
up again (because I'm going to be writing a threaded-code
interpreter):
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

This is an incredibly important extension. The final switch
statement is
not a replacement because it doesn't allow the programmer to store a
label address directly into a code stream, which is what's
essential to
write a threaded-code interpreter.



I'd also like to see this.



Same here.

Though I believe a way to force tail-call can support the same use
case
also helping functional programming.

Say:

goto Call-Expression; //forced tail call

instead of:

return Call-Expression;



I'm not sure that can support threaded-code interpretation.


Why not:

alias OpCode = function void();

OpCode[] opcodes = [ opcode_1, ... ];

int pc;
...

void opcode_1()
{
 ... //pick operands do whatever
 pc = pc + num_operands; //skip over arguments
 OpCode next = cast(OpCode)bytecode[pc];
 goto next(); //this is baked-in threaded dispatch
}

void opcode_2(){ ... }

//say bytecode contains operands and addresses of fucntions
void execute(size_t[] bytecode, int pc)
{
 OpCode start  = cast(OpCode)bytecode[pc];
 pc++;
 goto start();
}

One can get away without casting if data is in a separate array.
Then this solution is perfectly safe in this limited form. Call would
only point to a function hence no problem with jumping who knows where
and less problems for optimizer.



The problem here is assuming the interpreter state can be global. Once
you make it non-global (and thus have to pass it in the goto start(...)
call) you get all the overhead of a regular function call.



One pointer to a struct MyInterpreterState. Think of it as 'this'
pointer :)


That's one load and store for every single step in the interpreter. 
Sounds harmless, but every cycle matters when every goto start(...) 
amounts to a single instruction in the program you're running.



Anyhow cramming the whole interpreter in one huge function too has
downsides (and hopelessly inefficent register allocation is one).


I'd frankly be surprised if that was the case. But without any 
investigation, it's just word against word.



And you still access most locals via stack pointer. And then there got
to be something beyond locals - 'context', that is still passed from one
bytecode execution to another.


It depends on what you mean by context. Can you elaborate?



BTW Globals are actually quite expansive to access anyway, this was a
mock example.



Sure.


Either way, evidence suggests that this style of interpreter has 
performance advantages in practice.


--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Walter Bright

On 6/2/2013 10:58 AM, Dmitry Olshansky wrote:

Anyhow cramming the whole interpreter in one huge function too has downsides
(and hopelessly inefficent register allocation is one).


This should not be the case with register allocation using live ranges.

You can also help with this by organizing declarations so the variables have the 
shortest scope.




Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Dmitry Olshansky

02-Jun-2013 22:54, Walter Bright пишет:

On 6/2/2013 10:58 AM, Dmitry Olshansky wrote:

Anyhow cramming the whole interpreter in one huge function too has
downsides
(and hopelessly inefficent register allocation is one).


This should not be the case with register allocation using live ranges.



The problem is that if every goto *get_me_new_opcode compile may have no 
idea of where it will through it with computed labels. Hence some 
conservativeness of what registers contain and reloads and in effect all 
labels end up as isolated functions. Plus a big stack frame is liability.


And then there is a fair share of obstacles in making the kind of 
threaded interpreter code optimized properly, this post I find interesting:

http://article.gmane.org/gmane.comp.lang.lua.general/75426


You can also help with this by organizing declarations so the variables
have the shortest scope.


Then splitting opcodes as functions separates their timespan even more.

--
Dmitry Olshansky


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Dmitry Olshansky

02-Jun-2013 22:25, Alex Rønne Petersen пишет:

On 02-06-2013 19:58, Dmitry Olshansky wrote:


The problem here is assuming the interpreter state can be global. Once
you make it non-global (and thus have to pass it in the goto start(...)
call) you get all the overhead of a regular function call.



One pointer to a struct MyInterpreterState. Think of it as 'this'
pointer :)


That's one load and store for every single step in the interpreter.
Sounds harmless, but every cycle matters when every goto start(...)
amounts to a single instruction in the program you're running.


Anyhow cramming the whole interpreter in one huge function too has
downsides (and hopelessly inefficent register allocation is one).


I'd frankly be surprised if that was the case. But without any
investigation, it's just word against word.


See my reply to Walter. Again I personally have not done any 
measurements but what that post by LuaJIt author makes a lot of sense to me.





And you still access most locals via stack pointer. And then there got
to be something beyond locals - 'context', that is still passed from one
bytecode execution to another.


It depends on what you mean by context. Can you elaborate?



Context is what a virtual thread represents in the global world. Any 
outside hooks into it like argv in C's main are there, function tables 
(=extensions) and whatnot.



--
Dmitry Olshansky


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Dmitry Olshansky

02-Jun-2013 22:25, Alex Rønne Petersen пишет:

On 02-06-2013 19:58, Dmitry Olshansky wrote:


One pointer to a struct MyInterpreterState. Think of it as 'this'
pointer :)


That's one load and store for every single step in the interpreter.
Sounds harmless, but every cycle matters when every goto start(...)
amounts to a single instruction in the program you're running.



If the instruction set is RISC-y then you are in trouble anyway (writing 
interpreter not in ASM). One way out is fusing multiple simple 
instructions into macro-instructions and interpreting those.



Either way, evidence suggests that this style of interpreter has
performance advantages in practice.



Agreed. Though this day I'd go for what I call bastardized JIT. In 
essense just produce a chunk of call XYZ instructions, comnditionals 
would have to have a test + jmp. And then there is 'ret' at the end of 
this chunk ~4 instructions to have to deal with. What you get is branch 
prediction done on the hardware level and no instruction counter to 
worry about.


--
Dmitry Olshansky


Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Walter Bright

On 6/2/2013 1:10 PM, Dmitry Olshansky wrote:

The problem is that if every goto *get_me_new_opcode compile may have no idea of
where it will through it with computed labels. Hence some conservativeness of
what registers contain and reloads and in effect all labels end up as isolated
functions. Plus a big stack frame is liability.

And then there is a fair share of obstacles in making the kind of threaded
interpreter code optimized properly, this post I find interesting:
http://article.gmane.org/gmane.comp.lang.lua.general/75426


Thanks for the interesting article.



Re: Labels as values and threaded-code interpretation

2013-06-02 Thread Carl Sturtivant

On Saturday, 1 June 2013 at 17:44:02 UTC, Rob T wrote:
I wonder if this sort of thing would help solve another 
implementation problem, which is implementing very fast 
coroutines without the overhead seen with similar features such 
as fibers.


Here's some evidence that the answer to your question is yes.
http://www.cs.arizona.edu/icon/jcon/gde97.pdf‎



Re: Labels as values and threaded-code interpretation

2013-06-01 Thread bearophile

Alex Rønne Petersen:

I'm sure this has been brought up before, but I feel I need to 
bring it up again (because I'm going to be writing a 
threaded-code interpreter): 
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html


This is an incredibly important extension.


This was discussed more than once, and I think it's a valuable 
improvement for certain kinds of low level D programming.


Walter says that he has not added this feature to D because it's 
useful only to write interpreters and the like. I have found it 
useful in GCC to also write very fast finite state machines to 
analyse genomic strings. But even if Walter is right, writing 
interpreters (and emulators) is an important purpose for a system 
language as D. You can't write them efficient in scripting 
languages, and even Haskell/F# are not good at all to write them. 
Languages like C/D/C++ (and few others, as later probably Rust) 
are the only few natural languages to write them.


Recently the Python C interpreter was modified and speed up 
thanks to this non-standard feature. CPython source code has two 
versions, one with computed gotos and one without, to compile it 
even if your C compiler doesn't support them or their GNU-C 
syntax.



Implementing it should be trivial as both LLVM and GCC support 
taking the address of a block. I'm sure the DMD back end could 
be extended to support it too.


Even if DMD back-end will never implement it, I think it's 
important to define it formally and officially in the D language 
and add the relative syntax to the front-end (plus a standard 
version identifier that allows to write programs that have both a 
routine that uses computed gotos and one that doesn't use them).


This has the big advantage that all future compilers that will 
want to implement it will use the same nice standard syntax. If D 
doesn't define this syntax, then maybe GDC will add it, and maybe 
later LDC will add it, and later maybe other future compilers 
will add it, and we can't be sure they will share the same syntax.


Another advantage of having this syntax in D, is that even if a 
compiler back-end doesn't support computed gotos, the program 
compiles without syntax errors when you use conditional 
compilation to disable the piece of code that uses computed gotos.


Bye,
bearophile


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread bearophile

Alex Rønne Petersen:


final switch (insn.op)
{
case imm: lbl = handle_imm; break;
case add: lbl = handle_add; break;
case sub: lbl = handle_sub; break;
// ...
case ret: lbl = handle_ret; break;


Regarding the syntax, why do you use ? Isn't a single  
enough?


case imm: lbl = handle_imm; break;

If such gotos become a natural part of the D syntax then it's not 
necessary to copy the GNU C syntax.


Bye,
bearophile


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Diggory

On Saturday, 1 June 2013 at 11:50:23 UTC, bearophile wrote:
As example the very small but fast virtual machine written in 
GNU C++ from the International Contest on Functional 
Programming 2006:


http://codepad.org/iibBeWKw

It's faster than the same code without computed gotos.

Bye,
bearophile


Would be cool if there was a platform independent way in D to 
construct a sequence of call instructions in memory. Then it 
would be possible to write a JIT compiler in the exact same style 
as that example.


It would be relatively easy to add to phobos although you'd have 
to be careful about DEP. Hmm, perhaps I will try writing 
something like this...


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread bearophile
As example the very small but fast virtual machine written in GNU 
C++ from the International Contest on Functional Programming 2006:


http://codepad.org/iibBeWKw

It's faster than the same code without computed gotos.

Bye,
bearophile


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Timon Gehr

On 06/01/2013 07:29 AM, Alex Rønne Petersen wrote:

Hi,

I'm sure this has been brought up before, but I feel I need to bring it
up again (because I'm going to be writing a threaded-code interpreter):
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

This is an incredibly important extension. The final switch statement is
not a replacement because it doesn't allow the programmer to store a
label address directly into a code stream, which is what's essential to
write a threaded-code interpreter.



I'd also like to see this.


The Erlang folks went through hell just to use this feature; see the 5th
Q at:
http://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html#Frequently-Asked-Questions


The idea is to be able to write code like this:



import std.algorithm;

enum Op : ubyte
{
 imm,
 add,
 sub,
 // ...
 ret,
}

final class Insn
{
 Op op;
 size_t[] args;
 void* lbl;
 Insn next;
}

final class State
{
 Insn pc;
 size_t[64] regs;
}

size_t interp(Insn[] code)
{
 // Set up the instruction stream with label addresses
 // the first time that it is executed. Label addresses
 // are stable, so we only do this once.

 foreach (insn; code.filter!(x = !x.lbl)())
 {
 void* lbl;

 with (Op)
 {
 final switch (insn.op)
 {
 case imm: lbl = handle_imm; break;
 case add: lbl = handle_add; break;
 case sub: lbl = handle_sub; break;
 // ...
 case ret: lbl = handle_ret; break;
 }
 }

 insn.lbl = lbl;
 }
...


(This must be supported at compile time!)



Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Timon Gehr

On 06/01/2013 11:43 AM, bearophile wrote:

Alex Rønne Petersen:


final switch (insn.op)
{
case imm: lbl = handle_imm; break;
case add: lbl = handle_add; break;
case sub: lbl = handle_sub; break;
// ...
case ret: lbl = handle_ret; break;


Regarding the syntax, why do you use ? Isn't a single  enough?

 case imm: lbl = handle_imm; break;

If such gotos become a natural part of the D syntax then it's not
necessary to copy the GNU C syntax.

Bye,
bearophile


D (like C) uses a different namespace for labels and symbols that are 
not labels.


This compiles today:

void main(){
int foo;
foo: auto b = foo;
}



Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Adam D. Ruppe

On Saturday, 1 June 2013 at 16:13:18 UTC, Timon Gehr wrote:

I'd also like to see this.


Me too. Last time this came up I said no since there's some 
inline asm hacks you can do but turns out those hacks suck in 
capability and appearance compared to just having this feature.


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Rob T

On Saturday, 1 June 2013 at 16:23:11 UTC, Adam D. Ruppe wrote:

On Saturday, 1 June 2013 at 16:13:18 UTC, Timon Gehr wrote:

I'd also like to see this.


Me too. Last time this came up I said no since there's some 
inline asm hacks you can do but turns out those hacks suck in 
capability and appearance compared to just having this feature.


I wonder if this sort of thing would help solve another 
implementation problem, which is implementing very fast 
coroutines without the overhead seen with similar features such 
as fibers.


Some discussion here:
http://forum.dlang.org/thread/pcexsqbwefjueyyly...@forum.dlang.org

--rt


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Alex Rønne Petersen

On 01-06-2013 15:17, Diggory wrote:

On Saturday, 1 June 2013 at 11:50:23 UTC, bearophile wrote:

As example the very small but fast virtual machine written in GNU C++
from the International Contest on Functional Programming 2006:

http://codepad.org/iibBeWKw

It's faster than the same code without computed gotos.

Bye,
bearophile


Would be cool if there was a platform independent way in D to construct
a sequence of call instructions in memory. Then it would be possible
to write a JIT compiler in the exact same style as that example.


At that point, you just want a simple in-memory assembler. Think asmjit.



It would be relatively easy to add to phobos although you'd have to be
careful about DEP. Hmm, perhaps I will try writing something like this...


Probably a bit too domain-specific (not to say it isn't useful).

--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Alex Rønne Petersen

On 01-06-2013 07:41, Manu wrote:

GCC has a non-standard extension to do this, I've used it to get great
performance wins writing emulators. I love this feature, love to see it
in D!


Yes, it's basically essential for high-perf interpreters. It's a feature 
that a systems language like D must have.




On 1 Jun 2013 15:30, Alex Rønne Petersen a...@lycus.org
mailto:a...@lycus.org wrote:

Hi,

I'm sure this has been brought up before, but I feel I need to bring
it up again (because I'm going to be writing a threaded-code
interpreter):
http://gcc.gnu.org/onlinedocs/__gcc/Labels-as-Values.html
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

This is an incredibly important extension. The final switch
statement is not a replacement because it doesn't allow the
programmer to store a label address directly into a code stream,
which is what's essential to write a threaded-code interpreter.

The Erlang folks went through hell just to use this feature; see the
5th Q at:

http://www.erlang.org/doc/__installation_guide/INSTALL-__WIN32.html#Frequently-Asked-__Questions

http://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html#Frequently-Asked-Questions

The idea is to be able to write code like this:



import std.algorithm;

enum Op : ubyte
{
 imm,
 add,
 sub,
 // ...
 ret,
}

final class Insn
{
 Op op;
 size_t[] args;
 void* lbl;
 Insn next;
}

final class State
{
 Insn pc;
 size_t[64] regs;
}

size_t interp(Insn[] code)
{
 // Set up the instruction stream with label addresses
 // the first time that it is executed. Label addresses
 // are stable, so we only do this once.

 foreach (insn; code.filter!(x = !x.lbl)())
 {
 void* lbl;

 with (Op)
 {
 final switch (insn.op)
 {
 case imm: lbl = handle_imm; break;
 case add: lbl = handle_add; break;
 case sub: lbl = handle_sub; break;
 // ...
 case ret: lbl = handle_ret; break;
 }
 }

 insn.lbl = lbl;
 }

 // Start interpreting the entry instruction.

 auto state = new State;
 state.pc = code[0];
 goto *state.pc.lbl;

 // Interpreter logic follows...

 // The whole point is to avoid unnecessary function
 // calls, so we use a mixin template for this stuff.
 enum advance = state.pc = state.pc.next; ~
goto *state.pc.lbl;;

 handle_imm:
 {
 state.regs[state.pc.args[0]] = state.pc.args[1];
 mixin(advance);
 }

 handle_add:
 {
 state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]]
+ state.regs[state.pc.args[2]];
 mixin(advance);
 }

 handle_sub:
 {
 state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]]
- state.regs[state.pc.args[2]];
 mixin(advance);
 }

 // ...

 handle_ret:
 {
 return state.regs[state.pc.args[0]];
 }

 assert(false);
}



Notice that there are no function calls going on. Just plain jumps
all the way through. This is a technique that many real world
interpreters use to get significant speedups, and I for one think we
desperately need it.

Implementing it should be trivial as both LLVM and GCC support
taking the address of a block. I'm sure the DMD back end could be
extended to support it too.

--
Alex Rønne Petersen
a...@alexrp.com mailto:a...@alexrp.com / a...@lycus.org
mailto:a...@lycus.org
http://alexrp.com / http://lycus.org




--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Alex Rønne Petersen

On 01-06-2013 11:43, bearophile wrote:

Alex Rønne Petersen:


final switch (insn.op)
{
case imm: lbl = handle_imm; break;
case add: lbl = handle_add; break;
case sub: lbl = handle_sub; break;
// ...
case ret: lbl = handle_ret; break;


Regarding the syntax, why do you use ? Isn't a single  enough?

 case imm: lbl = handle_imm; break;

If such gotos become a natural part of the D syntax then it's not
necessary to copy the GNU C syntax.

Bye,
bearophile


I just used the GNU C syntax because I was familiar with it. I don't 
particularly care how it ends up looking in D. But Timon makes a good 
point about namespaces.


--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Alex Rønne Petersen

On 01-06-2013 09:59, bearophile wrote:

Alex Rønne Petersen:


I'm sure this has been brought up before, but I feel I need to bring
it up again (because I'm going to be writing a threaded-code
interpreter): http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

This is an incredibly important extension.


This was discussed more than once, and I think it's a valuable
improvement for certain kinds of low level D programming.

Walter says that he has not added this feature to D because it's useful
only to write interpreters and the like. I have found it useful in GCC
to also write very fast finite state machines to analyse genomic
strings. But even if Walter is right, writing interpreters (and
emulators) is an important purpose for a system language as D. You can't
write them efficient in scripting languages, and even Haskell/F# are not
good at all to write them. Languages like C/D/C++ (and few others, as
later probably Rust) are the only few natural languages to write them.

Recently the Python C interpreter was modified and speed up thanks to
this non-standard feature. CPython source code has two versions, one
with computed gotos and one without, to compile it even if your C
compiler doesn't support them or their GNU-C syntax.


I don't think there's any question as to the usefulness (and 
essentialness) of this feature. I'm very close to just writing most of 
the interpreter in C over a triviality like this.






Implementing it should be trivial as both LLVM and GCC support taking
the address of a block. I'm sure the DMD back end could be extended to
support it too.


Even if DMD back-end will never implement it, I think it's important to
define it formally and officially in the D language and add the relative
syntax to the front-end (plus a standard version identifier that allows
to write programs that have both a routine that uses computed gotos and
one that doesn't use them).

This has the big advantage that all future compilers that will want to
implement it will use the same nice standard syntax. If D doesn't define
this syntax, then maybe GDC will add it, and maybe later LDC will add
it, and later maybe other future compilers will add it, and we can't be
sure they will share the same syntax.

Another advantage of having this syntax in D, is that even if a compiler
back-end doesn't support computed gotos, the program compiles without
syntax errors when you use conditional compilation to disable the piece
of code that uses computed gotos.


Yes, good points.



Bye,
bearophile



--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread bearophile

Timon Gehr:

D (like C) uses a different namespace for labels and symbols 
that are not labels.


This compiles today:

void main(){
int foo;
foo: auto b = foo;
}


On a related topic I wrote this:
http://d.puremagic.com/issues/show_bug.cgi?id=4902

Bye,
bearophile


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Andrej Mitrovic
On 6/1/13, Timon Gehr timon.g...@gmx.ch wrote:
 D (like C) uses a different namespace for labels and symbols that are
 not labels.

Perhaps for experimenting purposes (before resorting to language
changes), a trait could be introduced. E.g.:

 void main(){
  int foo;
  foo: auto b = __traits(gotoAddr, foo);
 }

And if it's successful, we add language support.


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Walter Bright

On 6/1/2013 7:35 PM, Alex Rønne Petersen wrote:

On 01-06-2013 09:59, bearophile wrote:

Recently the Python C interpreter was modified and speed up thanks to
this non-standard feature. CPython source code has two versions, one
with computed gotos and one without, to compile it even if your C
compiler doesn't support them or their GNU-C syntax.


I don't think there's any question as to the usefulness (and essentialness) of
this feature. I'm very close to just writing most of the interpreter in C over a
triviality like this.


To be pedantic, C and C++ don't have that feature. Some compilers add it as an 
extension.


Also, such a construct could not be made @safe. The trouble is you could pass 
those addresses anywhere, and goto them from anywhere.


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Alex Rønne Petersen

On 02-06-2013 05:18, Andrej Mitrovic wrote:

On 6/1/13, Timon Gehr timon.g...@gmx.ch wrote:

D (like C) uses a different namespace for labels and symbols that are
not labels.


Perhaps for experimenting purposes (before resorting to language
changes), a trait could be introduced. E.g.:


void main(){
  int foo;
  foo: auto b = __traits(gotoAddr, foo);
}


And if it's successful, we add language support.



You need a way to jump to an arbitrary address too.

--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-06-01 Thread Alex Rønne Petersen

On 02-06-2013 06:49, Walter Bright wrote:

On 6/1/2013 7:35 PM, Alex Rønne Petersen wrote:

On 01-06-2013 09:59, bearophile wrote:

Recently the Python C interpreter was modified and speed up thanks to
this non-standard feature. CPython source code has two versions, one
with computed gotos and one without, to compile it even if your C
compiler doesn't support them or their GNU-C syntax.


I don't think there's any question as to the usefulness (and
essentialness) of
this feature. I'm very close to just writing most of the interpreter
in C over a
triviality like this.


To be pedantic, C and C++ don't have that feature. Some compilers add it
as an extension.


I know, I just don't always remember to type out GNU C instead of C.



Also, such a construct could not be made @safe. The trouble is you could
pass those addresses anywhere, and goto them from anywhere.


I don't particularly care about its safety (its insanely unsafe). It's 
all about performance with a feature like this.


--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Labels as values and threaded-code interpretation

2013-05-31 Thread Alex Rønne Petersen

Hi,

I'm sure this has been brought up before, but I feel I need to bring it 
up again (because I'm going to be writing a threaded-code interpreter): 
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html


This is an incredibly important extension. The final switch statement is 
not a replacement because it doesn't allow the programmer to store a 
label address directly into a code stream, which is what's essential to 
write a threaded-code interpreter.


The Erlang folks went through hell just to use this feature; see the 5th 
Q at: 
http://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html#Frequently-Asked-Questions


The idea is to be able to write code like this:



import std.algorithm;

enum Op : ubyte
{
imm,
add,
sub,
// ...
ret,
}

final class Insn
{
Op op;
size_t[] args;
void* lbl;
Insn next;
}

final class State
{
Insn pc;
size_t[64] regs;
}

size_t interp(Insn[] code)
{
// Set up the instruction stream with label addresses
// the first time that it is executed. Label addresses
// are stable, so we only do this once.

foreach (insn; code.filter!(x = !x.lbl)())
{
void* lbl;

with (Op)
{
final switch (insn.op)
{
case imm: lbl = handle_imm; break;
case add: lbl = handle_add; break;
case sub: lbl = handle_sub; break;
// ...
case ret: lbl = handle_ret; break;
}
}

insn.lbl = lbl;
}

// Start interpreting the entry instruction.

auto state = new State;
state.pc = code[0];
goto *state.pc.lbl;

// Interpreter logic follows...

// The whole point is to avoid unnecessary function
// calls, so we use a mixin template for this stuff.
enum advance = state.pc = state.pc.next; ~
   goto *state.pc.lbl;;

handle_imm:
{
state.regs[state.pc.args[0]] = state.pc.args[1];
mixin(advance);
}

handle_add:
{
state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] + 
state.regs[state.pc.args[2]];

mixin(advance);
}

handle_sub:
{
state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] - 
state.regs[state.pc.args[2]];

mixin(advance);
}

// ...

handle_ret:
{
return state.regs[state.pc.args[0]];
}

assert(false);
}



Notice that there are no function calls going on. Just plain jumps all 
the way through. This is a technique that many real world interpreters 
use to get significant speedups, and I for one think we desperately need it.


Implementing it should be trivial as both LLVM and GCC support taking 
the address of a block. I'm sure the DMD back end could be extended to 
support it too.


--
Alex Rønne Petersen
a...@alexrp.com / a...@lycus.org
http://alexrp.com / http://lycus.org


Re: Labels as values and threaded-code interpretation

2013-05-31 Thread Manu
GCC has a non-standard extension to do this, I've used it to get great
performance wins writing emulators. I love this feature, love to see it in
D!
On 1 Jun 2013 15:30, Alex Rønne Petersen a...@lycus.org wrote:

 Hi,

 I'm sure this has been brought up before, but I feel I need to bring it up
 again (because I'm going to be writing a threaded-code interpreter):
 http://gcc.gnu.org/onlinedocs/**gcc/Labels-as-Values.htmlhttp://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

 This is an incredibly important extension. The final switch statement is
 not a replacement because it doesn't allow the programmer to store a label
 address directly into a code stream, which is what's essential to write a
 threaded-code interpreter.

 The Erlang folks went through hell just to use this feature; see the 5th Q
 at: http://www.erlang.org/doc/**installation_guide/INSTALL-**
 WIN32.html#Frequently-Asked-**Questionshttp://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html#Frequently-Asked-Questions

 The idea is to be able to write code like this:

 

 import std.algorithm;

 enum Op : ubyte
 {
 imm,
 add,
 sub,
 // ...
 ret,
 }

 final class Insn
 {
 Op op;
 size_t[] args;
 void* lbl;
 Insn next;
 }

 final class State
 {
 Insn pc;
 size_t[64] regs;
 }

 size_t interp(Insn[] code)
 {
 // Set up the instruction stream with label addresses
 // the first time that it is executed. Label addresses
 // are stable, so we only do this once.

 foreach (insn; code.filter!(x = !x.lbl)())
 {
 void* lbl;

 with (Op)
 {
 final switch (insn.op)
 {
 case imm: lbl = handle_imm; break;
 case add: lbl = handle_add; break;
 case sub: lbl = handle_sub; break;
 // ...
 case ret: lbl = handle_ret; break;
 }
 }

 insn.lbl = lbl;
 }

 // Start interpreting the entry instruction.

 auto state = new State;
 state.pc = code[0];
 goto *state.pc.lbl;

 // Interpreter logic follows...

 // The whole point is to avoid unnecessary function
 // calls, so we use a mixin template for this stuff.
 enum advance = state.pc = state.pc.next; ~
goto *state.pc.lbl;;

 handle_imm:
 {
 state.regs[state.pc.args[0]] = state.pc.args[1];
 mixin(advance);
 }

 handle_add:
 {
 state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] +
 state.regs[state.pc.args[2]];
 mixin(advance);
 }

 handle_sub:
 {
 state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] -
 state.regs[state.pc.args[2]];
 mixin(advance);
 }

 // ...

 handle_ret:
 {
 return state.regs[state.pc.args[0]];
 }

 assert(false);
 }

 

 Notice that there are no function calls going on. Just plain jumps all the
 way through. This is a technique that many real world interpreters use to
 get significant speedups, and I for one think we desperately need it.

 Implementing it should be trivial as both LLVM and GCC support taking the
 address of a block. I'm sure the DMD back end could be extended to support
 it too.

 --
 Alex Rønne Petersen
 a...@alexrp.com / a...@lycus.org
 http://alexrp.com / http://lycus.org