two design questions

2011-02-06 Thread spir

Hello D-istos,


I am currently implementing a kind of lexing toolkit; it is the first time I do 
that. Below are design questions on the topic. Also, I would like to know 
whether you think such a module would be useful for the community of D 
programmers, and what its advantages would be, knowing that D can directly link 
to C lexers like those generated by flex (I have some ideas on the question, 
indeed).



1. Lexeme types

Lexeme types defined by client code need to bring at least 2 pieces of 
information:
* a code representing the type
* a regex format (string)

If I decide type codes to be strings, then we get a very nice format in source 
for morphologies:

    string[2][] morphology = [
        [ "SPC" ,     `[\ \t\n]*` ],
        [ "ASSIGN" ,  `=` ],
        [ "integer" , `[\+\-]?[1-9]+*` ],
        ...
    ];

A side advantage being that writing out a morphology or a single lexeme type 
brings a meaningful name (instead of a clueless nominal number: 
http://en.wikipedia.org/wiki/Nominal_number).


But: using strings as type codes is obviously a useless overhead from the 
strict point of view of functionality; codes just need to be unique, thus a 
plain enum of uints or even ubytes used as nominals is a correct choice.
If I choose uint codes, then lexeme types must be structs (or else tuples, but 
they're worse). In this case, I can then take the opportunity to add a mode 
field. Which would give eg:

    LexemeType[] morphology = [
        LexemeType( SPC ,     `[\ \t\n]*` ,      SKIP ),
        LexemeType( ASSIGN ,  `=` ,              MARK ),
        LexemeType( integer , `[\+\-]?[1-9]+*` , DATA ),
        ...
    ];

Far more annoying to write, ain't it?

Also, a 'mode' field is nearly useless as of now:
(1) for MARKs, I cannot avoid reading the slice yet anyway (see above), thus 
why not store it since there is no (additional) copy
(2) for SKIP'ped lexemes, I have a practical alternative allowing the parser to 
skip optional and non-significant tokens (still a bit stupid to record tokens 
just to ignore them later, but...)
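
To make the enum-plus-struct variant concrete, here is a minimal sketch. The names Code, Mode, LexemeType and nameOf are hypothetical, and the integer pattern is a simplified stand-in rather than the one from the post. It also suggests that enum codes do not have to be "clueless" when written out, since std.conv.to gives back the member name.

```d
import std.conv : to;

// Hypothetical type codes and modes; any unique small integers would do.
enum Code : ubyte { SPC, ASSIGN, integer }
enum Mode : ubyte { SKIP, MARK, DATA }

// One lexeme type: a code, a regex format and a mode.
struct LexemeType {
    Code code;
    string format;
    Mode mode;
}

// The integer pattern is a simplified assumption, not the original regex.
immutable LexemeType[] morphology = [
    LexemeType(Code.SPC,     `[ \t\n]*`,    Mode.SKIP),
    LexemeType(Code.ASSIGN,  `=`,           Mode.MARK),
    LexemeType(Code.integer, `[+-]?[0-9]+`, Mode.DATA),
];

// Readable name of a code, e.g. for writing out a morphology.
string nameOf(Code code) {
    return to!string(code);
}
```

Whether the extra Code. and Mode. prefixes are worth it compared with the string table is exactly the trade-off discussed above.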



2. match actions

I do not have any match action system yet. Actually, a 'mode' field would 
implement kinds of very special predefined actions. Is more really needed? 
Typically, in my experience of parsing, useful match actions happen at a higher 
level, namely at parsing rather than lexing time:

* Structure the AST, eg discard MARK tokens or flatten lists.
* Handle data, eg convert numbers or drop '"' from strings.

Structural actions can only be handled by the parser, I guess, while operations 
on data are nicely placed in dedicated Node type constructors.
What kinds of typical actions would really be useful for client code, at lexing 
time, especially ones allowing parser simplification (other than handling SKIP 
tokens)?



External points of view warmly welcome :-)

Denis
--
_
vita es estrany
spir.wikidot.com



three little issues

2011-02-06 Thread spir

Hello,

Here are three little issues I faced while implementing a lexing toolkit (see 
other post).


1. Regex match

Let us say there are three natures or modes of lexeme:
* SKIP: not even kept, just matched and dropped (eg optional spacing)
* MARK: kept, but slice is irrelevant data (eg all kinds of punctuation)
* DATA: slice is necessary data (eg constant value or symbol)

For the first 2 cases, I still need to get the size of the matched slice, to 
advance in source by the corresponding offset. Is there a way to get this 
information without fetching the slice by calling hit()?


Also, I would like to know when Regex.hit() copies or slices.
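
A rough way to explore both questions is sketched below. It assumes the current std.regex API (matchFirst, regex, Captures.hit), which is not the Regex.hit() API this post was written against; matchLength and hitIsSlice are hypothetical helper names. The second helper checks by pointer range whether the hit shares memory with the source, which is one way to observe slicing versus copying.

```d
import std.regex : matchFirst, regex;

// Length of the match of `pattern` anchored at the start of `source`,
// or 0 if there is no match there.
size_t matchLength(string source, string pattern) {
    auto m = matchFirst(source, regex("^(?:" ~ pattern ~ ")"));
    return m.empty ? 0 : m.hit.length;
}

// True if the matched text lies inside the memory of `source`,
// i.e. the hit is a slice of the input and not a copy.
bool hitIsSlice(string source, string pattern) {
    auto m = matchFirst(source, regex(pattern));
    return !m.empty
        && m.hit.ptr >= source.ptr
        && m.hit.ptr + m.hit.length <= source.ptr + source.length;
}
```

With this API the length is read from the hit itself, so the slice is still produced; since a slice is only a pointer and a length, that should cost no copy of the text.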


2. reference escape

This is a little enigma I face somewhere in this module. Say S is a struct:

    ...
    auto s = S(data);
    return &s;

This code is obviously wrong and the compiler gently warns me about that. But 
the variant below is allowed and, what is more, seems to work fine:

    return &(S(data));

For me, both versions are synonymous. Thus, why does the compiler accept the 
latter and why does it work? Any later use of the returned struct (recorded in 
an array) should miserably fail with a segfault. (*)
Or is it that the compiler recognises the idiom and implicitly allocates the 
struct outside the local stack?

Example:

    struct S { int i; }

    S* newS (int i) {
        if (i < 0)
            return null;
        //  auto s = S(i);
        //  return &s;  // Error: escaping reference to local s
        return &(S(i));
    }

    unittest {
        int[] ints = [2, -2, 1, -1, 0];
        S[] structs;
        foreach (i ; ints) {
            auto p = newS(i);
            if (p) {
                structs ~= *p;  // explicit deref!
            }
        }
        assert ( structs == [S(2), S(1), S(0)] );   // pass!
    }

How can this work?


3. implicit deref

But there is something even more mysterious to me: if I first access the struct 
before recording it, as in:

    unittest {
        int[] ints = [2, -2, 1, -1, 0];
        S[] structs;
        foreach (i ; ints) {
            auto p = newS(i);
            if (p) {
                write (p.i,' ');    // implicit deref!
                structs ~= *p;      // explicit deref!
            }
        }
        assert ( structs == [S(2), S(1), S(0)] );   // fails!
    }

...then the final assert fails!? But the written i's are correct (2 1 0).
Worse, if I exchange the two deref lines:

    unittest {
        int[] ints = [2, -2, 1, -1, 0];
        S[] structs;
        foreach (i ; ints) {
            auto p = newS(i);
            if (p) {
                structs ~= *p;      // explicit deref!
                write (p.i,' ');    // implicit deref!
            }
        }
        assert ( structs == [S(2), S(1), S(0)] );   // pass!
    }

...then the assertion passes, but the written integers are wrong (looks like 
either garbage or an address, repeated 3 times, eg: 134518949 134518949 
134518949; successive runs constantly produce the same value).



Denis



Maximum Number of Threads?

2011-02-06 Thread d coder
Greetings

  Is there a limit on the maximum number of threads that can be
spawned? Or does it just depend on the value in
/proc/sys/kernel/threads-max on a linux system?

Regards
- Cherry


Re: three little issues

2011-02-06 Thread bearophile
spir:

> 2. reference escape
> 3. implicit deref

The situation is easy to understand once you know, in general terms, what a 
stack frame is and how C functions are called:
http://en.wikipedia.org/wiki/Stack_frame
The D call stack is a contiguous-allocated backwards-single-linked list of 
differently-sized records, each record is a stack frame, and the whole data 
structure is of course managed as stack :-)

When you have similar doubts I also suggest you take a look at the asm DMD 
generates. Writing asm requires some work, but reading a bit of asm is 
something you may learn in a few days or even one day.

Before a D function starts, a stack frame is created. It will contain your 
stack-allocated struct instance. When the function ends its stack frame is 
destroyed virtually by moving a stack pointer, so the struct may be overwritten 
by other things, like by a call to writeln that creates many stack frames. If 
the stack frame is not overwritten and you save by *value* the stack contents, 
you have successfully saved your data in the array of S, but accessing 
virtually deleted data in the stack is a bad practice to avoid.
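
A minimal sketch of two ways to stay clear of the dead stack frame, reusing the S and newS names from the question (makeS and collect are hypothetical additions): either allocate the struct on the heap with new, or return it by value and let the caller copy it.

```d
struct S { int i; }

// Returns a pointer to a heap-allocated S, or null for negative input.
S* newS(int i) {
    if (i < 0)
        return null;
    return new S(i);    // lives on the GC heap, so it outlives this call
}

// Alternative: return by value; the caller stores its own copy.
S makeS(int i) {
    return S(i);
}

// Same loop as in the question, now independent of stack contents.
S[] collect(int[] ints) {
    S[] structs;
    foreach (i; ints) {
        auto p = newS(i);
        if (p)
            structs ~= *p;
    }
    return structs;
}
```

With either variant, a call such as write(p.i) between the allocation and the copy can no longer disturb the stored value.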

Bye,
bearophile


Re: New to D: parse a binary file

2011-02-06 Thread Jesse Phillips
scottrick Wrote:

> T[] rawRead(T)(T[] buffer);
>
> I understand that T is generic type, but I am not sure of the
> meaning of the (T) after the method name.

That T is defining the symbol to represent the generic type. It can have more 
than one and D provides other things like aliases... Another way to write that 
function (I may get something wrong here but give it a shot) is:

template(T) {
T[] rawRead(T[] buffer);
}


Re: three little issues

2011-02-06 Thread spir

On 02/06/2011 02:13 PM, bearophile wrote:

> Before a D function starts, a stack frame is created. It will contain your 
> stack-allocated struct instance. When the function ends its stack frame is 
> destroyed virtually by moving a stack pointer, so the struct may be overwritten 
> by other things, like by a call to writeln that creates many stack frames. If 
> the stack frame is not overwritten and you save by *value* the stack contents, 
> you have successfully saved your data in the array of S, but accessing 
> virtually deleted data in the stack is a bad practice to avoid.


Right, I may succeed in storing by value, as you say, before the frame is 
overwritten, and so to say by chance.


But this does not explain why the compiler refuses:

    // 1
    auto s = S(data);
    return &s;

and accepts:

    // 2
    return &(S(data));

or does it?
What are the supposed differences in semantics or behaviour, if any? For 
(naive) me, these 2 pieces of code are exactly synonymous (and I would be happy 
with the compiler suppressing s in 1 or instead creating an intermediate var in 
2, whatever it judges better).


I have a third version, for the case where I need to check something in s 
before returning its address:

    // 3
    auto p = &(S(data));
    if ((*p).check())
        return null;
    return p;

(This is just a synopsis.) I need to write it that way, else it's refused. (I 
mean I cannot first have an explicit s var, check on it directly, then take 
its address as return value.)


Another use case is where S's are in an array; otherwise both the synopsis and 
the solution are analogous to the last code above. Since they are struct 
values, I use a pointer to avoid a useless local copy. What do you think of 
this idiom? Is it common? Is it good at all?

Real code:

    /** AST Node constructed from lexeme of type typeCode, if any,
        at current position in lexeme stream, else null.
        Node's constructor must expect the lexeme's slice as (only) input.
    */
    Node node (Node) (string typeCode) if (is(Node == class)) {
        // Avoid useless local copy of lexeme by using pointer
        // (instead of local struct variable).
        Lexeme* pointer = &(this.lexemes[this.cursor]);
        if ((*pointer).typeCode == typeCode) {
            ++ this.cursor;
            return new Node((*pointer).slice);
        }
        return null;
    }

My spontaneous version of this code would indeed be:

    Node node (Node) (string typeCode) if (is(Node == class)) {
        Lexeme lexeme = this.lexemes[this.cursor];
        if (lexeme.typeCode == typeCode) {
            ++ this.cursor;
            return new Node(lexeme.slice);
        }
        return null;
    }

The aim is to avoid copying pieces of the (plain text) source when lexing, 
parsing, and constructing the AST. If I'm right in analysing my app as of now, 
I have, thanks to D's slices being views, exactly 0 copies from source text to 
AST. Meaning even AST nodes which hold a piece of the source text (strings, 
symbols, maybe more) actually have a view of the very original source. In any 
other language (or is it in my previous coding style?), I would have copied at 
the very minimum once. Even in a dynamic language (whose strings are indeed 
ref'ed), to create the first slice (in the sense of substring).
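
The zero-copy claim can be checked directly with pointer arithmetic. The sketch below uses two hypothetical helpers, lexemeSlice and isViewOf; the second tests whether one string's memory lies inside another's.

```d
// Returns the lexeme text as a view into `source`; no characters are copied.
string lexemeSlice(string source, size_t start, size_t end) {
    return source[start .. end];
}

// True if `part` lies entirely inside the memory occupied by `whole`.
bool isViewOf(string part, string whole) {
    return part.ptr >= whole.ptr
        && part.ptr + part.length <= whole.ptr + whole.length;
}
```

For example, with src = "x = 42", the slice lexemeSlice(src, 4, 6) compares equal to "42" and isViewOf reports that it points into src.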

Thank you, Walter!

Denis



Re: three little issues

2011-02-06 Thread bearophile
spir:

> But this does not explain why the compiler refuses:
>    // 1
>    auto s = S(data);
>    return &s;
> and accepts:
>    // 2
>    return &(S(data));
> or does it?

Accepting the second is a bug in the escape analysis done by the front-end, I 
think. 
But see also what Walter has invented here:
http://en.wikipedia.org/wiki/Return_value_optimization


> What are the supposed differences in semantics or behaviour, if any?

Regarding what the compiler actually does, take a look at the produced asm.


> (This is just a synopsis). I need to write it that way, else it's refused.

Don't return pointers to memory present in to-be-deleted stack frames.

Bye,
bearophile


Debugging D?

2011-02-06 Thread Sean Eskapp
Are debug symbols compiled with -gc stored in a separate file? Visual Studio
refuses to debug my things, and windbg seems to be remarkably unhelpful.


Re: New to D: parse a binary file

2011-02-06 Thread scottrick
Thanks, your post was very helpful.  Two more questions (probably
related):

Where is the function 'format' defined?  Also, what is that 'unittest'
block?  It compiles fine as is, but if I refer to format outside of
unittest, it will not compile.  Also, if I compile and run your
example, it doesn't do anything, since main() is empty?

Thanks again,


Re: New to D: parse a binary file

2011-02-06 Thread bearophile
scottrick:

> Where is the function 'format' defined?

You need to add at the top of the module:

    import std.string: format;

Or:

    import std.string;


> Also, what is that 'unittest' block?  It compiles fine as is, but if I refer 
> to format outside of
> unittest, it will not compile.  Also, if I compile and run your
> example, it doesn't do anything, since main() is empty?

It's a block of unit tests :-) Currently in your program they are not even 
compiled, so format is not used. To run the unit tests you need to compile 
with the -unittest compiler switch (with DMD).
See also:
http://www.digitalmars.com/d/2.0/unittest.html
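
As a small illustration of how a unittest block sits next to the code it checks, here is a sketch with a hypothetical function describe. It assumes a current Phobos, where format is provided by std.format. The block is only compiled and run when the module is built with -unittest; without that switch it is ignored.

```d
import std.format : format;

// Hypothetical helper: formats an integer with a label.
string describe(int n) {
    return format("n = %d", n);
}

// Compiled and run only when building with -unittest.
unittest {
    assert(describe(42) == "n = 42");
}
```

Because the import is at module level here, describe can use format outside the unittest block as well.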

Bye,
bearophile


std.concurrency immutable classes...

2011-02-06 Thread Tomek Sowiński
... doesn't work.

    class C {}
    thisTid.send(new immutable(C)());
    receive((immutable C) { writeln("got it!"); });

This throws: 
core.exception.AssertError@/usr/include/d/dmd/phobos/std/variant.d(285): 
immutable(C)

And when I go for Rebindable, I get "Aliases to mutable thread-local data not 
allowed."

Is there anything I can do?

Overall, I think that's another reason D needs native tail const badly. 
Polymorphic classes are close to being second class citizens as soon as const 
enters. :(

-- 
Tomek



Re: Debugging D?

2011-02-06 Thread Trass3r
> Are debug symbols compiled with -gc stored in a separate file? Visual  
> Studio refuses to debug my things


Nope.
Plus you need to use cv2pdb to debug with Visual Studio.


Re: New to D: parse a binary file

2011-02-06 Thread Mafi

On 06.02.2011 19:38, Jesse Phillips wrote:

> scottrick Wrote:
>
>> T[] rawRead(T)(T[] buffer);
>>
>> I understand that T is generic type, but I am not sure of the
>> meaning of the (T) after the method name.
>
> That T is defining the symbol to represent the generic type. It can have more 
> than one and D provides other things like aliases... Another way to write that 
> function (I may get something wrong here but give it a shot) is:
>
> template(T) {
>  T[] rawRead(T[] buffer);
> }

I think you meant

    template rawRead(T) {
        T[] rawRead(T[] buffer);
    }

'template' defines a namespace which is normally accessed like

    templ!(parameters).member;
    templ!(parameters).memberfunc(parameters);

Because the template and its member are named identically, this member 
is accessed automatically (the eponymous trick). If it's a function you 
call it like that:

    templfunc!(compiletimeparam)(param);

The compile-time parameters can be left out if they can be derived from 
the normal parameters' types:

    templfun(param);

Voilà! You have a completely transparent templated func.
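
To see the two spellings side by side, here is a small sketch with hypothetical functions twice and thrice. The first uses the explicit template block with an eponymous member, the second the usual shorthand; both can be called with or without explicit compile-time arguments.

```d
// Explicit form: a template whose single member has the template's name.
template twice(T) {
    T twice(T x) { return x + x; }
}

// Shorthand form: the compile-time parameter list comes right after the name.
T thrice(T)(T x) { return x + x + x; }

// Calls with explicit and with deduced compile-time parameters.
int explicitCall() { return twice!(int)(2); }
int deducedCall()  { return twice(3); }
```

In deducedCall the compiler works out T = int from the argument, so the call looks like a call to an ordinary function.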

Mafi


Re: Debugging D?

2011-02-06 Thread Robert Clipsham

On 06/02/11 20:29, Sean Eskapp wrote:

> Are debug symbols compiled with -gc stored in a separate file? Visual Studio
> refuses to debug my things, and windbg seems to be remarkably unhelpful.


I suggest you take a look at VisualD if you're using visual studio, it 
will handle converting debug info so that visual studio can understand 
it, and give you some intellisense.


http://www.dsource.org/projects/visuald

--
Robert
http://octarineparrot.com/


Re: Debugging D?

2011-02-06 Thread Sean Eskapp
== Quote from Robert Clipsham (rob...@octarineparrot.com)'s article
> On 06/02/11 20:29, Sean Eskapp wrote:
> > Are debug symbols compiled with -gc stored in a separate file? Visual Studio
> > refuses to debug my things, and windbg seems to be remarkably unhelpful.
> I suggest you take a look at VisualD if you're using visual studio, it
> will handle converting debug info so that visual studio can understand
> it, and give you some intellisense.
> http://www.dsource.org/projects/visuald

I'm using VisualD already, but the project is configured using Makefiles, and I
don't want to go through the hassle of changing project configs in two 
locations.
Is there any way to still get Visual Studio debugging information if it's a
makefile project?


Re: Maximum Number of Threads?

2011-02-06 Thread Jonathan M Davis
On Sunday 06 February 2011 05:05:24 d coder wrote:
> Greetings
> 
>   Is there a limit on the maximum number of threads that can be
> spawned? Or does it just depend on the value in
> /proc/sys/kernel/threads-max on a linux system?

Barring any bugs which manage to keep threads alive too long, it's going to be 
OS dependent. core.thread (which std.concurrency.spawn uses) uses pthreads on 
Linux. However, there _are_ currently some bugs with regards to spawned threads 
not terminating, at least some of which have been fixed in the git repository 
(changes are in both druntime and phobos) but haven't been released yet. So, I 
don't know how successfully you can use spawn at the moment. Personally, I've 
had major problems with it due to bugs related to threads not terminating. 
Other people have used it successfully. Some of those bugs _are_ finally being 
fixed, and hopefully spawn will work much better in the next release. 
Regardless, the max number of threads should be system dependent.

- Jonathan M Davis


Re: std.concurrency immutable classes...

2011-02-06 Thread Jonathan M Davis
On Sunday 06 February 2011 13:55:36 Tomek Sowiński wrote:
> ... doesn't work.
> 
> class C {}
> thisTid.send(new immutable(C)());
> receive((immutable C) { writeln("got it!"); });
> 
> This throws:
> core.exception.AssertError@/usr/include/d/dmd/phobos/std/variant.d(285):
> immutable(C)
> 
> And when I go for Rebindable, I get "Aliases to mutable thread-local data
> not allowed."
> 
> Is there anything I can do?
> 
> Overall, I think that's another reason D needs native tail const badly.
> Polymorphic classes are close to being second class citizens just as soon
> const enters. :(

Open a bug report on it. There are a number of bugs relating to const and 
immutable - some of which are library-related and some of which need to be 
fixed in the compiler. Until many of those get sorted out, I wouldn't expect 
using immutable classes to work very well beyond some very basic cases.

- Jonathan M Davis


Starting with D

2011-02-06 Thread Julius
Hi there,
i'm all new to D but not new to programming in general.
I'd like to try D but i didn't find a nice tutorial yet.
I don't want to read a whole book, I just want to get the basics so I can start.
Can you help me find something like that?

Best regards, Julius


Re: Starting with D

2011-02-06 Thread Caligo
On Sun, Feb 6, 2011 at 5:35 PM, Julius <n0r3...@web.de> wrote:

> Hi there,
> i'm all new to D but not new to programming in general.
> I'd like to try D but i didn't find a nice tutorial yet.
> I don't want to read a whole book, I just want to get the basics so I can
> start.
> Can you help me find something like that?
>
> Best regards, Julius


I say get the book.  The D Programming Language is a great book.  If you are
a university student you'll probably be able to read it for free.  I finally
got my hard-copy, and it's great.


Re: std.concurrency immutable classes...

2011-02-06 Thread Michel Fortin

On 2011-02-06 16:55:36 -0500, Tomek Sowiński <j...@ask.me> said:

> ... doesn't work.
>
> class C {}
> thisTid.send(new immutable(C)());
> receive((immutable C) { writeln("got it!"); });
>
> This throws: 
> core.exception.AssertError@/usr/include/d/dmd/phobos/std/variant.d(285): 
> immutable(C)
>
> And when I go for Rebindable, I get "Aliases to mutable thread-local 
> data not allowed."
>
> Is there anything I can do?
>
> Overall, I think that's another reason D needs native tail const badly. 
> Polymorphic classes are close to being second class citizens just as 
> soon const enters. :(


I just made this pull request today:
https://github.com/D-Programming-Language/dmd/pull/

If you want to test it, you're very welcome. Here is my development 
branch for this feature:

https://github.com/michelf/dmd/tree/const-object-ref

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: std.concurrency immutable classes...

2011-02-06 Thread Michel Fortin

On 2011-02-06 20:09:56 -0500, Michel Fortin <michel.for...@michelf.com> said:

> I just made this pull request today:
> https://github.com/D-Programming-Language/dmd/pull/


That should have been:
https://github.com/D-Programming-Language/dmd/pull/3

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Why non-@property functions don't need parentheses

2011-02-06 Thread %u
Hi,

I was wondering, why are we allowed to omit parentheses when calling functions
with no arguments, when they are not @properties? Is there a good reason for
relaxing the language rules like this?

Thanks!


Re: Why non-@property functions don't need parentheses

2011-02-06 Thread Jonathan M Davis
On Sunday 06 February 2011 20:38:29 %u wrote:
> Hi,
> 
> I was wondering, why are we allowed to omit parentheses when calling
> functions with no arguments, when they are not @properties? Is there a
> good reason for relaxing the language rules like this?

Because the compiler is not in line with TDPL yet. It used to be that @property 
didn't even exist and _all_ functions which returned a value and took no 
parameters could be used as a getter property and _all_ functions which 
returned void and took a single value could be used as a setter property. 
@property was added so that it could be better controlled. However, while 
@property has been added, the compiler has yet to be changed to enforce that 
@property functions are called without parens and that non-@property functions 
are called with them. It will be fixed at some point, but it hasn't been yet.
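
The distinction being discussed can be sketched as follows. Counter and demo are hypothetical names, and the parenthesis-less call in the middle relies on the lenient compiler behaviour described in this thread, so treat it as an assumption about the compiler in use.

```d
struct Counter {
    private int count;

    // Marked @property: meant to be read like a field, without parentheses.
    @property int value() const { return count; }

    // An ordinary member function: an action, normally called with parentheses.
    void increment() { ++count; }
}

int demo() {
    Counter c;
    c.increment();   // ordinary call with parentheses
    c.increment;     // parenthesis-less call to a non-@property function
    return c.value;  // property-style access
}
```

If the rule from TDPL were enforced, only the middle line of demo would be rejected.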

- Jonathan M Davis


Re: Why non-@property functions don't need parentheses

2011-02-06 Thread Simen kjaeraas

%u <wfunct...@hotmail.com> wrote:

> Hi,
>
> I was wondering, why are we allowed to omit parentheses when calling  
> functions with no arguments, when they are not @properties? Is there a 
> good reason for relaxing the language rules like this?


This behavior is deprecated, but other features have had a higher priority
than removing features that do not cause big trouble. :p


--
Simen


Re: Using D libs in C

2011-02-06 Thread GreatEmerald
All right, found out how to make it compile. There are two ways:

1) Using DMD for the D part, DMC for the C part and combining them. This is
the batch file I use for that:

dmd -c -lib dpart.d
dmc cpart.c dpart.lib phobos.lib

2) Using DMD for the D part, DMC for the C part, DMD for combining them again:

dmd -c -lib dpart.d
dmc -c cpart.c
dmd cpart.obj dpart.lib phobos.lib

The first method gives me a FIXLIB warning but compiles OK, the second is
nicely silent, thus I prefer the second one. Plus it should work in Linux as
well. I'm going to try that shortly.
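
For reference, the D side of such a setup might look like the sketch below. The file name dpart.d follows the commands above, while the function dpartAdd is a hypothetical example; extern (C) gives it C linkage and an unmangled name so that the C part can declare and call it.

```d
// dpart.d: D code exposed to C. The function is a hypothetical example.
// The matching declaration on the C side would be:
//     int dpartAdd(int a, int b);
extern (C) int dpartAdd(int a, int b) nothrow @nogc {
    return a + b;   // no GC and no exceptions, so no runtime support needed
}
```

A function like this does not touch the D runtime. Code that allocates with the GC or relies on module constructors would, as far as I understand, need the runtime started from the C side first, for instance through druntime's rt_init and rt_term.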


Re: Using D libs in C

2011-02-06 Thread GreatEmerald
Hmm, no, it won't work right on Linux for some reason. This is the output:

/usr/lib/gcc/x86_64-linux-gnu/4.3.2/../../../libphobos2.a(deh2_4e7_525.o): In
function `_D2rt4deh213__eh_finddataFPvZPS2rt4deh213DHandlerTable':
src/rt/deh2.d:(.text._D2rt4deh213__eh_finddataFPvZPS2rt4deh213DHandlerTable+0x4):
undefined reference to `_deh_beg'
src/rt/deh2.d:(.text._D2rt4deh213__eh_finddataFPvZPS2rt4deh213DHandlerTable+0xc):
undefined reference to `_deh_beg'
src/rt/deh2.d:(.text._D2rt4deh213__eh_finddataFPvZPS2rt4deh213DHandlerTable+0x13):
undefined reference to `_deh_end'
src/rt/deh2.d:(.text._D2rt4deh213__eh_finddataFPvZPS2rt4deh213DHandlerTable+0x37):
undefined reference to `_deh_end'
collect2: ld returned 1 exit status
--- errorlevel 1

The shell script I'm using to compile it is:

#!/bin/sh
dmd -m32 -c -lib dpart.d
gcc -m32 -c cpart.c
dmd -m32 cpart.o dpart.a /usr/lib/libphobos2.a

(Although it appears that you don't need to explicitly link with libphobos2, it
does it automatically... and fails with the above error.) Any ideas about what
the error means?