two design questions
Hello D-istos,

I am currently implementing a kind of lexing toolkit; it is the first time I do that. Below are design questions on the topic. I would also like to know whether you think such a module would be useful for the community of D programmers, and what its advantages would be, given that D can link directly to C lexers like flex (I have some ideas on the question, indeed).

1. Lexeme types

Lexeme types defined by client code need to carry at least 2 pieces of information:
* a code representing the type
* a regex format (string)

If I decide type codes to be strings, then we get a very nice format in source for morphologies:

    string[2][] morphology = [
        [ "SPC",     `[\ \t\n]*` ],
        [ "ASSIGN",  `=` ],
        [ "integer", `[\+\-]?[0-9]+` ],
        ...
    ];

A side advantage is that writing out a morphology or a single lexeme type yields a meaningful name (instead of a clueless nominal number: http://en.wikipedia.org/wiki/Nominal_number).

But: using strings as type codes is obviously a useless overhead from the strict point of view of functionality; codes just need to be unique, thus a plain enum of uints or even ubytes used as nominals is a correct choice.

If I choose uint codes, then lexeme types must be structs (or else tuples, but they're worse). In this case, I can take the opportunity to add a mode field, which would give e.g.:

    LexemeType[] morphology = [
        LexemeType( SPC,     `[\ \t\n]*`,     SKIP ),
        LexemeType( ASSIGN,  `=`,             MARK ),
        LexemeType( integer, `[\+\-]?[0-9]+`, DATA ),
        ...
    ];

Far more annoying to write, ain't it? Also, a 'mode' field is nearly useless as of now:
(1) for MARKs, I cannot yet avoid reading the slice anyway (see my other post), so why not store it, since there is no (additional) copy;
(2) for SKIP'ped lexemes, I have a practical alternative allowing the parser to skip optional and non-significant tokens (still a bit stupid to record tokens just to ignore them later, but...).

2. Match actions

I do not have any match action system yet. Actually, a 'mode' field would implement a kind of very special predefined action. Is more really needed? Typically, in my experience of parsing, useful match actions happen at a higher level, namely at parsing rather than lexing time:
* Structure the AST, e.g. discard MARK tokens or flatten lists.
* Handle data, e.g. convert numbers or drop the quotes from strings.

Structural actions can only be handled by the parser, I guess, while operations on data are nicely placed in dedicated Node type constructors. What kinds of typical actions would really be useful for client code at lexing time, especially ones allowing parser simplification (other than handling SKIP tokens)?

External points of view warmly welcome :-)

Denis
--
vita es estrany
spir.wikidot.com
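A minimal sketch of the struct-based variant from point 1, assuming enum type codes. The names `Code`, `Mode`, `LexemeType` and `describe` are illustrative and not taken from the actual toolkit. It also suggests that an enum need not lose the meaningful names, since `std.conv.to!string` prints the member name of an enum value:

```d
import std.conv : to;

/// How the lexer treats a matched lexeme (mode names taken from the post).
enum Mode : ubyte { SKIP, MARK, DATA }

/// Hypothetical numeric type codes; any set of unique values would do.
enum Code : ubyte { SPC, ASSIGN, INTEGER }

struct LexemeType
{
    Code   code;    // nominal type code
    string format;  // regex source for this lexeme type
    Mode   mode;
}

immutable LexemeType[] morphology = [
    LexemeType(Code.SPC,     `[ \t\n]*`,      Mode.SKIP),
    LexemeType(Code.ASSIGN,  `=`,             Mode.MARK),
    LexemeType(Code.INTEGER, `[\+\-]?[0-9]+`, Mode.DATA),
];

/// Writes out a lexeme type using the enum member names, not the numbers.
string describe(LexemeType type)
{
    return type.code.to!string ~ " " ~ type.mode.to!string ~ " " ~ type.format;
}

unittest
{
    assert(describe(morphology[1]) == "ASSIGN MARK =");
}
```

The table is more verbose than the string-pair version, but the codes stay one byte wide and typos in a code name become compile-time errors.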
three little issues
Hello,

Here are three little issues I faced while implementing a lexing toolkit (see other post).

1. Regex match

Let us say there are three natures or modes of lexeme:
* SKIP: not even kept, just matched and dropped (e.g. optional spacing)
* MARK: kept, but the slice is irrelevant data (e.g. all kinds of punctuation)
* DATA: the slice is necessary data (e.g. constant value or symbol)

For the first 2 cases, I still need to get the size of the matched slice, to advance in the source by the corresponding offset. Is there a way to get this information without fetching the slice by calling hit()? Also, I would like to know when Regex.hit() copies and when it slices.

2. Reference escape

This is a little enigma I face somewhere in this module. Say S is a struct:

    ...
    auto s = S(data);
    return &s;

This code is obviously wrong and the compiler gently warns me about that. But the variant below is allowed and, what is more, seems to work fine:

    return &(S(data));

For me, both versions are synonymous. Thus, why does the compiler accept the latter, and why does it work? Any later use of the returned struct (recorded in an array) should miserably fail with a segfault. Or is it that the compiler recognises the idiom and implicitly allocates the struct outside the local stack? Example:

    struct S { int i; }

    S* newS (int i) {
        if (i < 0)
            return null;
        // auto s = S(i);
        // return &s;      // Error: escaping reference to local s
        return &(S(i));
    }

    unittest {
        int[] ints = [2, -2, 1, -1, 0];
        S[] structs;
        foreach (i ; ints) {
            auto p = newS(i);
            if (p) {
                structs ~= *p;          // explicit deref!
            }
        }
        assert ( structs == [S(2), S(1), S(0)] );   // pass!
    }

How can this work?

3. Implicit deref

But there is something even more mysterious to me: if I first access the struct before recording it, like in:

    unittest {
        int[] ints = [2, -2, 1, -1, 0];
        S[] structs;
        foreach (i ; ints) {
            auto p = newS(i);
            if (p) {
                write (p.i,' ');        // implicit deref!
                structs ~= *p;          // explicit deref!
            }
        }
        assert ( structs == [S(2), S(1), S(0)] );
    }

...then the final assert fails!? But the written i's are correct (2 1 0). Worse, if I exchange the two deref lines:

    unittest {
        int[] ints = [2, -2, 1, -1, 0];
        S[] structs;
        foreach (i ; ints) {
            auto p = newS(i);
            if (p) {
                structs ~= *p;          // explicit deref!
                write (p.i,' ');        // implicit deref!
            }
        }
        assert ( structs == [S(2), S(1), S(0)] );   // pass!
    }

...then the assertion passes, but the written integers are wrong (it looks like either garbage or an address, repeated 3 times, e.g. 134518949 134518949 134518949; successive runs constantly produce the same value).

Denis
--
vita es estrany
spir.wikidot.com
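On question 1, here is a hedged sketch that assumes the present-day `std.regex` API (`regex`, `matchFirst`, and the `hit` and `post` properties of the returned captures), which differs from the `Regex.hit()` interface the post refers to. In this API `hit` is a slice of the input, so taking its `length` copies no characters. The helper name `matchedLength` is made up for the example:

```d
import std.regex : matchFirst, regex;

/// Number of characters matched by `pattern` at the very start of
/// `source`, or 0 if there is no match. Hypothetical helper.
size_t matchedLength(string source, string pattern)
{
    // "^(?:...)" anchors the client pattern at the current position.
    auto m = matchFirst(source, regex("^(?:" ~ pattern ~ ")"));
    return m.empty ? 0 : m.hit.length;
}

unittest
{
    string source = "  \t= 42";              // two spaces, one tab, then "= 42"
    assert(matchedLength(source, `[ \t]*`) == 3);

    auto m = matchFirst(source, regex(`^[ \t]*`));
    assert(m.hit.ptr is source.ptr);         // a view of the input, not a copy
    assert(m.post == "= 42");                // rest of the source after the match
}
```

For SKIP and MARK lexemes the lexer can then advance by `matchedLength(...)` and never store the slice.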
Maximum Number of Threads?
Greetings Is there a limit on the maximum number of threads that can be spawned? Or does it just depend on the value in /proc/sys/kernel/threads-max on a linux system? Regards - Cherry
Re: three little issues
spir:

> 2. reference escape
> 3. implicit deref

The situation is easy to understand once you know, in general terms, what a stack frame is and how C functions are called:
http://en.wikipedia.org/wiki/Stack_frame

The D call stack is a contiguously allocated, backwards singly linked list of differently sized records; each record is a stack frame, and the whole data structure is of course managed as a stack :-)

When you have similar doubts I also suggest you take a look at the asm DMD generates. Writing asm requires some work, but reading a bit of asm is something you may learn in a few days or even one day.

Before a D function starts, a stack frame is created. It will contain your stack-allocated struct instance. When the function ends, its stack frame is destroyed virtually, by moving a stack pointer, so the struct may be overwritten by other things, like a call to writeln that creates many stack frames. If the stack frame is not overwritten and you save the stack contents by *value*, you have successfully saved your data in the array of S, but accessing virtually deleted data in the stack is a bad practice to avoid.

Bye,
bearophile
Re: New to D: parse a binary file
scottrick Wrote:

> T[] rawRead(T)(T[] buffer);
>
> I understand that T is generic type, but I am not sure of the meaning of the (T) after the method name.

That T defines the symbol that represents the generic type. There can be more than one, and D provides other kinds of template parameters, like aliases...

Another way to write that function (I may get something wrong here, but give it a shot) is:

    template(T) { T[] rawRead(T[] buffer); }
Re: three little issues
On 02/06/2011 02:13 PM, bearophile wrote:

> Before a D function starts, a stack frame is created. It will contain your stack-allocated struct instance. When the function ends its stack frame is destroyed virtually by moving a stack pointer, so the struct may be overwritten by other things, like by a call to writeln that creates many stack frames. If the stack frame is not overwritten and you save by *value* the stack contents, you have successfully saved your data in the array of S, but accessing virtually deleted data in the stack is a bad practice to avoid.

Right, I may succeed in storing by value as you say, before the frame is overwritten, and so to say by chance. But this does not explain why the compiler refuses:

    // 1
    auto s = S(data);
    return &s;

and accepts:

    // 2
    return &(S(data));

or does it? What are the supposed differences in semantics or behaviour, if any? For (naive) me, these 2 pieces of code are exactly synonymous (and I would be happy with the compiler suppressing s in 1, or instead creating an intermediate variable in 2, whatever it judges better).

I have a third version, for the case where I need to check something in s before returning its address:

    // 3
    auto p = &(S(data));
    if ((*p).check())
        return null;
    return p;

(This is just a synopsis.) I need to write it that way, else it is refused. (I mean I cannot first have an explicit s variable, check on it directly, then take its address as the return value.)

Another use case is where the S's are in an array; both the synopsis and the solution are analogous to the last code above. Since they are struct values, I use a pointer to avoid a useless local copy. What do you think of this idiom? Is it common? Is it good at all? Real code:

    /** AST Node constructed from lexeme of type typeCode, if any,
        at current position in lexeme stream, else null.
        Node's constructor must expect the lexeme's slice as (only) input. */
    Node node (Node) (string typeCode) if (is(Node == class)) {
        // Avoid useless local copy of lexeme by using pointer
        // (instead of local struct variable).
        Lexeme* pointer = &(this.lexemes[this.cursor]);
        if ((*pointer).typeCode == typeCode) {
            ++ this.cursor;
            return new Node((*pointer).slice);
        }
        return null;
    }

My spontaneous version of this code would indeed be:

    Node node (Node) (string typeCode) if (is(Node == class)) {
        Lexeme lexeme = this.lexemes[this.cursor];
        if (lexeme.typeCode == typeCode) {
            ++ this.cursor;
            return new Node(lexeme.slice);
        }
        return null;
    }

The aim is to avoid copying pieces of the (plain text) source when lexing, parsing, and constructing the AST. If I'm right in analysing my app as of now, I have, thanks to D's slices being views, exactly 0 copies from source text to AST. This means that even AST nodes which hold a piece of the source text (strings, symbols, maybe more) actually have a view of the very original source. In any other language (or is it in my previous coding style?), I would have copied at the very minimum once, to create the first slice (in the sense of substring), even in a dynamic language (where strings are indeed ref'ed). Thank you, Walter!

Denis
--
vita es estrany
spir.wikidot.com
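The zero-copy claim can be checked with a small sketch. The `Lexeme` struct below is an assumption that mirrors the `typeCode` and `slice` fields used in the post, not the toolkit's real definition. It suggests that a local struct copy duplicates only a couple of (pointer, length) pairs, never the text, and that a pointer to an element of a dynamic array is a different case from a pointer to a local, because the elements live on the GC heap:

```d
/// Hypothetical lexeme record with the two fields used in the post.
struct Lexeme
{
    string typeCode;
    string slice;    // view into the source text, not a copy
}

unittest
{
    string source = "x = 42";
    auto lexemes = [
        Lexeme("symbol",  source[0 .. 1]),
        Lexeme("ASSIGN",  source[2 .. 3]),
        Lexeme("integer", source[4 .. 6]),
    ];

    // The slice shares memory with the source: no characters are copied.
    assert(lexemes[2].slice == "42");
    assert(lexemes[2].slice.ptr is source.ptr + 4);

    // Copying the struct copies the slice descriptors only.
    Lexeme copy = lexemes[2];
    assert(copy.slice.ptr is lexemes[2].slice.ptr);

    // A pointer to an array element refers to GC heap memory,
    // not to the current stack frame.
    Lexeme* p = &lexemes[2];
    assert(p.slice == "42");
}
```

Under these assumptions the "spontaneous" by-value version costs one small struct copy per call, so the pointer version is an optimisation of a few machine words rather than of the source text.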
Re: three little issues
spir:

> But this does not explain why the compiler refuses:
>
>     // 1
>     auto s = S(data);
>     return &s;
>
> and accepts:
>
>     // 2
>     return &(S(data));
>
> or does it?

Accepting the second is a bug in the escape analysis done by the front-end, I think. But see also what Walter has invented here:
http://en.wikipedia.org/wiki/Return_value_optimization

> What are the supposed differences in semantics or behaviour, if any?

Regarding what the compiler actually does, take a look at the produced asm.

> (This is just a synopsis). I need to write it that way, else it's refused.

Don't return pointers to memory that lives in stack frames that are about to be deleted.

Bye,
bearophile
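One way to follow that advice while keeping the `S*` signature from the earlier example is to allocate the struct with `new`, which places it on the GC heap. This is a sketch of an alternative, not the code from the thread:

```d
struct S { int i; }

/// Heap-allocates the struct, so the returned pointer stays valid
/// after the function returns. Returns null for negative input.
S* newS(int i)
{
    if (i < 0)
        return null;
    return new S(i);    // GC heap, not the stack frame
}

unittest
{
    S[] structs;
    foreach (i; [2, -2, 1, -1, 0])
    {
        auto p = newS(i);
        if (p)
            structs ~= *p;
    }
    assert(structs == [S(2), S(1), S(0)]);
}
```

When null is not needed as a "no result" marker, returning `S` by value is simpler still and involves no pointer at all.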
Debugging D?
Are debug symbols compiled with -gc stored in a separate file? Visual Studio refuses to debug my things, and windbg seems to be remarkably unhelpful.
Re: New to D: parse a binary file
Thanks, your post was very helpful. Two more questions (probably related): Where is the function 'format' defined? Also, what is that 'unittest' block? It compiles fine as is, but if I refer to format outside of unittest, it will not compile. Also, if I compile and run your example, it doesn't do anything, since main() is empty? Thanks again,
Re: New to D: parse a binary file
scottrick:

> Where is the function 'format' defined?

You need to add at the top of the module:

    import std.string: format;

Or:

    import std.string;

> Also, what is that 'unittest' block? It compiles fine as is, but if I refer to format outside of unittest, it will not compile. Also, if I compile and run your example, it doesn't do anything, since main() is empty?

It's a block of unit tests :-) Currently in your program they are not even compiled, so format is not used. To run the unit tests you need to compile with the -unittest compiler switch (with DMD). See also:
http://www.digitalmars.com/d/2.0/unittest.html

Bye,
bearophile
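A small sketch of how the pieces fit together, assuming `format` is imported from `std.string`. The function `describe` is made up for the example:

```d
import std.string : format;

/// Hypothetical helper: formats a coordinate pair.
string describe(int x, int y)
{
    return format("(%s, %s)", x, y);
}

// Compiled and run only when the module is built with -unittest.
unittest
{
    assert(describe(1, 2) == "(1, 2)");
    assert(format("%03d", 7) == "007");
}
```

With a recent DMD this can be tried with `dmd -unittest -main -run file.d`, where `-main` supplies an empty `main()`. Without `-unittest` the block is skipped entirely, which is why the original example appeared to do nothing.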
std.concurrency immutable classes...
... doesn't work.

    class C {}
    thisTid.send(new immutable(C)());
    receive((immutable C) { writeln("got it!"); });

This throws:

    core.exception.AssertError@/usr/include/d/dmd/phobos/std/variant.d(285): immutable(C)

And when I go for Rebindable, I get "Aliases to mutable thread-local data not allowed.". Is there anything I can do?

Overall, I think that's another reason D badly needs native tail const. Polymorphic classes come close to being second-class citizens as soon as const enters. :(

--
Tomek
Re: Debugging D?
> Are debug symbols compiled with -gc stored in a separate file? Visual Studio refuses to debug my things

Nope. Plus you need to use cv2pdb to debug with Visual
Re: New to D: parse a binary file
On 06.02.2011 19:38, Jesse Phillips wrote:

> scottrick Wrote:
>
>> T[] rawRead(T)(T[] buffer);
>>
>> I understand that T is generic type, but I am not sure of the meaning of the (T) after the method name.
>
> That T is defining the symbol to represent the generic type. It can have more than one and D provides other things like aliases...
>
> Another way to write that function (I may get something wrong here but give it a shot) is:
>
>     template(T) { T[] rawRead(T[] buffer); }

I think you meant:

    template rawRead(T) { T[] rawRead(T[] buffer); }

'template' defines a namespace which is normally accessed like this:

    templ!(parameters).member;
    templ!(parameters).memberfunc(parameters);

Because the template and its member have the same name, this member is accessed automatically (the eponymous trick). If it's a function, you call it like this:

    templfunc!(compiletimeparam)(param);

The compile-time parameters can be left out if they can be derived from the types of the normal parameters:

    templfunc(param);

Voilà! You have a completely transparent templated function.

Mafi
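The equivalence between the two spellings can be tried with a short sketch. `twice` and `thrice` are invented names, used here instead of `rawRead` so the example needs no file handling:

```d
/// Long form: a template whose single member has the same name.
template twice(T)
{
    T twice(T x) { return x + x; }
}

/// Short form: shorthand for the same kind of eponymous template.
T thrice(T)(T x) { return x + x + x; }

unittest
{
    assert(twice!(int)(21) == 42);      // explicit compile-time argument
    assert(twice(1.5) == 3.0);          // T deduced as double
    assert(thrice!int(2) == 6);         // parentheses optional for one argument
    assert(thrice("ab".length) == 6);   // T deduced as size_t
}
```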
Re: Debugging D?
On 06/02/11 20:29, Sean Eskapp wrote:

> Are debug symbols compiled with -gc stored in a separate file? Visual Studio refuses to debug my things, and windbg seems to be remarkably unhelpful.

I suggest you take a look at VisualD if you're using Visual Studio. It will handle converting the debug info so that Visual Studio can understand it, and give you some IntelliSense.

http://www.dsource.org/projects/visuald

--
Robert
http://octarineparrot.com/
Re: Debugging D?
== Quote from Robert Clipsham (rob...@octarineparrot.com)'s article

> On 06/02/11 20:29, Sean Eskapp wrote:
>
>> Are debug symbols compiled with -gc stored in a separate file? Visual Studio refuses to debug my things, and windbg seems to be remarkably unhelpful.
>
> I suggest you take a look at VisualD if you're using visual studio, it will handle converting debug info so that visual studio can understand it, and give you some intellisense.
>
> http://www.dsource.org/projects/visuald

I'm using VisualD already, but the project is configured using Makefiles, and I don't want to go through the hassle of changing project configs in two locations. Is there any way to still get Visual Studio debugging information if it's a makefile project?
Re: Maximum Number of Threads?
On Sunday 06 February 2011 05:05:24 d coder wrote:

> Greetings
>
> Is there a limit on the maximum number of threads that can be spawned? Or does it just depend on the value in /proc/sys/kernel/threads-max on a linux system?

Barring any bugs which manage to keep threads alive too long, it's going to be OS dependent. core.thread (which std.concurrency.spawn uses) uses pthreads on Linux.

However, there _are_ currently some bugs with regards to spawned threads not terminating, at least some of which have been fixed in the git repository (changes are in both druntime and phobos) but haven't been released yet. So, I don't know how successfully you can use spawn at the moment. Personally, I've had major problems with it due to bugs related to threads not terminating. Other people have used it successfully. Some of those bugs _are_ finally being fixed, and hopefully spawn will work much better in the next release.

Regardless, the max number of threads should be system dependent.

- Jonathan M Davis
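For reference, the system-wide value mentioned in the question can be read from D with a few lines. This is a Linux-only sketch; `parseLimit` and `kernelThreadsMax` are illustrative names, and other limits (per-user process limits, memory available for thread stacks) may be reached before this number:

```d
import std.conv : to;
import std.file : exists, readText;
import std.string : strip;

/// Parses the textual content of the proc file, e.g. "63704\n".
size_t parseLimit(string text)
{
    return text.strip.to!size_t;
}

/// System-wide thread limit on Linux, or 0 if the proc file is missing.
size_t kernelThreadsMax()
{
    enum path = "/proc/sys/kernel/threads-max";
    if (!path.exists)
        return 0;
    return parseLimit(readText(path));
}

unittest
{
    assert(parseLimit("63704\n") == 63704);
}
```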
Re: std.concurrency immutable classes...
On Sunday 06 February 2011 13:55:36 Tomek Sowiński wrote:

> ... doesn't work.
>
>     class C {}
>     thisTid.send(new immutable(C)());
>     receive((immutable C) { writeln("got it!"); });
>
> This throws:
>
>     core.exception.AssertError@/usr/include/d/dmd/phobos/std/variant.d(285): immutable(C)
>
> And when I go for Rebindable, I get "Aliases to mutable thread-local data not allowed.". Is there anything I can do?
>
> Overall, I think that's another reason D needs native tail const badly. Polymorphic classes are close to being second class citizens just as soon const enters. :(

Open a bug report on it. There are a number of bugs relating to const and immutable, some of which are library-related and some of which need to be fixed in the compiler. Until many of those get sorted out, I wouldn't expect using immutable classes to work very well beyond some very basic cases.

- Jonathan M Davis
Starting with D
Hi there,

I'm all new to D but not new to programming in general. I'd like to try D but I haven't found a nice tutorial yet. I don't want to read a whole book, I just want to get the basics so I can start. Can you help me find something like that?

Best regards,
Julius
Re: Starting with D
On Sun, Feb 6, 2011 at 5:35 PM, Julius n0r3...@web.de wrote:

> Hi there, i'm all new to D but not new to programming in general. I'd like to try D but i didn't find a nice tutorial yet. I don't want to read a whole book, I just want to get the basics so I can start. Can you help me find something like that?
>
> Best regards, Julius

I say get the book. The D Programming Language is a great book. If you are a university student you'll probably be able to read it for free. I finally got my hard copy, and it's great.
Re: std.concurrency immutable classes...
On 2011-02-06 16:55:36 -0500, Tomek Sowiński j...@ask.me said:

> ... doesn't work.
>
>     class C {}
>     thisTid.send(new immutable(C)());
>     receive((immutable C) { writeln("got it!"); });
>
> This throws:
>
>     core.exception.AssertError@/usr/include/d/dmd/phobos/std/variant.d(285): immutable(C)
>
> And when I go for Rebindable, I get "Aliases to mutable thread-local data not allowed.". Is there anything I can do?
>
> Overall, I think that's another reason D needs native tail const badly. Polymorphic classes are close to being second class citizens just as soon const enters. :(

I just made this pull request today:
https://github.com/D-Programming-Language/dmd/pull/

If you want to test it, you're very welcome. Here is my development branch for this feature:
https://github.com/michelf/dmd/tree/const-object-ref

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
Re: std.concurrency immutable classes...
On 2011-02-06 20:09:56 -0500, Michel Fortin michel.for...@michelf.com said:

> I just made this pull request today:
> https://github.com/D-Programming-Language/dmd/pull/

That should have been:
https://github.com/D-Programming-Language/dmd/pull/3

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
Why non-@property functions don't need parentheses
Hi, I was wondering, why are we allowed to omit parentheses when calling functions with no arguments, when they are not @properties? Is there a good reason for relaxing the language rules like this? Thanks!
Re: Why non-@property functions don't need parentheses
On Sunday 06 February 2011 20:38:29 %u wrote:

> Hi, I was wondering, why are we allowed to omit parentheses when calling functions with no arguments, when they are not @properties? Is there a good reason for relaxing the language rules like this?

Because the compiler is not in line with TDPL yet. It used to be that @property didn't even exist, and _all_ functions which returned a value and took no parameters could be used as a getter property, and _all_ functions which returned void and took a single value could be used as a setter property. @property was added so that this could be better controlled.

However, while @property has been added, the compiler has yet to be changed to enforce that @property functions are called without parens and that non-@property functions are called with them. It will be fixed at some point, but it hasn't been yet.

- Jonathan M Davis
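A short sketch of the getter and setter forms described above. The `Temperature` struct is invented for illustration, and how strictly the parentheses rules are enforced depends on the compiler version:

```d
struct Temperature
{
    private double kelvin = 0;

    /// Getter: read as `t.celsius`, without parentheses.
    @property double celsius() const { return kelvin - 273.15; }

    /// Setter: written as `t.celsius = value`.
    @property void celsius(double value) { kelvin = value + 273.15; }

    /// Ordinary method, conventionally called with parentheses.
    void reset() { kelvin = 0; }
}

unittest
{
    Temperature t;
    t.celsius = 0;                  // calls the setter
    assert(t.kelvin == 273.15);
    assert(t.celsius == 0);         // calls the getter
    t.reset();                      // plain function call
    assert(t.kelvin == 0);
}
```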
Re: Why non-@property functions don't need parentheses
%u wfunct...@hotmail.com wrote:

> Hi, I was wondering, why are we allowed to omit parentheses when calling functions with no arguments, when they are not @properties? Is there a good reason for relaxing the language rules like this?

This behavior is deprecated, but other features have had a higher priority than removing features that do not cause big trouble. :p

--
Simen
Re: Using D libs in C
All right, found out how to make it compile. There are two ways:

1) Using DMD for the D part, DMC for the C part and for combining them. This is the batch file I use for that:

    dmd -c -lib dpart.d
    dmc cpart.c dpart.lib phobos.lib

2) Using DMD for the D part, DMC for the C part, and DMD again for combining them:

    dmd -c -lib dpart.d
    dmc -c cpart.c
    dmd cpart.obj dpart.lib phobos.lib

The first method gives me a FIXLIB warning but compiles OK; the second is nicely silent, thus I prefer the second one. Plus it should work on Linux as well. I'm going to try that shortly.
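The post does not show `dpart.d`, so the following is only a guess at what the D side of such a setup could look like. The function names are hypothetical. `extern (C)` turns off D name mangling so that the C part can declare and call the functions, and `Runtime.initialize` / `Runtime.terminate` from `core.runtime` start and stop the D runtime when `main` lives in C:

```d
// dpart.d (hypothetical contents; not the file from the post)
import core.runtime : Runtime;

/// To be called once from C before any other D function.
extern (C) int dpart_init()
{
    return Runtime.initialize() ? 1 : 0;
}

/// To be called once from C before the program exits.
extern (C) int dpart_term()
{
    return Runtime.terminate() ? 1 : 0;
}

/// Example function exported to C.
extern (C) int dpart_add(int a, int b) nothrow @nogc
{
    return a + b;
}
```

On the C side these would be declared as `int dpart_init(void);`, `int dpart_term(void);` and `int dpart_add(int, int);`.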
Re: Using D libs in C
Hmm, no, it won't work right on Linux for some reason. This is the output:

    /usr/lib/gcc/x86_64-linux-gnu/4.3.2/../../../libphobos2.a(deh2_4e7_525.o): In function `_D2rt4deh213__eh_finddataFPvZPS2rt4deh213DHandlerTable':
    src/rt/deh2.d:(.text._D2rt4deh213__eh_finddataFPvZPS2rt4deh213DHandlerTable+0x4): undefined reference to `_deh_beg'
    src/rt/deh2.d:(.text._D2rt4deh213__eh_finddataFPvZPS2rt4deh213DHandlerTable+0xc): undefined reference to `_deh_beg'
    src/rt/deh2.d:(.text._D2rt4deh213__eh_finddataFPvZPS2rt4deh213DHandlerTable+0x13): undefined reference to `_deh_end'
    src/rt/deh2.d:(.text._D2rt4deh213__eh_finddataFPvZPS2rt4deh213DHandlerTable+0x37): undefined reference to `_deh_end'
    collect2: ld returned 1 exit status
    --- errorlevel 1

The shell script I'm using to compile it is:

    #!/bin/sh
    dmd -m32 -c -lib dpart.d
    gcc -m32 -c cpart.c
    dmd -m32 cpart.o dpart.a /usr/lib/libphobos2.a

(Although it appears that you don't need to explicitly link with libphobos2; it does that automatically... and fails with the above error.)

Any ideas about what the error means?