Re: struct vs class for a simple token in my d lexer

Roman D. Boiko Mon, 14 May 2012 10:28:51 -0700

On Monday, 14 May 2012 at 17:05:17 UTC, Dmitry Olshansky wrote:

On 14.05.2012 19:10, Roman D. Boiko wrote:
(Subj.) I'm in doubt which to choose for my case, but this is a generic
question.
http://forum.dlang.org/post/[email protected]
Cross-posting here. I would appreciate any feedback. (Whether to reply
in this or that thread is up to you.) Thanks
On 14.05.2012 19:10, Roman D. Boiko wrote:
Oops, sorry I meant to post to NG only :) Repost:

Clearly you are putting too much pressure on Token.

In my mind it should be real simple:

struct Token {
    uint col, line;
    // Info about the token; serves as both type tag and flag set.
    // Indicates the proper type once the token was cooked
    // (like "31.415926" -> 3.1415926e1), i.e. values are calculated.
    uint flags;
    union {
        string chars;
        float  f_val;
        double d_val;
        uint   uint_val;
        long   long_val;
        ulong  ulong_val;
        //... anything else you may need (8 bytes are plenty)
    } // even then you may use up to 12 bytes
    // total size == 24 or 20
}

Where:
Each raw token initially has chars == slice of characters in the source text (or, if the source is not UTF-8, a copy of the source), except for keywords and operators.

Cooking is the process of calculating constant values and such (say, populating the symbol table will put a symbol id into the token instead of leaving the string slice). Do it on the fly or after the whole source has been lexed; let the user choose.
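A minimal sketch of what cooking a single token might look like. The flag names and the cook function are assumptions made up for illustration (the post does not define any flag values), and the union is trimmed to the two members used here:

```d
import std.conv : to;

// Hypothetical flag values; the post does not define any.
enum TokenFlags : uint
{
    raw      = 0,      // chars holds a slice of the source text
    floatLit = 1 << 0, // token is a floating-point literal
    cooked   = 1 << 8, // the value has been computed
}

struct Token
{
    uint col, line;
    uint flags;
    union
    {
        string chars;
        double d_val;
    }
}

// Cook a raw floating-point literal in place: replace the source slice
// with its computed value and mark the token as cooked.
void cook(ref Token t)
{
    if ((t.flags & TokenFlags.floatLit) && !(t.flags & TokenFlags.cooked))
    {
        string src = t.chars;    // read the slice before overwriting the union
        t.d_val = src.to!double; // chars is no longer valid after this write
        t.flags |= TokenFlags.cooked;
    }
}
```

Calling cook on every token as it is produced gives the "on the fly" variant; running it over a finished token array gives the "after the whole source" variant.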
Value types have the nice property of being really fast; I suggest you do at least some synthetic tests before going with ref-based stuff. Pushing 4 words is cheap, indirection never is. Classes also have a hidden mutex _monitor_ field, so using 'class' can best be described as suicide.
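The size difference can be checked at compile time. A small sketch, with TokenS and TokenC as hypothetical stand-ins for a struct token and a class token carrying the same fields:

```d
struct TokenS
{
    uint col, line;
    uint flags;
    double value;
}

class TokenC
{
    uint col, line;
    uint flags;
    double value;
}

// Size of a struct value vs. the size of a class instance on the heap.
enum structSize = TokenS.sizeof;
enum classSize  = __traits(classInstanceSize, TokenC);

// A class variable is only a pointer-sized reference; the instance lives
// elsewhere, so every field access goes through one indirection.
static assert(TokenC.sizeof == (void*).sizeof);

// Every D class instance starts with two hidden pointer-sized fields:
// the vtable pointer and the monitor.
static assert(classSize >= structSize + 2 * (void*).sizeof);
```

This only shows the memory overhead; whether it matters for a given lexer is something a synthetic benchmark would have to answer.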
Yet you may go with a freelist of tokens (leaving them as structs). It's an old and proven way.
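One possible shape for such a freelist, as a rough sketch. TokenFreeList, acquire and release are hypothetical names, and the Token here is simplified to a plain string payload:

```d
struct Token
{
    uint col, line;
    uint flags;
    string chars;
}

// A minimal free list of Token structs (hypothetical helper).
struct TokenFreeList
{
    private static struct Node
    {
        Token token; // first field, so a Token* is also a Node*
        Node* next;
    }

    private Node* head; // head of the list of recycled nodes

    // Hand out a recycled token if one is available, otherwise allocate.
    Token* acquire()
    {
        Node* n = head;
        if (n !is null)
            head = n.next;
        else
            n = new Node;
        n.token = Token.init;
        n.next = null;
        return &n.token;
    }

    // Return a token obtained from acquire() to the list for reuse.
    void release(Token* t)
    {
        auto n = cast(Node*) t; // valid because token is Node's first field
        n.next = head;
        head = n;
    }
}
```

Tokens stay plain structs; only tokens that have to outlive the scanning loop need to come from the list.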
About row/col: if this stuff is encoded into the finite automaton (and you surely want to do something like a DFA), it comes at almost no cost. The only disadvantage is complicating the DFA tables with some irregularities of "true Unicode line ending sequences". It's more of a nuisance than a real problem, though.
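To make the line-ending irregularities concrete, here is a hand-written sketch of the bookkeeping (a plain loop step, not DFA tables). Position and advance are hypothetical names, and the set of line endings handled is an assumption: "\r", "\n", "\r\n", U+2028 and U+2029.

```d
import std.utf : decode;

struct Position
{
    uint line = 1, col = 1;
}

// Advance pos past one code point of src starting at index i;
// returns the index of the next code point.
size_t advance(string src, size_t i, ref Position pos)
{
    immutable dchar c = decode(src, i); // decode moves i past the code point
    // Treat "\r\n" as a single line ending.
    if (c == '\r' && i < src.length && src[i] == '\n')
        ++i;
    if (c == '\r' || c == '\n' || c == '\u2028' || c == '\u2029')
    {
        ++pos.line;
        pos.col = 1;
    }
    else
        ++pos.col;
    return i;
}
```

In a table-driven lexer the same decisions would be folded into the transitions, which is where the extra states for "\r\n" and the multi-byte separators come from.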
P.S. If you are really bent on performance, I suggest running a separate Aho-Corasick-style thing for keywords; it would greatly simplify (= speed up) the DFA structure if keywords are not hardcoded into the automaton.
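The separation itself can be sketched without a full Aho-Corasick matcher. The example below swaps in a plain string switch applied after the DFA has recognised an identifier, so it illustrates only the idea of keeping keywords out of the automaton, not the suggested matching algorithm. TokenKind and classify are hypothetical names, and only a few keywords are listed:

```d
// Hypothetical token kinds; only a few keywords are listed for illustration.
enum TokenKind
{
    identifier,
    kwIf,
    kwElse,
    kwWhile,
    kwReturn,
}

// Classify an already-scanned identifier. The DFA only has to recognise
// "identifier"; keywords are separated out afterwards.
TokenKind classify(string ident)
{
    switch (ident)
    {
        case "if":     return TokenKind.kwIf;
        case "else":   return TokenKind.kwElse;
        case "while":  return TokenKind.kwWhile;
        case "return": return TokenKind.kwReturn;
        default:       return TokenKind.identifier;
    }
}
```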

Thanks Dmitry, I think I'll need to contact you privately about some details later.
