Re: Lexer in D
On Sunday, 24 March 2013 at 22:16:45 UTC, Brian Schott wrote: On Sunday, 24 March 2013 at 16:54:22 UTC, Namespace wrote: Have no time - looks interesting but I highly doubt it can do what is promised. Still, take time to do a small formal announcement for the beta. Yes, you're right, it's going to take some time before everything works in general. It has held up in my small test runs, but D is also incredibly complex. I will also stop working on it very soon, but maybe the code / the idea will inspire someone else to write something more mature for the D community. This is also the reason why I'm asking for feedback in this fizzled-out thread. :) But once you've taken a look at it, I would be happy if you could tell me what I should improve next time. I have similar goals for the Dscanner tool, so I may steal some of your ideas. I'm glad to hear that. If I can help, let me know; I would be happy to gain more knowledge in this area. And I would be glad if some of my ideas became reality.
Re: Lexer in D
On Sunday, 24 March 2013 at 16:54:22 UTC, Namespace wrote: Have no time - looks interesting but I highly doubt it can do what is promised. Still, take time to do a small formal announcement for the beta. Yes, you're right, it's going to take some time before everything works in general. It has held up in my small test runs, but D is also incredibly complex. I will also stop working on it very soon, but maybe the code / the idea will inspire someone else to write something more mature for the D community. This is also the reason why I'm asking for feedback in this fizzled-out thread. :) But once you've taken a look at it, I would be happy if you could tell me what I should improve next time. I have similar goals for the Dscanner tool, so I may steal some of your ideas.
Re: Lexer in D
Have no time - looks interesting but I highly doubt it can do what is promised. Still, take time to do a small formal announcement for the beta. Yes, you're right, it's going to take some time before everything works in general. It has held up in my small test runs, but D is also incredibly complex. I will also stop working on it very soon, but maybe the code / the idea will inspire someone else to write something more mature for the D community. This is also the reason why I'm asking for feedback in this fizzled-out thread. :) But once you've taken a look at it, I would be happy if you could tell me what I should improve next time.
Re: Lexer in D
On 24-Mar-2013 20:08, Namespace wrote: On Saturday, 23 March 2013 at 15:39:50 UTC, Namespace wrote: On Thursday, 21 March 2013 at 11:36:45 UTC, Namespace wrote: On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is a pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code. I'm almost finished. In my few tests, my little parser detects function, struct and variable declarations and expressions. It can resolve rvalue references (by creating temporary variables) and detects unused/underused variables. But it is still a beta. But it's a lot of fun. :D Moved to https://github.com/Dgame/DAT/ It's time for a beta. Would be nice if someone takes a look at DAT or tries it. I'm sure that there are still some bugs, but maybe this little recreational fun is indeed useful for one person or another. I had a lot of fun. It's tested with 'simple.d', std.stdio and std.datetime. Have no time - looks interesting but I highly doubt it can do what is promised. Still, take time to do a small formal announcement for the beta. Reusing the old and fizzled-out thread of a D lexer to discuss a tool created on top of it is bound to get very little feedback. -- Dmitry Olshansky
Re: Lexer in D
On Saturday, 23 March 2013 at 15:39:50 UTC, Namespace wrote: On Thursday, 21 March 2013 at 11:36:45 UTC, Namespace wrote: On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is a pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code. I'm almost finished. In my few tests, my little parser detects function, struct and variable declarations and expressions. It can resolve rvalue references (by creating temporary variables) and detects unused/underused variables. But it is still a beta. But it's a lot of fun. :D Moved to https://github.com/Dgame/DAT/ It's time for a beta. Would be nice if someone takes a look at DAT or tries it. I'm sure that there are still some bugs, but maybe this little recreational fun is indeed useful for one person or another. I had a lot of fun. It's tested with 'simple.d', std.stdio and std.datetime.
Re: Lexer in D
On Thursday, 21 March 2013 at 11:36:45 UTC, Namespace wrote: On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is a pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code. I'm almost finished. In my few tests, my little parser detects function, struct and variable declarations and expressions. It can resolve rvalue references (by creating temporary variables) and detects unused/underused variables. But it is still a beta. But it's a lot of fun. :D Moved to https://github.com/Dgame/DAT/
Re: Lexer in D
On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is a pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code. I'm almost finished. In my few tests, my little parser detects function, struct and variable declarations and expressions. It can resolve rvalue references (by creating temporary variables) and detects unused/underused variables. But it is still a beta. But it's a lot of fun. :D
Re: Lexer in D
So you got rid of array creation. About time ;) Yes, that was the only way to get closer to your measured time. :D
Re: Lexer in D
So far, my lexer is a pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code.
Re: Lexer in D
On 15-Mar-2013 01:17, Artur Skawina wrote: On 03/14/13 18:24, Dmitry Olshansky wrote: On 14-Mar-2013 21:20, Namespace wrote: Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there: https://github.com/Dgame/Puzzle/blob/master/trace.log My goal is about 20 msecs. :D And mine is 15 with Dscanner. But it's stuck at around 21-22. But both of us can just cheat and buy a newer PC :) What are you both actually measuring? IOW what happens with the resulting tokens? artur I'm counting and/or filtering out specific tokens. The latter is a tiny bit slower. The other options of Dscanner are to generate e.g. HTML highlighting for code or measure token frequencies. Haven't timed those as their speed doesn't entirely depend on the lexer as is. -- Dmitry Olshansky
Re: Lexer in D
On 03/14/13 18:24, Dmitry Olshansky wrote: > On 14-Mar-2013 21:20, Namespace wrote: >> Yes, and it's really fun. :) >> By the way, the code can be found here: >> https://github.com/Dgame/Puzzle/blob/master/DLexer.d >> Maybe some of you have an idea how I could improve anything. >> Last trace.log is also there: >> https://github.com/Dgame/Puzzle/blob/master/trace.log >> My goal is about 20 msecs. :D > > And mine is 15 with Dscanner. But it's stuck at around 21-22. > But both of us can just cheat and buy a newer PC :) What are you both actually measuring? IOW what happens with the resulting tokens? artur
Re: Lexer in D
On 14-Mar-2013 21:20, Namespace wrote: Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there: https://github.com/Dgame/Puzzle/blob/master/trace.log My goal is about 20 msecs. :D So you got rid of array creation. About time ;) -- Dmitry Olshansky
Re: Lexer in D
Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there: https://github.com/Dgame/Puzzle/blob/master/trace.log My goal is about 20 msecs. :D
Re: Lexer in D
On 14-Mar-2013 21:20, Namespace wrote: Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there: https://github.com/Dgame/Puzzle/blob/master/trace.log My goal is about 20 msecs. :D And mine is 15 with Dscanner. But it's stuck at around 21-22. But both of us can just cheat and buy a newer PC :) So what we need to count is the number of syscalls + cycles spent in user mode to have a meaningful metric. -- Dmitry Olshansky
Re: Lexer in D
On 14-Mar-2013 01:19, Namespace wrote: Array has horrible code... I decided to start from scratch. So I took a close look at the lexer of dmd and took over the basic functionality. Result for std.datetime (without profiling so far): It's getting better! Total time (ms): 5971.66 Repetitions: 200 Sample mode: 28 (101 ocurrences) Median time: 28.89 Avg time : 29.8583 Std dev. : 3.37222 Minimum: 27.958 Maximum: 70.583 95% conf.int. : [23.2489, 36.4677] e = 6.60943 99% conf.int. : [21.172, 38.5446] e = 8.68627 EstimatedAvg95%: [29.3909, 30.3257] e = 0.467357 EstimatedAvg99%: [29.2441, 30.4725] e = 0.614212 -- Dmitry Olshansky
Re: Lexer in D
Array has horrible code... I decided to start from scratch. So I took a close look at the lexer of dmd and took over the basic functionality. Result for std.datetime (without profiling so far): Total time (ms): 5971.66 Repetitions: 200 Sample mode: 28 (101 ocurrences) Median time: 28.89 Avg time : 29.8583 Std dev. : 3.37222 Minimum: 27.958 Maximum: 70.583 95% conf.int. : [23.2489, 36.4677] e = 6.60943 99% conf.int. : [21.172, 38.5446] e = 8.68627 EstimatedAvg95%: [29.3909, 30.3257] e = 0.467357 EstimatedAvg99%: [29.2441, 30.4725] e = 0.614212
Re: Lexer in D
Profile? Also, note that appender is a tool for appending stuff *into an array*. If you don't actually need an array at the end, then you could give Array a roll? I don't think it will necessarily do better, but it doesn't hurt to give it a roll. My profile output looks like this: http://dpaste.1azy.net/0409f0bc Since I have already discussed most of the stuff in this thread, I do not know what I could still improve on the Lexer. But I have not yet thought of Array. Because my Appender already uses malloc and free, I could learn something from the implementation of 'Array'. Thanks for the suggestion.
Re: Lexer in D
On Monday, 11 March 2013 at 16:37:00 UTC, Namespace wrote: Seems to be the wrong point of optimizations. Profile? Also, note that appender is a tool for appending stuff *into an array*. If you don't actually need an array at the end, then you could give Array a roll? I don't think it will necessarily do better, but it doesn't hurt to give it a roll.
Re: Lexer in D
extends if possible. http://dpaste.dzfl.pl/f86e5db6 import core.stdc.stdlib; void main() { auto p1 = malloc(14); auto p2 = realloc(p1, 15); assert(p1 is p2); } Cool, nice to know. Currently I have this benchmark: Without memory reserve: Total time (ms): 6779.16 Repetitions: 200 Sample mode: 33 (73 ocurrences) Median time: 33.689 Avg time : 33.8958 Std dev. : 1.86754 Minimum: 31.414 Maximum: 52.237 95% conf.int. : [30.2355, 37.5561] e = 3.66031 99% conf.int. : [29.0853, 38.7063] e = 4.81047 EstimatedAvg95%: [33.637, 34.1546] e = 0.258823 EstimatedAvg99%: [33.5556, 34.2359] e = 0.340151 With memory reserve: Total time (ms): 5945.1 Repetitions: 200 Sample mode: 29 (68 ocurrences) Median time: 29.3215 Avg time : 29.7255 Std dev. : 2.083 Minimum: 27.458 Maximum: 45.32 95% conf.int. : [25.6429, 33.8081] e = 4.08261 99% conf.int. : [24.36, 35.0909] e = 5.36546 EstimatedAvg95%: [29.4368, 30.0142] e = 0.288684 EstimatedAvg99%: [29.3461, 30.1049] e = 0.379396 But I do not gain that much: Total time (ms): 10833 Repetitions: 200 Sample mode: 53 (105 ocurrences) Median time: 53.848 Avg time : 54.1652 Std dev. : 1.72681 Minimum: 52.118 Maximum: 69.703 95% conf.int. : [50.7807, 57.5497] e = 3.38448 99% conf.int. : [49.7172, 58.6131] e = 4.44796 EstimatedAvg95%: [53.9259, 54.4045] e = 0.239319 EstimatedAvg99%: [53.8507, 54.4797] e = 0.314518 Seems to be the wrong point of optimizations.
Re: Lexer in D
On Monday, 11 March 2013 at 15:56:37 UTC, Namespace wrote: On Monday, 11 March 2013 at 15:51:58 UTC, David wrote: But I do not think you can extend a memory block with C functions. What about realloc? That's what I use. But it moves your old memory into the newly allocated block. Or does it first check whether the current memory block can be extended? extends if possible. http://dpaste.dzfl.pl/f86e5db6 import core.stdc.stdlib; void main() { auto p1 = malloc(14); auto p2 = realloc(p1, 15); assert(p1 is p2); }
Re: Lexer in D
On Monday, 11 March 2013 at 15:51:58 UTC, David wrote: But I do not think you can extend a memory block with C functions. What about realloc? That's what I use. But it moves your old memory into the newly allocated block. Or does it first check whether the current memory block can be extended?
Re: Lexer in D
> But I do not think you can extend a memory block with C functions. What about realloc?
Re: Lexer in D
On Monday, 11 March 2013 at 15:21:31 UTC, FG wrote: On 2013-03-11 16:11, Namespace wrote: I've also done some benchmarks. And I wonder why my appender is much faster if I enable memory reservation. Probably just because D's Appender has a better growth scheme than yours and you reallocate too often. Try to compare how often that happens. D's: 24 extends, 9 moves (qalloc + memcpy). My Appender: 25 moves. But I do not think you can extend a memory block with C functions. I'm also very interested to hear some improvement tips. Well, you may add a check whether alloc/realloc actually succeeded. :) Maybe. :)
Re: Lexer in D
On 2013-03-11 16:11, Namespace wrote: I've also done some benchmarks. And I wonder why my appender is much faster if I enable memory reservation. Probably just because D's Appender has a better growth scheme than yours and you reallocate too often. Try to compare how often that happens. I'm also very interested to hear some improvement tips. Well, you may add a check whether alloc/realloc actually succeeded. :)
Re: Lexer in D
Total time (ms): 10941.9 Repetitions: 200 Sample mode: 54 (118 ocurrences) Median time: 54.5855 Avg time : 54.7094 Std dev. : 1.33935 Minimum: 52.535 Maximum: 67.206 95% conf.int. : [52.0843, 57.3345] e = 2.62509 99% conf.int. : [51.2594, 58.1593] e = 3.44995 EstimatedAvg95%: [54.5238, 54.895] e = 0.185622 EstimatedAvg99%: [54.4654, 54.9533] e = 0.243948 And even a bit closer. I've also done some benchmarks. And I wonder why my appender is much faster if I enable memory reservation. My Appender: http://dpaste.1azy.net/1bb34d9a I'm also very interested to hear some improvement tips. In each benchmark, this struct is appended 1_000_000 times: struct B { public: const int b; } Without memory reserve: My Appender: Total time (ms): 7638.67 Repetitions: 200 Sample mode: 38 (68 ocurrences) Median time: 38.0545 Avg time : 38.1934 Std dev. : 1.56827 Minimum: 35.859 Maximum: 47.586 95% conf.int. : [35.1196, 41.2671] e = 3.07375 99% conf.int. : [34.1538, 42.233] e = 4.0396 EstimatedAvg95%: [37.976, 38.4107] e = 0.217347 EstimatedAvg99%: [37.9077, 38.479] e = 0.285643 D's Appender: Total time (ms): 6972.66 Repetitions: 200 Sample mode: 34 (97 ocurrences) Median time: 34.7795 Avg time : 34.8633 Std dev. : 1.41623 Minimum: 32.96 Maximum: 49.579 95% conf.int. : [32.0876, 37.6391] e = 2.77576 99% conf.int. : [31.2153, 38.5113] e = 3.64797 EstimatedAvg95%: [34.667, 35.0596] e = 0.196276 EstimatedAvg99%: [34.6054, 35.1213] e = 0.257951 D array: Total time (ms): 22868.2 Repetitions: 200 Sample mode: 110 (198 ocurrences) Median time: 113.658 Avg time : 114.341 Std dev. : 1.97184 Minimum: 113.011 Maximum: 133.978 95% conf.int. : [110.477, 118.206] e = 3.86473 99% conf.int. : [109.262, 119.42] e = 5.07911 EstimatedAvg95%: [114.068, 114.615] e = 0.273277 EstimatedAvg99%: [113.982, 114.7] e = 0.359147 With memory reserve: My Appender: Total time (ms): 6197.48 Repetitions: 200 Sample mode: 30 (88 ocurrences) Median time: 30.617 Avg time : 30.9874 Std dev. : 2.86913 Minimum: 28.583 Maximum: 64.06 95% conf.int. : [25.364, 36.6108] e = 5.62339 99% conf.int. : [23.597, 38.3778] e = 7.39039 EstimatedAvg95%: [30.5897, 31.385] e = 0.397634 EstimatedAvg99%: [30.4648, 31.51] e = 0.52258 D's Appender: Total time (ms): 6628.31 Repetitions: 200 Sample mode: 32 (104 ocurrences) Median time: 32.6775 Avg time : 33.1415 Std dev. : 3.00344 Minimum: 31.506 Maximum: 57.493 95% conf.int. : [27.2549, 39.0282] e = 5.88663 99% conf.int. : [25.4052, 40.8779] e = 7.73635 EstimatedAvg95%: [32.7253, 33.5578] e = 0.416248 EstimatedAvg99%: [32.5945, 33.6886] e = 0.547042 D array: Total time (ms): 22849.9 Repetitions: 200 Sample mode: 110 (198 ocurrences) Median time: 113.025 Avg time : 114.249 Std dev. : 8.40906 Minimum: 112.355 Maximum: 222.982 95% conf.int. : [97.768, 130.731] e = 16.4815 99% conf.int. : [92.5891, 135.91] e = 21.6603 EstimatedAvg95%: [113.084, 115.415] e = 1.16542 EstimatedAvg99%: [112.718, 115.781] e = 1.53162
Re: Lexer in D
Total time (ms): 11374.4 Repetitions: 200 Sample mode: 56 (89 ocurrences) Median time: 56.312 Avg time : 56.872 Std dev. : 2.60698 Minimum: 53.978 Maximum: 81.602 95% conf.int. : [51.7624, 61.9816] e = 5.10959 99% conf.int. : [50.1569, 63.5871] e = 6.71514 EstimatedAvg95%: [56.5107, 57.2333] e = 0.361303 EstimatedAvg99%: [56.3972, 57.3468] e = 0.474832 I'm getting closer.
Re: Lexer in D
On 05-Mar-2013 01:35, Brian Schott wrote: On Monday, 4 March 2013 at 18:10:48 UTC, Dmitry Olshansky wrote: Brian listed graphs showing under 20 ms on his CPU for all, and <10 for every module in std except std.datetime. https://github.com/Hackerpilot/Dscanner/blob/range-based-lexer/perftest.sh I checked in the script that I use to generate the data for the graph last night. You can copy/paste its output into a spreadsheet (using tabs as field separators). Great, thanks! I'll see if I can make a plot from it with std-dev shown as well. -- Dmitry Olshansky
Re: Lexer in D
On Monday, 4 March 2013 at 18:10:48 UTC, Dmitry Olshansky wrote: Brian listed graphs showing under 20 ms on his CPU for all, and <10 for every module in std except std.datetime. https://github.com/Hackerpilot/Dscanner/blob/range-based-lexer/perftest.sh I checked in the script that I use to generate the data for the graph last night. You can copy/paste its output into a spreadsheet (using tabs as field separators).
Re: Lexer in D
On 04-Mar-2013 23:19, Namespace wrote: Yes, but I'm not as familiar with ranges. But I want to learn more about them and try it then. What do you mean: how much would the use of ranges gain compared to my current method? 30%? Basically you copy Token structs and sometimes reallocate the whole array (I guess at 60% you don't reallocate but waste time on GC.malloc, which zeroes out the memory block). My estimate is no less than 10-15%. -- Dmitry Olshansky
Re: Lexer in D
Yes, but I'm not as familiar with ranges. But I want to learn more about them and try it then. What do you mean: how much would the use of ranges gain compared to my current method? 30%?
Re: Lexer in D
On 04-Mar-2013 22:58, Namespace wrote: Yes, I know. But I do not think I have a chance to get even closer to it. At least I don't know how. One thing that all of these lexers do (dmd's original and Dscanner) is get tokens as you need them - lazily. You would always be slower if you take time to append them to some array (no matter how). The idiomatic way, as I said, is to use ranges. I'm on an AMD Athlon 64 X2 Dual Core Processor 5400+. Mm, that's not that far away from mine. But Brian has got something even more powerful :) You can easily download and test Dscanner on your CPU so that you get numbers that are actually comparable. -- Dmitry Olshansky
Re: Lexer in D
Yes, I know. But I do not think I have a chance to get even closer to it. At least I don't know how. I'm on an AMD Athlon 64 X2 Dual Core Processor 5400+.
Re: Lexer in D
On 04-Mar-2013 16:17, Namespace wrote: By reserving 60% of the file size in memory I get a big speed boost. Total time (ms): 17544.6 Repetitions: 200 Sample mode: 87 (121 ocurrences) Median time: 87.3835 Avg time : 87.7232 Std dev. : 1.77905 Minimum: 84.238 Maximum: 101.919 95% conf.int. : [84.2363, 91.21] e = 3.48687 99% conf.int. : [83.1407, 92.3057] e = 4.58252 EstimatedAvg95%: [87.4766, 87.9697] e = 0.246559 EstimatedAvg99%: [87.3991, 88.0472] e = 0.324033 But I'm still away from: Avg time : 69.3088 http://forum.dlang.org/thread/dpdgcycrgfspcxenz...@forum.dlang.org?page=8#post-abncwoqtpnaoohihkliw:40forum.dlang.org Keep in mind the CPU you are testing on. What's your chip? In any case Dscanner has been improved to take far, far less time than 60 ms. Brian listed graphs showing under 20 ms on his CPU for all, and <10 for every module in std except std.datetime. I've tried also 10%, 40% and 80% but 60% seems optimal. And I think I reached my performance limit. I could implement the LazyForwardRange and, if anyone can confirm the statement about char <-> dchar decoding and that the usage of ubyte would be (crucially) better, I will try this too. -- Dmitry Olshansky
Re: Lexer in D
By reserving 60% of the file size in memory I get a big speed boost. Total time (ms): 17544.6 Repetitions: 200 Sample mode: 87 (121 ocurrences) Median time: 87.3835 Avg time : 87.7232 Std dev. : 1.77905 Minimum: 84.238 Maximum: 101.919 95% conf.int. : [84.2363, 91.21] e = 3.48687 99% conf.int. : [83.1407, 92.3057] e = 4.58252 EstimatedAvg95%: [87.4766, 87.9697] e = 0.246559 EstimatedAvg99%: [87.3991, 88.0472] e = 0.324033 But I'm still away from: Avg time : 69.3088 http://forum.dlang.org/thread/dpdgcycrgfspcxenz...@forum.dlang.org?page=8#post-abncwoqtpnaoohihkliw:40forum.dlang.org I've tried also 10%, 40% and 80% but 60% seems optimal. And I think I reached my performance limit. I could implement the LazyForwardRange and, if anyone can confirm the statement about char <-> dchar decoding and that the usage of ubyte would be (crucially) better, I will try this too.
Re: Lexer in D
I've installed cygwin to use the time benchmark function and these are the results: $ time ./Lexer.exe real 0m0.225s user 0m0.015s sys 0m0.046s I also read in the std.d.lexer thread that the use of 'char' isn't that good for a lexer, because dmd wants to decode 'char' to 'dchar'. The usage of ubyte should be better. Is that right?
Re: Lexer in D
It's quite amazing: with Appender I reach between 115 and 125 msecs. I adapted the Appender for myself and solved the problem with memcpy/memmove. That works fine. But is there anything else I can do (besides the LazyForwardRange) to gain more performance?
Re: Lexer in D
On Sunday, March 03, 2013 17:19:25 Namespace wrote: > But hasn't each range an array internally? Many do, but some don't. Take std.range.iota, or the ranges in std.random for instance. They're generative. And if all ranges had arrays underneath the hood, infinite ranges wouldn't be possible except for something like std.range.cycle. Something like struct Range { @property int front() { return _i; } void popFront() { ++_i; } @property bool empty() { return _i == int.max; } private int _i; } would be a valid range. In the case of a lexer, you generate the next token on each popFront call, so you consume the range of characters that you're lexing as you go but don't need to allocate memory for more than the current token at a time. - Jonathan M Davis
Re: Lexer in D
On 03-Mar-2013 20:45, Namespace wrote: Just rip off the const; why would you need it anyway? The data in 'tokens' are final, so why should they not be const? Then just return const(Token), i.e. simple full const. In general I'd avoid marking data as bitwise const unless that is actually true. For all you know some algorithm up above might tweak existing token values directly. What exactly is the problem if there are const members? Is this a known problem? It's quite old and ugly, as a lot of things would need to bit-blit stuff over it and fail to do so. Otherwise, I will go and investigate it. -- Dmitry Olshansky
Re: Lexer in D
Just rip off the const; why would you need it anyway? The data in 'tokens' are final, so why should they not be const? What exactly is the problem if there are const members? Is this a known problem? Otherwise, I will go and investigate it.
Re: Lexer in D
On 03-Mar-2013 20:36, Namespace wrote: On Sunday, 3 March 2013 at 16:19:27 UTC, Namespace wrote: Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. But hasn't each range an array internally? Or how does that work? I try to use Appender, but this lazy forward range sounds interesting. Appender does not work for my Token struct because I have const/immutable members. So I cannot add anything. Example: http://dpaste.1azy.net/21f100d3 Without 'const' it works fine. Just rip off the const; why would you need it anyway? -- Dmitry Olshansky
Re: Lexer in D
On Sunday, 3 March 2013 at 16:19:27 UTC, Namespace wrote: Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. But hasn't each range an array internally? Or how does that work? I try to use Appender, but this lazy forward range sounds interesting. Appender does not work for my Token struct because I have const/immutable members. So I cannot add anything. Example: http://dpaste.1azy.net/21f100d3 Without 'const' it works fine.
Re: Lexer in D
On 03-Mar-2013 20:19, Namespace wrote: Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. But hasn't each range an array internally? Of course not, or the whole range concept would be meaningless. Or how does that work? By having a struct or generally any object that in fact works as a generator of tokens. popFront lexes a new token, front gives the current token and empty indicates if there are no more tokens. See std.range for some simple ranges. If you throw in a .save method you can replay tokenization from some point in the past (= some older state; think lookahead when parsing). I try to use Appender, but this lazy forward range sounds interesting. -- Dmitry Olshansky
Re: Lexer in D
Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. But hasn't each range an array internally? Or how does that work? I try to use Appender, but this lazy forward range sounds interesting.
Re: Lexer in D
On 03-Mar-2013 18:28, Namespace wrote: I'd repeat that I think it makes no sense to separately treat isType. In any case there is way more to types than built-in ones, and it is the job of the parser (to assume types) and the semantic step (to type-check and infer). Yes, I've understood, but currently I want it so. Another thing is to run -profile without inline to understand the structure of time spent per each subroutine better. I did this and I refactored a lot of my code. Now I get 170 - 180 msecs and this is my current trace.log: http://dpaste.1azy.net/b94b19ff But currently I have no more ideas how I could gain more performance. Maybe I should disable the compiler while looping through the text? Or maybe I should allocate more space for the Token array (e.g. toks.length = 100;)? I hope that you or anyone else have further ideas. But anyway, thanks for the help! :) Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. -- Dmitry Olshansky
Re: Lexer in D
I'd repeat that I think it makes no sense to separately treat isType. In any case there is way more to types than built-in ones, and it is the job of the parser (to assume types) and the semantic step (to type-check and infer). Yes, I've understood, but currently I want it so. Another thing is to run -profile without inline to understand the structure of time spent per each subroutine better. I did this and I refactored a lot of my code. Now I get 170 - 180 msecs and this is my current trace.log: http://dpaste.1azy.net/b94b19ff But currently I have no more ideas how I could gain more performance. Maybe I should disable the compiler while looping through the text? Or maybe I should allocate more space for the Token array (e.g. toks.length = 100;)? I hope that you or anyone else have further ideas. But anyway, thanks for the help! :)
Re: Lexer in D
On 03-Mar-2013 03:12, Namespace wrote: I noted it too late: decode comes from readText; readText validates your text. I will now use std.file.read. That's why Walter promotes profilers. They help verify hypotheses on performance and constantly surprise people with what they actually find. The new trace output is here and it seems you're right, isKeyword and isType (yes, I'd decided to separate them again and keep both in the lexer) take a lot of time: http://dpaste.1azy.net/0d6aff6b 154874 106186 95403 0 pure nothrow bool Puzzle.Lexer.isKeyword(immutable(char)[]) 159688 78020 69289 0 pure nothrow bool Puzzle.Lexer.isType(immutable(char)[]) I never thought that. :) -- Dmitry Olshansky
Re: Lexer in D
On 03-Mar-2013 03:52, Namespace wrote: I think that thanks to your suggestions isKeyword and isType are entirely inlined, because they aren't listed in the trace.log anymore (http://dpaste.1azy.net/b94b19ff). The only thing which makes trouble is the isNext function. Maybe I should also use a StringStream (implemented as a Range). What do you think? I'd repeat that I think it makes no sense to separately treat isType. In any case there is way more to types than built-in ones, and it is the job of the parser (to assume types) and the semantic step (to type-check and infer). Another thing is to run -profile without inline to understand the structure of time spent per each subroutine better. -- Dmitry Olshansky
Re: Lexer in D
I noted it too late: decode comes from readText; readText validates your text. I will now use std.file.read. The new trace output is here and it seems you're right, isKeyword and isType (yes, I'd decided to separate them again and keep both in the lexer) take a lot of time: http://dpaste.1azy.net/0d6aff6b 154874 106186 95403 0 pure nothrow bool Puzzle.Lexer.isKeyword(immutable(char)[]) 159688 78020 69289 0 pure nothrow bool Puzzle.Lexer.isType(immutable(char)[]) I never thought that. :)
Re: Lexer in D
I never used the -profile flag before (only -cov) so I want to ask if I understand it right: http://dpaste.1azy.net/4382097b It seems that the call which is listed on line 133 1589328 214949 214948 0 pure @trusted dchar std.utf.decode!(const(immutable(char)[])).decode(ref const(immutable(char)[]), ref uint) takes the most time. Am I right? And why is something decoded? Should I use dchar instead of char? And I'm not taking anything that is already finished, because I want to become more familiar with the matter. I want to see what you have to note in order to get performance. The 24 msecs would obviously be a dream, but I would be pleased with 50-100 msecs. :D
Re: Lexer in D
On 02-Mar-2013 23:50, Zach the Mystic wrote: On Saturday, 2 March 2013 at 11:59:39 UTC, Dmitry Olshansky wrote: On 02-Mar-2013 15:04, David wrote: On 02.03.2013 10:33, Namespace wrote: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my lexing time. That's why I wanted to start again to write a lexer (hitherto purely for the learning effect). I looked at the code of some of the other lexers that are also written in D, to get some inspiration, but currently my lexer is a little bit slow; it requires for example ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who would help me to improve our lexer. Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d Thanks in advance. :) Making `types` and `keywords` an array might improve the speed a little bit: if (id in types) -> if (types.canFind(id)) (same with keywords) That would be slower, as canFind is a linear search. A simpler way is to use the first letter and length to select a bucket of keywords. But otherwise, yes, the first advice is to never use the built-in AA, as these take quite some time to allocate. Plus they're not that fast to look up, as they use mod-prime, not mod-power-of-2, to pick a bucket. And it doesn't even need to: http://d.puremagic.com/issues/show_bug.cgi?id=9522 Yes, obviously AA is becoming a central pain point that needs fixing. If you are inclined to help I suggest you join forces with H.S. Teoh and Dicebot and help fix the AA. -- Dmitry Olshansky
Re: Lexer in D
03-Mar-2013 00:03, Namespace writes: This mess of conditionals is what a direct array of 256 entries would avoid: int idx; if (value[0] != '_') { idx = value[0] - 'a'; if (idx == 26) return false; } else { idx = 26; } if (idx < 0 || idx > 26) { // debug writeln("kword: ", idx, ':', value[0]); return false; } if (keywords[idx] is null) return false; return keywords[idx].canFind(value); Gaining some speed in the process. Plus another layer of arrays to discern keywords by length. You see why I suggested generating the code in the first place? ;) BTW, what's the reason to separate keywords and type keywords? They are processed the same in the lexer, and only the parser somewhere up above knows what to do with them. Just return different token values for each. I changed it and merged them together. I also use an array of 256 entries now, but I must keep the check whether idx is < 0, because 'D' - 'a' is negative. And yes, I see what you meant.^^ Obviously, you still largely don't :) Use the full byte to do the lookup, dammit. Saving a few array slots doesn't bring your speed up. Using the full byte *directly* as the index does speed things up, because you don't have to do *any* computations and *conditionals*. Conditionals that can easily go both ways are the worst thing performance-wise. I don't know how to phrase that better. Code: http://dpaste.1azy.net/317241c0 I still reach 215 - 230 msecs. Use profiling to determine what takes the most time. If you don't like callgrind/cachegrind (I seriously don't know why you wouldn't use it), try dmd's -profile. Looking at your main code I see some spaghetti code and an index passed around by pointer. Don't do that - encapsulate the lexing state in some kind of struct; then the index is implicitly passed to all functions via the this pointer. It may very well be the case that you are bound by some other code. In any case, after some heroic efforts Dscanner lexes std.datetime in 24 msecs, and that on a Linux VM powered by the oldish AMD Phenom II. 
What kind of CPU do you use? -- Dmitry Olshansky
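The full-byte lookup Dmitry describes could look roughly like this. A hedged sketch: the table name is illustrative, and filling the table (once, e.g. in static this() or at compile time) is omitted:

```d
import std.algorithm.searching : canFind;

// One bucket of keywords per possible first byte; most entries stay null.
// Population is omitted; e.g.
//   keywordsByFirstByte['i'] = ["if", "in", "int", "import"];
string[][256] keywordsByFirstByte;

bool isKeyword(string value)
{
    // value[0] is an unsigned byte (0 .. 255), so no sign check, no
    // subtraction, and no range check are needed before indexing.
    const bucket = keywordsByFirstByte[value[0]];
    return bucket !is null && bucket.canFind(value);
}
```

The point of the full 256-entry table is exactly that the lookup compiles down to one indexed load with zero branches before it.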
Re: Lexer in D
It is sufficient if my array has only 26 entries, because I only check lower-case letters.
Re: Lexer in D
This mess of conditionals is what a direct array of 256 entries would avoid: int idx; if (value[0] != '_') { idx = value[0] - 'a'; if (idx == 26) return false; } else { idx = 26; } if (idx < 0 || idx > 26) { // debug writeln("kword: ", idx, ':', value[0]); return false; } if (keywords[idx] is null) return false; return keywords[idx].canFind(value); Gaining some speed in the process. Plus another layer of arrays to discern keywords by length. You see why I suggested generating the code in the first place? ;) BTW, what's the reason to separate keywords and type keywords? They are processed the same in the lexer, and only the parser somewhere up above knows what to do with them. Just return different token values for each. I changed it and merged them together. I also use an array of 256 entries now, but I must keep the check whether idx is < 0, because 'D' - 'a' is negative. And yes, I see what you meant.^^ Code: http://dpaste.1azy.net/317241c0 I still reach 215 - 230 msecs.
Re: Lexer in D
On Saturday, 2 March 2013 at 11:59:39 UTC, Dmitry Olshansky wrote: 02-Mar-2013 15:04, David writes: On 02.03.2013 10:33, Namespace wrote: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my lexing time. That's why I wanted to start again to write a lexer (so far purely for the learning effect). I looked at the code of some of the other lexers that are also written in D to get some inspiration, but currently my lexer is a little bit slow; it requires, for example, ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who would help me improve the lexer. Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d Thanks in advance. :) Making `types` and `keywords` an array might improve the speed a little bit: if (id in types) -> if (types.canFind(id)) (same with keywords) That would be slower, as canFind is a linear search. A simpler way is to use the first letter and the length to select a bucket of keywords. But otherwise, yes, the first advice is: never use the built-in AA, as these take quite some time to allocate. Plus they're not that fast to look up, as they use mod-prime rather than mod-power-of-2 to pick a bucket. And it doesn't even need to: http://d.puremagic.com/issues/show_bug.cgi?id=9522
Re: Lexer in D
02-Mar-2013 23:01, Namespace writes: I hope I understand you right this time: http://dpaste.1azy.net/4c2e4428 Was this your idea? With this I reached between 215 and 230 msecs. This mess of conditionals is what a direct array of 256 entries would avoid: int idx; if (value[0] != '_') { idx = value[0] - 'a'; if (idx == 26) return false; } else { idx = 26; } if (idx < 0 || idx > 26) { // debug writeln("kword: ", idx, ':', value[0]); return false; } if (keywords[idx] is null) return false; return keywords[idx].canFind(value); Gaining some speed in the process. Plus another layer of arrays to discern keywords by length. You see why I suggested generating the code in the first place? ;) BTW, what's the reason to separate keywords and type keywords? They are processed the same in the lexer, and only the parser somewhere up above knows what to do with them. Just return different token values for each. -- Dmitry Olshansky
Re: Lexer in D
02-Mar-2013 22:15, Dmitry Olshansky writes: 02-Mar-2013 22:09, Namespace writes: No-no and no. Just translate the list of keywords/types to a bunch of switch-cases; that will give you near-optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length, then by first char. I don't understand. You don't mean something like: switch (value) { case "if": isKeyword = true; break; ... right? No. switch (value.length) { case 2: switch (value[0]) { case 'i': switch (value[1]) { case 'f': return true; case ... ... } ... case 3: ... } (I edited the post to include the length switch before the per-char switch.) -- Dmitry Olshansky
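Spelled out by hand for a couple of keywords (a real version would cover every keyword length and would be generated as a CTFE string mixin, as Dmitry suggests), the two-level switch might look like this:

```d
// Two-level dispatch: first on length, then on first character.
// Only a few illustrative keywords are shown here.
bool isKeyword(string value)
{
    switch (value.length)
    {
        case 2:
            switch (value[0])
            {
                case 'i': return value == "if" || value == "in";
                case 'd': return value == "do";
                default:  return false;
            }
        case 3:
            switch (value[0])
            {
                case 'i': return value == "int";
                case 'f': return value == "for";
                default:  return false;
            }
        default:
            return false;
    }
}
```

Because each final comparison is against a string of known, fixed length, the compiler can often reduce it to one or two word-sized compares.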
Re: Lexer in D
02-Mar-2013 22:09, Namespace writes: No-no and no. Just translate the list of keywords/types to a bunch of switch-cases; that will give you near-optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length, then by first char. I don't understand. You don't mean something like: switch (value) { case "if": isKeyword = true; break; ... right? No. switch (value.length) { case 2: switch (value[0]) { case 'i': switch (value[1]) { case 'f': return true; case ... ... } ... case 3: ... } Even simpler, if you don't want to go for CTFE code generation, is to make a plain array of arrays. The length of the array is exactly 256, i.e. even this should have decent speed: string[][256] keywords = [ null, null, ... // at index 'i': ["int", "if", "invariant", "idouble", "ifloat" ...] // array of strings starting with that char ... ]; bool isKeyword(string value) { string[] toSearch = keywords[value[0]]; return toSearch.canFind(value); } Ok, I will try it. Thank you. -- Dmitry Olshansky
Re: Lexer in D
No-no and no. Just translate the list of keywords/types to a bunch of switch-cases; that will give you near-optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length, then by first char. I don't understand. You don't mean something like: switch (value) { case "if": isKeyword = true; break; ... right? Even simpler, if you don't want to go for CTFE code generation, is to make a plain array of arrays. The length of the array is exactly 256, i.e. even this should have decent speed: string[][256] keywords = [ null, null, ... // at index 'i': ["int", "if", "invariant", "idouble", "ifloat" ...] // array of strings starting with that char ... ]; bool isKeyword(string value) { string[] toSearch = keywords[value[0]]; return toSearch.canFind(value); } Ok, I will try it. Thank you.
Re: Lexer in D
02-Mar-2013 18:53, Namespace writes: That would be slower, as canFind is a linear search. A simpler way is to use the first letter and the length to select a bucket of keywords. I think I get it. Did you mean something like this? No-no and no. Just translate the list of keywords/types to a bunch of switch-cases; that will give you near-optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length, then by first char. Even simpler, if you don't want to go for CTFE code generation, is to make a plain array of arrays. The length of the array is exactly 256, i.e. even this should have decent speed: string[][256] keywords = [ null, null, ... // at index 'i': ["int", "if", "invariant", "idouble", "ifloat" ...] // array of strings starting with that char ... ]; bool isKeyword(string value) { string[] toSearch = keywords[value[0]]; return toSearch.canFind(value); } Better yet, only store the [1..$] parts of the keywords and search for value[1..$]. Even faster is to do the same for each possible keyword length.
[code]
immutable ubyte[char] typeMap;
immutable ubyte[char] keywordMap;

static this() {
    typeMap = ['d' : 7, 'l' : 16, 'i' : 12, 'u' : 20, 'b' : 0, 'f' : 10,
               'r' : 17, 'v' : 25, 'c' : 2, 's' : 18, 'w' : 26];
    keywordMap = ['_' : 0, 'a' : 6, 'b' : 12, 'c' : 14, 'd' : 20, 'e' : 26,
                  'f' : 30, 'g' : 36, 'i' : 37, 'l' : 45, 'm' : 46, 'n' : 49,
                  'o' : 52, 'p' : 54, 'r' : 60, 's' : 62, 't' : 69, 'u' : 77,
                  'v' : 79, 'w' : 81];
}

bool isKeyword(string value) pure nothrow {
    if (value[0] !in keywordMap) return false;
    foreach (string kword; keywords[keywordMap[value[0]] .. $]) {
        if (kword == value) return true;
        if (kword[0] != value[0]) return false;
    }
    return false;
}

bool isType(string value) pure nothrow {
    if (value[0] !in typeMap) return false;
    foreach (string type; types[typeMap[value[0]] .. $]) {
        if (type == value) return true;
        if (type[0] != value[0]) return false;
    }
    return false;
}
[/code]
I'm now at ~230 msecs. 
-- Dmitry Olshansky
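The "do the same for each possible length" idea can be sketched as a two-dimensional bucket table, indexed first by length and then by first byte. The names and the maxLen value here are illustrative assumptions, and filling the table is omitted:

```d
import std.algorithm.searching : canFind;

// Longest keyword to accommodate; e.g. "foreach_reverse" has 15 chars.
enum maxLen = 15;

// Buckets indexed by [keyword length][first byte]; unused entries stay null.
string[][256][maxLen + 1] keywordBuckets;

bool isKeyword(string value)
{
    // Anything longer than the longest keyword can be rejected immediately.
    if (value.length > maxLen) return false;
    const bucket = keywordBuckets[value.length][value[0]];
    return bucket !is null && bucket.canFind(value);
}
```

With both length and first byte already matched, each bucket holds at most a handful of candidates, so the remaining linear scan is trivially short.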
Re: Lexer in D
That would be slower, as canFind is a linear search. A simpler way is to use the first letter and the length to select a bucket of keywords. I think I get it. Did you mean something like this?
[code]
immutable ubyte[char] typeMap;
immutable ubyte[char] keywordMap;

static this() {
    typeMap = ['d' : 7, 'l' : 16, 'i' : 12, 'u' : 20, 'b' : 0, 'f' : 10,
               'r' : 17, 'v' : 25, 'c' : 2, 's' : 18, 'w' : 26];
    keywordMap = ['_' : 0, 'a' : 6, 'b' : 12, 'c' : 14, 'd' : 20, 'e' : 26,
                  'f' : 30, 'g' : 36, 'i' : 37, 'l' : 45, 'm' : 46, 'n' : 49,
                  'o' : 52, 'p' : 54, 'r' : 60, 's' : 62, 't' : 69, 'u' : 77,
                  'v' : 79, 'w' : 81];
}

bool isKeyword(string value) pure nothrow {
    if (value[0] !in keywordMap) return false;
    foreach (string kword; keywords[keywordMap[value[0]] .. $]) {
        if (kword == value) return true;
        if (kword[0] != value[0]) return false;
    }
    return false;
}

bool isType(string value) pure nothrow {
    if (value[0] !in typeMap) return false;
    foreach (string type; types[typeMap[value[0]] .. $]) {
        if (type == value) return true;
        if (type[0] != value[0]) return false;
    }
    return false;
}
[/code]
I'm now at ~230 msecs.
Re: Lexer in D
I sorted both types and keywords alphabetically and wrote my own 'canFind':
bool canFind(const size_t size)(ref const string[size] values, string value) pure nothrow {
    if (value[0] < 'm' || value[0] == '_') {
        foreach (string _val; values) {
            if (_val == value) return true;
        }
    } else {
        foreach_reverse (string _val; values) {
            if (_val == value) return true;
        }
    }
    return false;
}
That brings ~20 msecs, so I have ~280 msecs instead of ~300. But it isn't perfect.
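Since the tables are sorted anyway, another option is a binary search instead of a direction-switching linear scan; Phobos exposes this through std.range.assumeSorted, which is only valid if the array really is sorted ascending. A sketch with a shortened, illustrative keyword list:

```d
import std.range : assumeSorted;

// Must be sorted ascending for assumeSorted to be valid; this is only
// an illustrative subset of the real keyword table.
immutable string[] keywords = ["abstract", "alias", "align", "assert",
                               "break", "case", "cast", "catch",
                               "if", "import", "while"];

bool isKeyword(string value)
{
    // contains() performs a binary search on the SortedRange wrapper.
    return keywords.assumeSorted.contains(value);
}
```

This cuts the scan from O(n) to O(log n) without any extra tables, though for ~100 keywords the bucket-per-first-byte approach discussed above is likely still faster in practice.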
Re: Lexer in D
With canFind it takes ~300 msecs. :/ Any idea how I could improve it? A simpler way is to use the first letter and the length to select a bucket of keywords. I don't understand how you could do this without AAs, or how this could be more performant than canFind.
Re: Lexer in D
02-Mar-2013 15:04, David writes: On 02.03.2013 10:33, Namespace wrote: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my lexing time. That's why I wanted to start again to write a lexer (so far purely for the learning effect). I looked at the code of some of the other lexers that are also written in D to get some inspiration, but currently my lexer is a little bit slow; it requires, for example, ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who would help me improve the lexer. Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d Thanks in advance. :) Making `types` and `keywords` an array might improve the speed a little bit: if (id in types) -> if (types.canFind(id)) (same with keywords) That would be slower, as canFind is a linear search. A simpler way is to use the first letter and the length to select a bucket of keywords. But otherwise, yes, the first advice is: never use the built-in AA, as these take quite some time to allocate. Plus they're not that fast to look up, as they use mod-prime rather than mod-power-of-2 to pick a bucket. -- Dmitry Olshansky
Re: Lexer in D
On 02.03.2013 10:33, Namespace wrote: > For one of our courses this year we have a very cool topic: Lexer and > Compiler. > In the past I have already tried some experiments in this direction with > Romulus and Remus, but I've never measured my lexing time. > That's why I wanted to start again to write a lexer (so far purely for > the learning effect). > I looked at the code of some of the other lexers that are also written > in D to get some inspiration, but currently my lexer is a little > bit slow; it requires, for example, ~200 msecs for std.datetime. > So I wanted to ask whether there are nice people who would help me improve > the lexer. > > Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d > > Thanks in advance. :) Making `types` and `keywords` an array might improve the speed a little bit: if (id in types) -> if (types.canFind(id)) (same with keywords)
Re: Lexer in D
I'm already using these flags. But thanks for your link; I will take a look at it. :)
Re: Lexer in D
02-Mar-2013 13:33, Namespace writes: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my lexing time. That's why I wanted to start again to write a lexer (so far purely for the learning effect). I looked at the code of some of the other lexers that are also written in D to get some inspiration, but currently my lexer is a little bit slow; it requires, for example, ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who would help me improve the lexer. Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d Thanks in advance. :) See this one, which I'm currently investing my time into: https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer Other than this, use cachegrind and compile with ldc/gdc :) Also, for dmd check that the flags are "-release -O -inline -noboundscheck" -- Dmitry Olshansky