Re: Lexer in D
On Sunday, 24 March 2013 at 22:16:45 UTC, Brian Schott wrote: On Sunday, 24 March 2013 at 16:54:22 UTC, Namespace wrote: Have no time - looks interesting but I highly doubt it can do what is promised. Still, take time to do a small formal announcement for the beta. Yes, you're right, it's going to take some time before everything works in general. It has held up in my small test runs, but D is also incredibly complex. I will also stop working on it very soon, but maybe the code / the idea will inspire someone else to write something more mature for the D community. This is also the reason why I'm asking for feedback in this fizzled-out thread. :) But once you've taken a look at it, I would be happy if you could tell me what I should improve next time. I have similar goals for the Dscanner tool, so I may steal some of your ideas. I'm glad to hear that. If I can help, let me know; I would be happy to gain more knowledge in this area. And I would be glad if some of my ideas became reality.
Re: Lexer in D
On Sunday, 24 March 2013 at 16:54:22 UTC, Namespace wrote: Have no time - looks interesting but I highly doubt it can do what is promised. Still, take time to do a small formal announcement for the beta. Yes, you're right, it's going to take some time before everything works in general. It has held up in my small test runs, but D is also incredibly complex. I will also stop working on it very soon, but maybe the code / the idea will inspire someone else to write something more mature for the D community. This is also the reason why I'm asking for feedback in this fizzled-out thread. :) But once you've taken a look at it, I would be happy if you could tell me what I should improve next time. I have similar goals for the Dscanner tool, so I may steal some of your ideas.
Re: Lexer in D
Have no time - looks interesting but I highly doubt it can do what is promised. Still, take time to do a small formal announcement for the beta. Yes, you're right, it's going to take some time before everything works in general. It has held up in my small test runs, but D is also incredibly complex. I will also stop working on it very soon, but maybe the code / the idea will inspire someone else to write something more mature for the D community. This is also the reason why I'm asking for feedback in this fizzled-out thread. :) But once you've taken a look at it, I would be happy if you could tell me what I should improve next time.
Re: Lexer in D
On 24-Mar-2013 20:08, Namespace wrote: On Saturday, 23 March 2013 at 15:39:50 UTC, Namespace wrote: On Thursday, 21 March 2013 at 11:36:45 UTC, Namespace wrote: On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is a pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code. I'm almost finished. In my few tests, my little parser detects function, struct and variable declarations and expressions. It can resolve rvalue references (by creating temporary variables) and detects unused/underused variables. But it is still a beta. But it's a lot of fun. :D Moved to https://github.com/Dgame/DAT/ It's time for a beta. Would be nice if someone takes a look at DAT or tries it. I'm sure that there are still some bugs, but maybe this little recreational fun is indeed useful for one person or another. I had a lot of fun. It's tested with 'simple.d', std.stdio and std.datetime. Have no time - looks interesting but I highly doubt it can do what is promised. Still, take time to do a small formal announcement for the beta. Reusing the old and fizzled-out thread of a D lexer to discuss a tool created on top of it is bound to get very little feedback. -- Dmitry Olshansky
Re: Lexer in D
On Saturday, 23 March 2013 at 15:39:50 UTC, Namespace wrote: On Thursday, 21 March 2013 at 11:36:45 UTC, Namespace wrote: On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is a pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code. I'm almost finished. In my few tests, my little parser detects function, struct and variable declarations and expressions. It can resolve rvalue references (by creating temporary variables) and detects unused/underused variables. But it is still a beta. But it's a lot of fun. :D Moved to https://github.com/Dgame/DAT/ It's time for a beta. Would be nice if someone takes a look at DAT or tries it. I'm sure that there are still some bugs, but maybe this little recreational fun is indeed useful for one person or another. I had a lot of fun. It's tested with 'simple.d', std.stdio and std.datetime.
Re: Lexer in D
On Thursday, 21 March 2013 at 11:36:45 UTC, Namespace wrote: On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is a pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code. I'm almost finished. In my few tests, my little parser detects function, struct and variable declarations and expressions. It can resolve rvalue references (by creating temporary variables) and detects unused/underused variables. But it is still a beta. But it's a lot of fun. :D Moved to https://github.com/Dgame/DAT/
Re: Lexer in D
On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote: So far, my lexer is a pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code. I'm almost finished. In my few tests, my little parser detects function, struct and variable declarations and expressions. It can resolve rvalue references (by creating temporary variables) and detects unused/underused variables. But it is still a beta. But it's a lot of fun. :D
Re: Lexer in D
So you got rid of array creation. About time ;) Yes, that was the only way to get closer to your measured time. :D
Re: Lexer in D
So far, my lexer is a pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code.
Re: Lexer in D
On 15-Mar-2013 01:17, Artur Skawina wrote: On 03/14/13 18:24, Dmitry Olshansky wrote: On 14-Mar-2013 21:20, Namespace wrote: Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there: https://github.com/Dgame/Puzzle/blob/master/trace.log My goal is about 20 msecs. :D And mine is 15 with Dscanner. But it's stuck at around 21-22. But both of us can just cheat and buy a newer PC :) What are you both actually measuring? IOW what happens with the resulting tokens? artur I'm counting and/or filtering out specific tokens. The latter is a tiny bit slower. The other options of Dscanner are to generate e.g. HTML highlighting for code or measure token frequencies. Haven't timed those as their speed doesn't entirely depend on the lexer as is. -- Dmitry Olshansky
Re: Lexer in D
On 03/14/13 18:24, Dmitry Olshansky wrote: > On 14-Mar-2013 21:20, Namespace wrote: >> Yes, and it's really fun. :) >> By the way, the code can be found here: >> https://github.com/Dgame/Puzzle/blob/master/DLexer.d >> Maybe some of you have an idea how I could improve anything. >> Last trace.log is also there: >> https://github.com/Dgame/Puzzle/blob/master/trace.log >> My goal is about 20 msecs. :D > > And mine is 15 with Dscanner. But it's stuck at around 21-22. > But both of us can just cheat and buy a newer PC :) What are you both actually measuring? IOW what happens with the resulting tokens? artur
Re: Lexer in D
On 14-Mar-2013 21:20, Namespace wrote: Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there: https://github.com/Dgame/Puzzle/blob/master/trace.log My goal is about 20 msecs. :D So you got rid of array creation. About time ;) -- Dmitry Olshansky
Re: Lexer in D
Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there: https://github.com/Dgame/Puzzle/blob/master/trace.log My goal is about 20 msecs. :D
Re: Lexer in D
On 14-Mar-2013 21:20, Namespace wrote: Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have an idea how I could improve anything. Last trace.log is also there: https://github.com/Dgame/Puzzle/blob/master/trace.log My goal is about 20 msecs. :D And mine is 15 with Dscanner. But it's stuck at around 21-22. But both of us can just cheat and buy a newer PC :) So what we need to count is the number of syscalls + cycles spent in user mode to have a meaningful metric. -- Dmitry Olshansky
Re: Lexer in D
On 14-Mar-2013 01:19, Namespace wrote: Array has horrible code... I decided to start from scratch. So I took a close look at the lexer of dmd and took over the basic functionality. Result for std.datetime (without profiling so far): It's getting better! Total time (ms): 5971.66 Repetitions: 200 Sample mode: 28 (101 ocurrences) Median time: 28.89 Avg time : 29.8583 Std dev. : 3.37222 Minimum: 27.958 Maximum: 70.583 95% conf.int. : [23.2489, 36.4677] e = 6.60943 99% conf.int. : [21.172, 38.5446] e = 8.68627 EstimatedAvg95%: [29.3909, 30.3257] e = 0.467357 EstimatedAvg99%: [29.2441, 30.4725] e = 0.614212 -- Dmitry Olshansky
Re: Lexer in D
Array has horrible code... I decided to start from scratch. So I took a close look at the lexer of dmd and took over the basic functionality. Result for std.datetime (without profiling so far): Total time (ms): 5971.66 Repetitions: 200 Sample mode: 28 (101 ocurrences) Median time: 28.89 Avg time : 29.8583 Std dev. : 3.37222 Minimum: 27.958 Maximum: 70.583 95% conf.int. : [23.2489, 36.4677] e = 6.60943 99% conf.int. : [21.172, 38.5446] e = 8.68627 EstimatedAvg95%: [29.3909, 30.3257] e = 0.467357 EstimatedAvg99%: [29.2441, 30.4725] e = 0.614212
Re: Lexer in D
Profile? Also, note that appender is a tool for appending stuff *into an array*. If you don't actually need an array at the end, then you could give Array a roll? I don't think it will necessarily do better, but it doesn't hurt to give it a roll. My profile output looks like this: http://dpaste.1azy.net/0409f0bc Since I have already discussed most of the stuff in this thread, I do not know what I could still improve on the Lexer. But I have not yet thought of Array. Because my Appender already uses malloc and free, I could learn something from the implementation of 'Array'. Thanks for the suggestion.
Re: Lexer in D
On Monday, 11 March 2013 at 16:37:00 UTC, Namespace wrote: Seems to be the wrong point of optimizations. Profile? Also, note that appender is a tool for appending stuff *into an array*. If you don't actually need an array at the end, then you could give Array a roll? I don't think it will necessarily do better, but it doesn't hurt to give it a roll.
Re: Lexer in D
extends if possible. http://dpaste.dzfl.pl/f86e5db6 import core.stdc.stdlib; void main() { auto p1 = malloc(14); auto p2 = realloc(p1, 15); assert(p1 is p2); } Cool, nice to know. Currently I have this benchmark: Without memory reserve: Total time (ms): 6779.16 Repetitions: 200 Sample mode: 33 (73 ocurrences) Median time: 33.689 Avg time : 33.8958 Std dev. : 1.86754 Minimum: 31.414 Maximum: 52.237 95% conf.int. : [30.2355, 37.5561] e = 3.66031 99% conf.int. : [29.0853, 38.7063] e = 4.81047 EstimatedAvg95%: [33.637, 34.1546] e = 0.258823 EstimatedAvg99%: [33.5556, 34.2359] e = 0.340151 With memory reserve: Total time (ms): 5945.1 Repetitions: 200 Sample mode: 29 (68 ocurrences) Median time: 29.3215 Avg time : 29.7255 Std dev. : 2.083 Minimum: 27.458 Maximum: 45.32 95% conf.int. : [25.6429, 33.8081] e = 4.08261 99% conf.int. : [24.36, 35.0909] e = 5.36546 EstimatedAvg95%: [29.4368, 30.0142] e = 0.288684 EstimatedAvg99%: [29.3461, 30.1049] e = 0.379396 But I do not gain that much: Total time (ms): 10833 Repetitions: 200 Sample mode: 53 (105 ocurrences) Median time: 53.848 Avg time : 54.1652 Std dev. : 1.72681 Minimum: 52.118 Maximum: 69.703 95% conf.int. : [50.7807, 57.5497] e = 3.38448 99% conf.int. : [49.7172, 58.6131] e = 4.44796 EstimatedAvg95%: [53.9259, 54.4045] e = 0.239319 EstimatedAvg99%: [53.8507, 54.4797] e = 0.314518 Seems to be the wrong point of optimizations.
Re: Lexer in D
On Monday, 11 March 2013 at 15:56:37 UTC, Namespace wrote: On Monday, 11 March 2013 at 15:51:58 UTC, David wrote: But I do not think you can extend a memory block with C functions. What about realloc? That's what I use. But it moves your old memory into the newly allocated block. Or does it first check whether the current memory block can be extended? extends if possible. http://dpaste.dzfl.pl/f86e5db6 import core.stdc.stdlib; void main() { auto p1 = malloc(14); auto p2 = realloc(p1, 15); assert(p1 is p2); }
Re: Lexer in D
On Monday, 11 March 2013 at 15:51:58 UTC, David wrote: But I do not think you can extend a memory block with C functions. What about realloc? That's what I use. But it moves your old memory into the newly allocated block. Or does it first check whether the current memory block can be extended?
Re: Lexer in D
> But I do not think you can extend a memory block with C functions. What about realloc?
Re: Lexer in D
On Monday, 11 March 2013 at 15:21:31 UTC, FG wrote: On 2013-03-11 16:11, Namespace wrote: I've also done some benchmarks. And I wonder why my appender is much faster if I enable memory reservation. Probably just because D's Appender has a better growth scheme than yours and you reallocate too often. Try to compare how often that happens. D's: 24 extends, 9 moves (qalloc + memcpy). My Appender: 25 moves. But I do not think you can extend a memory block with C functions. I'm also very interested to hear some improvement tips. Well, you may add a check whether alloc/realloc actually succeeded. :) Maybe. :)
Re: Lexer in D
On 2013-03-11 16:11, Namespace wrote: I've also done some benchmarks. And I wonder why my appender is much faster if I enable memory reservation. Probably just because D's Appender has a better growth scheme than yours and you reallocate too often. Try to compare how often that happens. I'm also very interested to hear some improvement tips. Well, you may add a check whether alloc/realloc actually succeeded. :)
Re: Lexer in D
Total time (ms): 10941.9 Repetitions: 200 Sample mode: 54 (118 ocurrences) Median time: 54.5855 Avg time : 54.7094 Std dev. : 1.33935 Minimum: 52.535 Maximum: 67.206 95% conf.int. : [52.0843, 57.3345] e = 2.62509 99% conf.int. : [51.2594, 58.1593] e = 3.44995 EstimatedAvg95%: [54.5238, 54.895] e = 0.185622 EstimatedAvg99%: [54.4654, 54.9533] e = 0.243948 And even a bit closer. I've also done some benchmarks. And I wonder why my appender is much faster if I enable memory reservation. My Appender: http://dpaste.1azy.net/1bb34d9a I'm also very interested to hear some improvement tips. In each benchmark, this struct is appended 1_000_000 times: struct B { public: const int b; } Without memory reserve: My Appender: Total time (ms): 7638.67 Repetitions: 200 Sample mode: 38 (68 ocurrences) Median time: 38.0545 Avg time : 38.1934 Std dev. : 1.56827 Minimum: 35.859 Maximum: 47.586 95% conf.int. : [35.1196, 41.2671] e = 3.07375 99% conf.int. : [34.1538, 42.233] e = 4.0396 EstimatedAvg95%: [37.976, 38.4107] e = 0.217347 EstimatedAvg99%: [37.9077, 38.479] e = 0.285643 D's Appender: Total time (ms): 6972.66 Repetitions: 200 Sample mode: 34 (97 ocurrences) Median time: 34.7795 Avg time : 34.8633 Std dev. : 1.41623 Minimum: 32.96 Maximum: 49.579 95% conf.int. : [32.0876, 37.6391] e = 2.77576 99% conf.int. : [31.2153, 38.5113] e = 3.64797 EstimatedAvg95%: [34.667, 35.0596] e = 0.196276 EstimatedAvg99%: [34.6054, 35.1213] e = 0.257951 D array: Total time (ms): 22868.2 Repetitions: 200 Sample mode: 110 (198 ocurrences) Median time: 113.658 Avg time : 114.341 Std dev. : 1.97184 Minimum: 113.011 Maximum: 133.978 95% conf.int. : [110.477, 118.206] e = 3.86473 99% conf.int. : [109.262, 119.42] e = 5.07911 EstimatedAvg95%: [114.068, 114.615] e = 0.273277 EstimatedAvg99%: [113.982, 114.7] e = 0.359147 With memory reserve: My Appender: Total time (ms): 6197.48 Repetitions: 200 Sample mode: 30 (88 ocurrences) Median time: 30.617 Avg time : 30.9874 Std dev. : 2.86913 Minimum: 28.583 Maximum: 64.06 95% conf.int. : [25.364, 36.6108] e = 5.62339 99% conf.int. : [23.597, 38.3778] e = 7.39039 EstimatedAvg95%: [30.5897, 31.385] e = 0.397634 EstimatedAvg99%: [30.4648, 31.51] e = 0.52258 D's Appender: Total time (ms): 6628.31 Repetitions: 200 Sample mode: 32 (104 ocurrences) Median time: 32.6775 Avg time : 33.1415 Std dev. : 3.00344 Minimum: 31.506 Maximum: 57.493 95% conf.int. : [27.2549, 39.0282] e = 5.88663 99% conf.int. : [25.4052, 40.8779] e = 7.73635 EstimatedAvg95%: [32.7253, 33.5578] e = 0.416248 EstimatedAvg99%: [32.5945, 33.6886] e = 0.547042 D array: Total time (ms): 22849.9 Repetitions: 200 Sample mode: 110 (198 ocurrences) Median time: 113.025 Avg time : 114.249 Std dev. : 8.40906 Minimum: 112.355 Maximum: 222.982 95% conf.int. : [97.768, 130.731] e = 16.4815 99% conf.int. : [92.5891, 135.91] e = 21.6603 EstimatedAvg95%: [113.084, 115.415] e = 1.16542 EstimatedAvg99%: [112.718, 115.781] e = 1.53162
Re: Lexer in D
Total time (ms): 11374.4 Repetitions: 200 Sample mode: 56 (89 ocurrences) Median time: 56.312 Avg time : 56.872 Std dev. : 2.60698 Minimum: 53.978 Maximum: 81.602 95% conf.int. : [51.7624, 61.9816] e = 5.10959 99% conf.int. : [50.1569, 63.5871] e = 6.71514 EstimatedAvg95%: [56.5107, 57.2333] e = 0.361303 EstimatedAvg99%: [56.3972, 57.3468] e = 0.474832 I'm getting closer.
Re: Lexer in D
On 05-Mar-2013 01:35, Brian Schott wrote: On Monday, 4 March 2013 at 18:10:48 UTC, Dmitry Olshansky wrote: Brian listed graphs showing under 20 ms on his CPU for all, and <10 for every module in std except std.datetime. https://github.com/Hackerpilot/Dscanner/blob/range-based-lexer/perftest.sh I checked in the script that I use to generate the data for the graph last night. You can copy/paste its output into a spreadsheet (using tabs as field separators). Great, thanks! I'll see if I can make a plot from it with std-dev shown as well. -- Dmitry Olshansky
Re: Lexer in D
On Monday, 4 March 2013 at 18:10:48 UTC, Dmitry Olshansky wrote: Brian listed graphs showing under 20 ms on his CPU for all, and <10 for every module in std except std.datetime. https://github.com/Hackerpilot/Dscanner/blob/range-based-lexer/perftest.sh I checked in the script that I use to generate the data for the graph last night. You can copy/paste its output into a spreadsheet (using tabs as field separators).
Re: Lexer in D
On 04-Mar-2013 23:19, Namespace wrote: Yes, but I'm not as familiar with ranges. But I want to learn more about them and try it then. What do you mean: how much would the use of ranges gain compared to my current method? 30%? Basically you copy Token structs and sometimes reallocate the whole array (I guess at 60% you don't reallocate but waste time on GC.malloc, which zeroes out the memory block). My estimate is no less than 10-15%. -- Dmitry Olshansky
Re: Lexer in D
Yes, but I'm not as familiar with ranges. But I want to learn more about them and try it then. What do you mean: how much would the use of ranges gain compared to my current method? 30%?
Re: Lexer in D
On 04-Mar-2013 22:58, Namespace wrote: Yes, I know. But I do not think I have a chance to get even closer to it. At least I don't know how. One thing that all of these lexers do (dmd's original and Dscanner) is get tokens as you need them - lazily. You would always be slower if you take time to append them to some array (no matter how). The idiomatic way, as I said, is to use ranges. I'm on an AMD Athlon 64 X2 Dual Core Processor 5400+. Mm, that's not that far away from mine. But Brian has got something even more powerful :) You can easily download and test Dscanner on your CPU so that you get numbers that are actually comparable. -- Dmitry Olshansky
Re: Lexer in D
Yes, I know. But I do not think I have a chance to get even closer to it. At least I don't know how. I'm on an AMD Athlon 64 X2 Dual Core Processor 5400+.
Re: Lexer in D
On 04-Mar-2013 16:17, Namespace wrote: By reserving 60% of the file size in memory I get a big speed boost. Total time (ms): 17544.6 Repetitions: 200 Sample mode: 87 (121 ocurrences) Median time: 87.3835 Avg time : 87.7232 Std dev. : 1.77905 Minimum: 84.238 Maximum: 101.919 95% conf.int. : [84.2363, 91.21] e = 3.48687 99% conf.int. : [83.1407, 92.3057] e = 4.58252 EstimatedAvg95%: [87.4766, 87.9697] e = 0.246559 EstimatedAvg99%: [87.3991, 88.0472] e = 0.324033 But I'm still away from: Avg time : 69.3088 http://forum.dlang.org/thread/dpdgcycrgfspcxenz...@forum.dlang.org?page=8#post-abncwoqtpnaoohihkliw:40forum.dlang.org Keep in mind the CPU you are testing on. What's your chip? In any case Dscanner has been improved to take far, far less time than 60 ms. Brian listed graphs showing under 20 ms on his CPU for all, and <10 for every module in std except std.datetime. I've tried also 10%, 40% and 80% but 60% seems optimal. And I think I reached my performance limit. I could implement the LazyForwardRange and, if anyone can confirm the statement about char <-> dchar decoding and that the usage of ubyte would be (crucially) better, I will try this too. -- Dmitry Olshansky
Re: Lexer in D
By reserving 60% of the file size in memory I get a big speed boost. Total time (ms): 17544.6 Repetitions: 200 Sample mode: 87 (121 ocurrences) Median time: 87.3835 Avg time : 87.7232 Std dev. : 1.77905 Minimum: 84.238 Maximum: 101.919 95% conf.int. : [84.2363, 91.21] e = 3.48687 99% conf.int. : [83.1407, 92.3057] e = 4.58252 EstimatedAvg95%: [87.4766, 87.9697] e = 0.246559 EstimatedAvg99%: [87.3991, 88.0472] e = 0.324033 But I'm still away from: Avg time : 69.3088 http://forum.dlang.org/thread/dpdgcycrgfspcxenz...@forum.dlang.org?page=8#post-abncwoqtpnaoohihkliw:40forum.dlang.org I've tried also 10%, 40% and 80% but 60% seems optimal. And I think I reached my performance limit. I could implement the LazyForwardRange and, if anyone can confirm the statement about char <-> dchar decoding and that the usage of ubyte would be (crucially) better, I will try this too.
Re: Lexer in D
I've installed cygwin to use the time benchmark function and these are the results: $ time ./Lexer.exe real 0m0.225s user 0m0.015s sys 0m0.046s I also read in the std.d.lexer thread that the use of 'char' isn't that good for a lexer, because dmd wants to decode 'char' to 'dchar'. The usage of ubyte should be better. Is that right?
Re: Lexer in D
It's quite amazing: with Appender I reach between 115 and 125 msecs. I adapted the Appender for myself and solved the problem with memcpy/memmove. That works fine. But is there anything else I can do (besides the LazyForwardRange) to gain more performance?
Re: Lexer in D
On Sunday, March 03, 2013 17:19:25 Namespace wrote: > But hasn't each range an array internally? Many do, but some don't. Take std.range.iota, or the ranges in std.random for instance. They're generative. And if all ranges had arrays underneath the hood, infinite ranges wouldn't be possible except for something like std.range.cycle. Something like struct Range { @property int front() { return _i; } void popFront() { ++_i; } @property bool empty() { return _i == int.max; } private int _i; } would be a valid range. In the case of a lexer, you generate the next token on each popFront call, so you consume the range of characters that you're lexing as you go but don't need to allocate memory for more than the current token at a time. - Jonathan M Davis
Re: Lexer in D
On 03-Mar-2013 20:45, Namespace wrote: Just rip off the const; why would you need it anyway? The data in 'tokens' are final, so why should they not be const? Then just return const(Token), i.e. simple full const. In general I'd avoid marking data as bitwise const unless that is actually true. For all you know some algorithm up above might tweak existing token values directly. What exactly is the problem if there are const members? Is this a known problem? It's quite old and ugly, as a lot of things would need to bit-blit stuff over it and fail to do so. Otherwise, I will go and investigate it. -- Dmitry Olshansky
Re: Lexer in D
Just rip off the const; why would you need it anyway? The data in 'tokens' are final, so why should they not be const? What exactly is the problem if there are const members? Is this a known problem? Otherwise, I will go and investigate it.
Re: Lexer in D
On 03-Mar-2013 20:36, Namespace wrote: On Sunday, 3 March 2013 at 16:19:27 UTC, Namespace wrote: Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. But hasn't each range an array internally? Or how does that work? I try to use Appender, but this lazy forward range sounds interesting. Appender does not work for my Token struct because I have const/immutable members. So I cannot add anything. Example: http://dpaste.1azy.net/21f100d3 Without 'const' it works fine. Just rip off the const; why would you need it anyway? -- Dmitry Olshansky
Re: Lexer in D
On Sunday, 3 March 2013 at 16:19:27 UTC, Namespace wrote: Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. But hasn't each range an array internally? Or how does that work? I try to use Appender, but this lazy forward range sounds interesting. Appender does not work for my Token struct because I have const/immutable members. So I cannot add anything. Example: http://dpaste.1azy.net/21f100d3 Without 'const' it works fine.
Re: Lexer in D
On 03-Mar-2013 20:19, Namespace wrote: Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. But hasn't each range an array internally? Of course not, or the whole range concept would be meaningless. Or how does that work? By having a struct or generally any object that in fact works as a generator of tokens. popFront lexes a new token, front gives the current token and empty indicates if there are no more tokens. See std.range for some simple ranges. If you throw in a .save method you can replay tokenization from some point in the past (= some older state; think lookahead when parsing). I try to use Appender, but this lazy forward range sounds interesting. -- Dmitry Olshansky
Re: Lexer in D
Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. But hasn't each range an array internally? Or how does that work? I try to use Appender, but this lazy forward range sounds interesting.
Re: Lexer in D
On 03-Mar-2013 18:28, Namespace wrote: I'd repeat that I think it makes no sense to separately treat isType. In any case there is way more to types than built-in ones, and it is the job of the parser (to assume types) and the semantic step (to type-check and infer). Yes, I've understood, but currently I want it so. Another thing is to run -profile without inline to understand the structure of time spent per each subroutine better. I did this and I refactored a lot of my code. Now I get 170 - 180 msecs and this is my current trace.log: http://dpaste.1azy.net/b94b19ff But currently I have no more ideas how I could gain more performance. Maybe I should disable the compiler while looping through the text? Or maybe I should allocate more space for the Token array (e.g. toks.length = 100;)? I hope that you or anyone else have further ideas. But anyway, thanks for the help! :) Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. -- Dmitry Olshansky
Re: Lexer in D
I'd repeat that I think it makes no sense to separately treat isType. In any case there is way more to types than built-in ones, and it is the job of the parser (to assume types) and the semantic step (to type-check and infer). Yes, I've understood, but currently I want it so. Another thing is to run -profile without inline to understand the structure of time spent per each subroutine better. I did this and I refactored a lot of my code. Now I get 170 - 180 msecs and this is my current trace.log: http://dpaste.1azy.net/b94b19ff But currently I have no more ideas how I could gain more performance. Maybe I should disable the compiler while looping through the text? Or maybe I should allocate more space for the Token array (e.g. toks.length = 100;)? I hope that you or anyone else have further ideas. But anyway, thanks for the help! :)
Re: Lexer in D
On 03-Mar-2013 03:12, Namespace wrote: I noted it too late: decode comes from readText; readText validates your text. I will now use std.file.read. That's why Walter promotes profilers. They help verify hypotheses on performance and constantly surprise people with what they actually find. The new trace output is here and it seems you're right, isKeyword and isType (yes, I'd decided to separate them again and keep both in the lexer) take a lot of time: http://dpaste.1azy.net/0d6aff6b 154874 106186 95403 0 pure nothrow bool Puzzle.Lexer.isKeyword(immutable(char)[]) 159688 78020 69289 0 pure nothrow bool Puzzle.Lexer.isType(immutable(char)[]) I never thought that. :) -- Dmitry Olshansky
Re: Lexer in D
On 03-Mar-2013 03:52, Namespace wrote: I think that thanks to your suggestions isKeyword and isType are entirely inlined, because they aren't listed in the trace.log anymore (http://dpaste.1azy.net/b94b19ff). The only thing which makes trouble is the isNext function. Maybe I should also use a StringStream (implemented as a Range). What do you think? I'd repeat that I think it makes no sense to separately treat isType. In any case there is way more to types than built-in ones, and it is the job of the parser (to assume types) and the semantic step (to type-check and infer). Another thing is to run -profile without inline to understand the structure of time spent per each subroutine better. -- Dmitry Olshansky
Re: Lexer in D
I noted it too late: decode comes from readText; readText validates your text. I will now use std.file.read. The new trace output is here and it seems you're right, isKeyword and isType (yes, I'd decided to separate them again and keep both in the lexer) take a lot of time: http://dpaste.1azy.net/0d6aff6b 154874 106186 95403 0 pure nothrow bool Puzzle.Lexer.isKeyword(immutable(char)[]) 159688 78020 69289 0 pure nothrow bool Puzzle.Lexer.isType(immutable(char)[]) I never thought that. :)
Re: Lexer in D
I never used the -profile flag before (only -cov) so I want to ask if I understand it right: http://dpaste.1azy.net/4382097b It seems that the call which is listed on line 133 1589328 214949 214948 0 pure @trusted dchar std.utf.decode!(const(immutable(char)[])).decode(ref const(immutable(char)[]), ref uint) takes the most time. Am I right? And why is something decoded? Should I use dchar instead of char? And I'm not taking anything that is already finished, because I want to become more familiar with the matter. I want to see what you have to note in order to get performance. The 24 msecs would obviously be a dream, but I would be pleased with 50-100 msecs. :D
Re: Lexer in D
On 02-Mar-2013 23:50, Zach the Mystic wrote: On Saturday, 2 March 2013 at 11:59:39 UTC, Dmitry Olshansky wrote: On 02-Mar-2013 15:04, David wrote: On 02.03.2013 10:33, Namespace wrote: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my lexing time. That's why I wanted to start again to write a lexer (hitherto purely for the learning effect). I looked at the code of some of the other lexers that are also written in D, to get some inspiration, but currently my lexer is a little bit slow; it requires for example ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who would help me to improve our lexer. Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d Thanks in advance. :) Making `types` and `keywords` an array might improve the speed a little bit: if (id in types) -> if (types.canFind(id)) (same with keywords) That would be slower, as canFind is a linear search. A simpler way is to use the first letter and length to select a bucket of keywords. But otherwise, yes, the first advice is to never use the built-in AA, as these take quite some time to allocate. Plus they're not that fast to look up, as they use mod-prime, not mod-power-of-2, to pick a bucket. And it doesn't even need to: http://d.puremagic.com/issues/show_bug.cgi?id=9522 Yes, obviously AA is becoming a central pain point that needs fixing. If you are inclined to help I suggest you join forces with H.S. Teoh and Dicebot and help fix the AA. -- Dmitry Olshansky
Re: Lexer in D
03-Mar-2013 00:03, Namespace writes: This mess of conditionals is what a direct array of 256 entries would avoid: int idx; if (value[0] != '_') { idx = value[0] - 'a'; if (idx == 26) return false; } else { idx = 26; } if (idx < 0 || idx > 26) { // debug writeln("kword: ", idx, ':', value[0]); return false; } if (keywords[idx] is null) return false; return keywords[idx].canFind(value); Gaining some speed in the process. Plus another layer of arrays to discern keywords by length. You see why I suggested generating the code in the first place? ;) BTW, what's the reason to separate keywords and type keywords? They are processed the same in the lexer, and only the parser somewhere up above knows what to do with them. Just return different token values for each. I changed it and merged them together. I also use an array of 256 entries now, but I must keep the check whether idx is < 0, because 'D' - 'a' is negative. And yes, I see what you meant.^^ Obviously, you still largely don't :) Use the full byte to do the lookup, dammit. Saving a few array slots doesn't bring your speed up. Using the full byte *directly* as the index does speed things up, because you don't have to do *any* computations and *conditionals*. Conditionals that can easily go both ways are the worst thing performance-wise. I don't know how to phrase that better. Code: http://dpaste.1azy.net/317241c0 I still reach 215 - 230 msecs. Use profiling to determine what takes the most time. If you don't like callgrind/cachegrind (I seriously don't know why you wouldn't use it), try dmd's -profile. Looking at your main code I see some spaghetti code and an index passed around by pointer. Don't do that - encapsulate the lexing state in some kind of struct; then the index is implicitly passed to all functions via the this pointer. It may very well be the case that you are bound by some other code. In any case, after some heroic efforts Dscanner lexes std.datetime in 24 msecs, and that on a Linux VM powered by the oldish AMD Phenom II. 
What kind of CPU do you use? -- Dmitry Olshansky
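The full-byte lookup Dmitry describes could look roughly like this. A hedged sketch: the table name is illustrative, and filling the table (once, e.g. in static this() or at compile time) is omitted:

```d
import std.algorithm.searching : canFind;

// One bucket of keywords per possible first byte; most entries stay null.
// Population is omitted; e.g.
//   keywordsByFirstByte['i'] = ["if", "in", "int", "import"];
string[][256] keywordsByFirstByte;

bool isKeyword(string value)
{
    // value[0] is an unsigned byte (0 .. 255), so no sign check, no
    // subtraction, and no range check are needed before indexing.
    const bucket = keywordsByFirstByte[value[0]];
    return bucket !is null && bucket.canFind(value);
}
```

The point of the full 256-entry table is exactly that the lookup compiles down to one indexed load with zero branches before it.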
Re: Lexer in D
It is sufficient if my array has only 26 entries, because I only check lower-case letters.
Re: Lexer in D
This mess of conditionals is what a direct array of 256 entries would avoid: int idx; if (value[0] != '_') { idx = value[0] - 'a'; if (idx == 26) return false; } else { idx = 26; } if (idx < 0 || idx > 26) { // debug writeln("kword: ", idx, ':', value[0]); return false; } if (keywords[idx] is null) return false; return keywords[idx].canFind(value); Gaining some speed in the process. Plus another layer of arrays to discern keywords by length. You see why I suggested generating the code in the first place? ;) BTW, what's the reason to separate keywords and type keywords? They are processed the same in the lexer, and only the parser somewhere up above knows what to do with them. Just return different token values for each. I changed it and merged them together. I also use an array of 256 entries now, but I must keep the check whether idx is < 0, because 'D' - 'a' is negative. And yes, I see what you meant.^^ Code: http://dpaste.1azy.net/317241c0 I still reach 215 - 230 msecs.
Re: Lexer in D
On Saturday, 2 March 2013 at 11:59:39 UTC, Dmitry Olshansky wrote: 02-Mar-2013 15:04, David writes: On 02.03.2013 10:33, Namespace wrote: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my lexing time. That's why I wanted to start again to write a lexer (so far purely for the learning effect). I looked at the code of some of the other lexers that are also written in D to get some inspiration, but currently my lexer is a little bit slow; it requires, for example, ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who would help me improve the lexer. Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d Thanks in advance. :) Making `types` and `keywords` an array might improve the speed a little bit: if (id in types) -> if (types.canFind(id)) (same with keywords) That would be slower, as canFind is a linear search. A simpler way is to use the first letter and the length to select a bucket of keywords. But otherwise, yes, the first advice is: never use the built-in AA, as these take quite some time to allocate. Plus they're not that fast to look up, as they use mod-prime rather than mod-power-of-2 to pick a bucket. And it doesn't even need to: http://d.puremagic.com/issues/show_bug.cgi?id=9522
Re: Lexer in D
02-Mar-2013 23:01, Namespace writes: I hope I understand you right this time: http://dpaste.1azy.net/4c2e4428 Was this your idea? With this I reached between 215 and 230 msecs. This mess of conditionals is what a direct array of 256 entries would avoid: int idx; if (value[0] != '_') { idx = value[0] - 'a'; if (idx == 26) return false; } else { idx = 26; } if (idx < 0 || idx > 26) { // debug writeln("kword: ", idx, ':', value[0]); return false; } if (keywords[idx] is null) return false; return keywords[idx].canFind(value); Gaining some speed in the process. Plus another layer of arrays to discern keywords by length. You see why I suggested generating the code in the first place? ;) BTW, what's the reason to separate keywords and type keywords? They are processed the same in the lexer, and only the parser somewhere up above knows what to do with them. Just return different token values for each. -- Dmitry Olshansky
Re: Lexer in D
02-Mar-2013 22:15, Dmitry Olshansky writes: 02-Mar-2013 22:09, Namespace writes: No-no and no. Just translate the list of keywords/types to a bunch of switch-cases; that will give you near-optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length, then by first char. I don't understand. You don't mean something like: switch (value) { case "if": isKeyword = true; break; ... right? No. switch (value.length) { case 2: switch (value[0]) { case 'i': switch (value[1]) { case 'f': return true; case ... ... } ... case 3: ... } (I edited the post to include the length switch before the per-char switch.) -- Dmitry Olshansky
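Spelled out by hand for a couple of keywords (a real version would cover every keyword length and would be generated as a CTFE string mixin, as Dmitry suggests), the two-level switch might look like this:

```d
// Two-level dispatch: first on length, then on first character.
// Only a few illustrative keywords are shown here.
bool isKeyword(string value)
{
    switch (value.length)
    {
        case 2:
            switch (value[0])
            {
                case 'i': return value == "if" || value == "in";
                case 'd': return value == "do";
                default:  return false;
            }
        case 3:
            switch (value[0])
            {
                case 'i': return value == "int";
                case 'f': return value == "for";
                default:  return false;
            }
        default:
            return false;
    }
}
```

Because each final comparison is against a string of known, fixed length, the compiler can often reduce it to one or two word-sized compares.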
Re: Lexer in D
02-Mar-2013 22:09, Namespace writes: No-no and no. Just translate the list of keywords/types to a bunch of switch-cases; that will give you near-optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length, then by first char. I don't understand. You don't mean something like: switch (value) { case "if": isKeyword = true; break; ... right? No. switch (value.length) { case 2: switch (value[0]) { case 'i': switch (value[1]) { case 'f': return true; case ... ... } ... case 3: ... } Even simpler, if you don't want to go for CTFE code generation, is to make a plain array of arrays. The length of the array is exactly 256, i.e. even this should have decent speed: string[][256] keywords = [ null, null, ... // at index 'i': ["int", "if", "invariant", "idouble", "ifloat" ...] // array of strings starting with that char ... ]; bool isKeyword(string value) { string[] toSearch = keywords[value[0]]; return toSearch.canFind(value); } Ok, I will try it. Thank you. -- Dmitry Olshansky
Re: Lexer in D
No-no and no. Just translate the list of keywords/types to a bunch of switch-cases; that will give you near-optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length, then by first char. I don't understand. You don't mean something like: switch (value) { case "if": isKeyword = true; break; ... right? Even simpler, if you don't want to go for CTFE code generation, is to make a plain array of arrays. The length of the array is exactly 256, i.e. even this should have decent speed: string[][256] keywords = [ null, null, ... // at index 'i': ["int", "if", "invariant", "idouble", "ifloat" ...] // array of strings starting with that char ... ]; bool isKeyword(string value) { string[] toSearch = keywords[value[0]]; return toSearch.canFind(value); } Ok, I will try it. Thank you.
Re: Lexer in D
02-Mar-2013 18:53, Namespace writes: That would be slower, as canFind is a linear search. A simpler way is to use the first letter and the length to select a bucket of keywords. I think I get it. Did you mean something like this? No-no and no. Just translate the list of keywords/types to a bunch of switch-cases; that will give you near-optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length, then by first char. Even simpler, if you don't want to go for CTFE code generation, is to make a plain array of arrays. The length of the array is exactly 256, i.e. even this should have decent speed: string[][256] keywords = [ null, null, ... // at index 'i': ["int", "if", "invariant", "idouble", "ifloat" ...] // array of strings starting with that char ... ]; bool isKeyword(string value) { string[] toSearch = keywords[value[0]]; return toSearch.canFind(value); } Better yet, only store the [1..$] parts of the keywords and search for value[1..$]. Even faster is to do the same for each possible keyword length.
[code]
immutable ubyte[char] typeMap;
immutable ubyte[char] keywordMap;

static this() {
    typeMap = ['d' : 7, 'l' : 16, 'i' : 12, 'u' : 20, 'b' : 0, 'f' : 10,
               'r' : 17, 'v' : 25, 'c' : 2, 's' : 18, 'w' : 26];
    keywordMap = ['_' : 0, 'a' : 6, 'b' : 12, 'c' : 14, 'd' : 20, 'e' : 26,
                  'f' : 30, 'g' : 36, 'i' : 37, 'l' : 45, 'm' : 46, 'n' : 49,
                  'o' : 52, 'p' : 54, 'r' : 60, 's' : 62, 't' : 69, 'u' : 77,
                  'v' : 79, 'w' : 81];
}

bool isKeyword(string value) pure nothrow {
    if (value[0] !in keywordMap) return false;
    foreach (string kword; keywords[keywordMap[value[0]] .. $]) {
        if (kword == value) return true;
        if (kword[0] != value[0]) return false;
    }
    return false;
}

bool isType(string value) pure nothrow {
    if (value[0] !in typeMap) return false;
    foreach (string type; types[typeMap[value[0]] .. $]) {
        if (type == value) return true;
        if (type[0] != value[0]) return false;
    }
    return false;
}
[/code]
I'm now at ~230 msecs. 
-- Dmitry Olshansky
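The "do the same for each possible length" idea can be sketched as a two-dimensional bucket table, indexed first by length and then by first byte. The names and the maxLen value here are illustrative assumptions, and filling the table is omitted:

```d
import std.algorithm.searching : canFind;

// Longest keyword to accommodate; e.g. "foreach_reverse" has 15 chars.
enum maxLen = 15;

// Buckets indexed by [keyword length][first byte]; unused entries stay null.
string[][256][maxLen + 1] keywordBuckets;

bool isKeyword(string value)
{
    // Anything longer than the longest keyword can be rejected immediately.
    if (value.length > maxLen) return false;
    const bucket = keywordBuckets[value.length][value[0]];
    return bucket !is null && bucket.canFind(value);
}
```

With both length and first byte already matched, each bucket holds at most a handful of candidates, so the remaining linear scan is trivially short.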
Re: Lexer in D
That would be slower, as canFind is a linear search. A simpler way is to use the first letter and the length to select a bucket of keywords. I think I get it. Did you mean something like this?
[code]
immutable ubyte[char] typeMap;
immutable ubyte[char] keywordMap;

static this() {
    typeMap = ['d' : 7, 'l' : 16, 'i' : 12, 'u' : 20, 'b' : 0, 'f' : 10,
               'r' : 17, 'v' : 25, 'c' : 2, 's' : 18, 'w' : 26];
    keywordMap = ['_' : 0, 'a' : 6, 'b' : 12, 'c' : 14, 'd' : 20, 'e' : 26,
                  'f' : 30, 'g' : 36, 'i' : 37, 'l' : 45, 'm' : 46, 'n' : 49,
                  'o' : 52, 'p' : 54, 'r' : 60, 's' : 62, 't' : 69, 'u' : 77,
                  'v' : 79, 'w' : 81];
}

bool isKeyword(string value) pure nothrow {
    if (value[0] !in keywordMap) return false;
    foreach (string kword; keywords[keywordMap[value[0]] .. $]) {
        if (kword == value) return true;
        if (kword[0] != value[0]) return false;
    }
    return false;
}

bool isType(string value) pure nothrow {
    if (value[0] !in typeMap) return false;
    foreach (string type; types[typeMap[value[0]] .. $]) {
        if (type == value) return true;
        if (type[0] != value[0]) return false;
    }
    return false;
}
[/code]
I'm now at ~230 msecs.
Re: Lexer in D
I sorted both types and keywords alphabetically and wrote my own 'canFind':
bool canFind(const size_t size)(ref const string[size] values, string value) pure nothrow {
    if (value[0] < 'm' || value[0] == '_') {
        foreach (string _val; values) {
            if (_val == value) return true;
        }
    } else {
        foreach_reverse (string _val; values) {
            if (_val == value) return true;
        }
    }
    return false;
}
That brings ~20 msecs, so I have ~280 msecs instead of ~300. But it isn't perfect.
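Since the tables are sorted anyway, another option is a binary search instead of a direction-switching linear scan; Phobos exposes this through std.range.assumeSorted, which is only valid if the array really is sorted ascending. A sketch with a shortened, illustrative keyword list:

```d
import std.range : assumeSorted;

// Must be sorted ascending for assumeSorted to be valid; this is only
// an illustrative subset of the real keyword table.
immutable string[] keywords = ["abstract", "alias", "align", "assert",
                               "break", "case", "cast", "catch",
                               "if", "import", "while"];

bool isKeyword(string value)
{
    // contains() performs a binary search on the SortedRange wrapper.
    return keywords.assumeSorted.contains(value);
}
```

This cuts the scan from O(n) to O(log n) without any extra tables, though for ~100 keywords the bucket-per-first-byte approach discussed above is likely still faster in practice.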
Re: Lexer in D
With canFind it takes ~300 msecs. :/ Any idea how I could improve it? A simpler way is to use the first letter and the length to select a bucket of keywords. I don't understand how you could do this without AAs, or how this could be more performant than canFind.
Re: Lexer in D
02-Mar-2013 15:04, David writes: On 02.03.2013 10:33, Namespace wrote: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my lexing time. That's why I wanted to start again to write a lexer (so far purely for the learning effect). I looked at the code of some of the other lexers that are also written in D to get some inspiration, but currently my lexer is a little bit slow; it requires, for example, ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who would help me improve the lexer. Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d Thanks in advance. :) Making `types` and `keywords` an array might improve the speed a little bit: if (id in types) -> if (types.canFind(id)) (same with keywords) That would be slower, as canFind is a linear search. A simpler way is to use the first letter and the length to select a bucket of keywords. But otherwise, yes, the first advice is: never use the built-in AA, as these take quite some time to allocate. Plus they're not that fast to look up, as they use mod-prime rather than mod-power-of-2 to pick a bucket. -- Dmitry Olshansky
Re: Lexer in D
On 02.03.2013 10:33, Namespace wrote: > For one of our courses this year we have a very cool topic: Lexer and > Compiler. > In the past I have already tried some experiments in this direction with > Romulus and Remus, but I've never measured my lexing time. > That's why I wanted to start again to write a lexer (so far purely for > the learning effect). > I looked at the code of some of the other lexers that are also written > in D to get some inspiration, but currently my lexer is a little > bit slow; it requires, for example, ~200 msecs for std.datetime. > So I wanted to ask whether there are nice people who would help me improve > the lexer. > > Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d > > Thanks in advance. :) Making `types` and `keywords` an array might improve the speed a little bit: if (id in types) -> if (types.canFind(id)) (same with keywords)
Re: Lexer in D
I'm already using these flags. But thanks for your link; I will take a look at it. :)
Re: Lexer in D
02-Mar-2013 13:33, Namespace writes: For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I've never measured my lexing time. That's why I wanted to start again to write a lexer (so far purely for the learning effect). I looked at the code of some of the other lexers that are also written in D to get some inspiration, but currently my lexer is a little bit slow; it requires, for example, ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who would help me improve the lexer. Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d Thanks in advance. :) See this one, which I'm currently investing my time into: https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer Other than this, use cachegrind and compile with ldc/gdc :) Also, for dmd check that the flags are "-release -O -inline -noboundscheck" -- Dmitry Olshansky