Re: [Ironruby-core] Code Review: parseroptfinal

Curt Hagenlocher Thu, 04 Dec 2008 09:56:45 -0800

Looks good!

-----Original Message-----
From: Tomas Matousek
Sent: Wednesday, December 03, 2008 4:17 PM
To: IronRuby External Code Reviewers
Cc: [email protected]
Subject: Code Review: parseroptfinal


  tfpt review "/shelveset:parseroptfinal;REDMOND\tomat"

Improves Ruby tokenizer and parser performance by 30% total (previously 3 MBps, 
now 4.2 MBps - measured on real code of libraries we have checked in and 
including Ruby AST building, but not transformation to DLR ASTs).

Adds /bm option to tokenizer test driver that measures parsing speed 
(IronRuby.Tests.exe /tokenizer /bm).
Adds more tokenization unit tests and fixes several minor bugs in tokenizer.

AST, Parser.cs, parser.y:
-       Replaces List<Expression> by a new class Statements where the list 
represents statements. This encapsulation us to change underlying 
representation w/o affecting code using the statements. Also refactors the 
rules for statement list so that we can use an empty statements singleton.

-       Replaces List<Expression> by Expression[] where it represents 
arguments. Arguments are the most frequent list of expression so it make sense 
to optimize it. Previously any list of arguments was constructed starting from 
an empty List<Expression> and adding arguments as parser reduced them. This is 
quite inefficient since most method calls have very few arguments (0 to 3). The 
parser now has a stack of Expression to which it pushes arguments as they are 
reduced. Each rule that defines argument list counts the number of arguments in 
the list in token value that is propagated throughout the list reduction. The 
rule that consumes the argument list then simply pops the right amount of 
arguments out of it and copies it to an exactly sized array. This might be 
further optimized if necessary - we can specialize Argument node for 0 to 3 
arguments, or just remember a reference to the arguments stack and index to the 
array.
We can use this approach for other lists as well (maybe for statemens, string 
concatenations, etc.) in future.

-       Provides dummy implementations for features that are not implemented 
yet: defined?(super), BEGIN, END. This avoids throwing not impl. exceptions so 
that we can measure performance of AST transformations over entire library code 
base.

GPPG:
-       Removes Parser <: ShiftReduceParser inheritance. It was meant to enable 
sharing of generic GPPG parser driver and derive specific parsers but since we 
don't share the class it makes no sense. Still GPPG.cs is mostly Ruby 
independent piece of code that could be easily copied by other languages.
-       Merges 3 stacks (state, value and location) into a single data 
structure reducing cost of push and pop.
-       Improves reduction of empty rules, removes unnecessary array lookups on 
hot path (these changes require some adjustments in GPPG generator).

TokenValue:
-       Reduces size of the structure by using an explicit layout definition. 
This reduces amount of data moved around during reductions.
-       It might be possible in future to get rid of the obj2 field (it's used 
only by argument rules).

Tokenizer:
-       Removes a direct dependency on the parser - we only need a resolver for 
local variables. Introduces ILexicalVariableResolver (implemented by Parser) 
that provides such information. This allows tests to run tokenizer with mocked 
parser.
-       Replaces nextc(), peekc() and pushback() methods by Read, Peek, Back, 
SeekRelative, which don't implicitly normalize end of lines. This allows us to 
drop yytext StringBuilder used to track normalized eolns. It was the culprit of 
very bad performance. Basically entire tokenized source code got eventually 
loaded into this StringBuilder one character in time :-/. The tokenizer now 
needs to deal with eoln normalization explicitly, which required a lot of small 
tweaks.
-       Reimplements line buffer loading - previous implementation was 
allocating 2 char[] per line, which is of course terrible waste. The buffer is 
now shared for all lines of code except for heredoc that needs to be handled in 
a special way.
-       The tokenizer is now finally in a good shape, although there are more 
small improvements that could be done.

Tomas
_______________________________________________
Ironruby-core mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ironruby-core

Re: [Ironruby-core] Code Review: parseroptfinal

Reply via email to