Re: [Pharo-project] PetitSmalltalk: now with a self-contained grammar for number literals

Levente Uzonyi Wed, 25 Apr 2012 14:45:33 -0700

On Wed, 25 Apr 2012, Eliot Miranda wrote:



On Wed, Apr 25, 2012 at 1:11 PM, Levente Uzonyi <le...@elte.hu> wrote:
      On Fri, 20 Apr 2012, Frank Shearar wrote:

            On 20 April 2012 03:51, Levente Uzonyi <le...@elte.hu> wrote:
                  On Thu, 19 Apr 2012, Frank Shearar wrote:

                        I found a serious bug in parsing numbers with negative
                        exponents. It was completely broken, in fact, parsing
                        1e-1 as 10, not 1 / 10. Anyway. This version fixes that,
                        and adds a bunch of tests demonstrating that number
                        parsing will return rationals if it can.
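
For illustration, a minimal workspace sketch of the behaviour described above, assuming the mantissa and exponent have already been read as Integers. The block name `scale` is made up for this example; it is not the actual PPSmalltalkNumberParser code.

```smalltalk
"Hypothetical helper, not the real parser: scale an Integer mantissa
 by a power of ten using exact Integer/Fraction arithmetic."
| scale |
scale := [:mantissa :exponent |
    exponent >= 0
        ifTrue: [mantissa * (10 raisedTo: exponent)]
        ifFalse: [mantissa / (10 raisedTo: exponent negated)]].

scale value: 1 value: -1.    "the Fraction 1/10, rather than 10"
scale value: 12 value: 2.    "the Integer 1200"
scale value: 5 value: -1.    "reduces to the Fraction 1/2"
```

Dividing Integers with / answers a Fraction (or an Integer when the division is exact), which is why the result stays rational without any extra work.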

                        It's significantly slower than Squeak's SqNumberParser:

                        Time millisecondsToRun: [100000 timesRepeat:
                            [SqNumberParser parse: '1234567890']] => 466

                        Time millisecondsToRun: [100000 timesRepeat:
                            [PPSmalltalkNumberParser parse: '1234567890']] => 32082

I've attached a MessageTally spying on the latter: I've not much skill
in reading these, but nothing leaps out at me as being obviously
awful.



Didn't check the code, just the tally, and I think that
PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>digitsBase: is begging
for optimization. It's probably also the source of the large amount of garbage,
which causes a significant amount of time to be spent in garbage collection.
It's also interesting that the finalization process does so much work;
there may be something wrong with your image.


Thanks for taking a look, Levente.

I'd expect digitsBase: to dominate the running costs, given that we're
parsing numbers.


I finally checked the code and there's plenty of room for optimization. Note
that the code can't be loaded into Squeak, because there's an invalid
symbol #__gen__binding, and some methods with nil category.


      I do make a large number of throwaway "immutable" values with a
      Builder-like pattern... in PPSmalltalkNumberParser >>
      #makeNumberFrom:base:. That, I would imagine, could explain the
      garbage?


The current garbage collector is not optimal for large images and large amounts
of garbage, so you should try to avoid creating garbage in the
performance-critical parts of your code.


      If I may, what do you look for when reading the MessageTally? How do
      you tell, for instance, that there's excessive garbage production?


In your tally, GC took 20% of the total time and the finalization process
another 20%. These numbers should be much lower, usually less than 1%.

      That the incremental GCs take 7ms? (I'm reading Andreas' comments on
      http://wiki.squeak.org/squeak/4210 again.)


7ms for an incremental GC is also a bit high; it should be around 1-2ms.


Careful.  Absolute GC times (like any run time) depend on the machine.  On a
slow machine GC runs slow...

Of course, but the incremental GC is intended to be fast enough to be
unnoticeable even during audio playback, so it shouldn't take more than
1-2ms on a slower system.



Levente

 



      Levente


            frank

                  Levente



                        frank

                        On 14 September 2011 20:26, Frank Shearar <frank.shea...@gmail.com> wrote:

                              On 3 September 2011 19:35, Nicolas Cellier
                              <nicolas.cellier.aka.n...@gmail.com> wrote:

                                    2011/9/3 Frank Shearar <frank.shea...@gmail.com>:

                                          On 3 September 2011 18:50, Lukas Renggli <reng...@gmail.com> wrote:

                                                I think it is a good idea to have the number parser
                                                separate, after all it might also make sense to use
                                                it separately.

                                                It seems that the new Smalltalk grammar is
                                                significantly slower. The benchmark
                                                PPSmalltalkClassesTests class>>#benchmark: that uses
                                                the source code of the collection hierarchy and does
                                                not especially target number literals runs 30% slower.

                                                Also I see that "Number readFrom: ..." is still used
                                                within the grammar. This seems to be a bit strange, no?



                                          Yes: it's a double-parse, which is a bit lame. First, we parse the
                                          literal with PPSmalltalkNumberParser, which ensures that the thing
                                          given to Number class >> #readFrom: is a well-formed token (so, in
                                          particular, Squeak's Number doesn't get to see anything other than a
                                          well-formed token).

                                          It sounds like you're happy with the basic concept, so maybe I should
                                          remove the Number class >> #readFrom: stuff, see if I can't remove the
                                          performance issues, and resubmit the patch.

                                          frank


                                    Yes, a NumberParser is essentially parsing, and this duplication sounds
                                    useless.
                                    The main feature of interest in NumberParser that I consider a
                                    requirement and should find its equivalent in a PetitNumberParser is:
                                    - round a decimal representation to the nearest Float
                                    It's simple, just convert a Fraction asFloat in a single final step to
                                    avoid accumulating round-off errors - see
                                    #makeFloatFromMantissa:exponent:base:
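
A rough sketch of the "single final step" idea, for illustration only. This is not the body of #makeFloatFromMantissa:exponent:base:, and the block name `toFloat` is invented here. It assumes the mantissa, exponent and base have already been read as Integers.

```smalltalk
"Sketch only: keep everything exact until the end, then round once."
| toFloat |
toFloat := [:mantissa :exponent :base |
    | exact |
    "Build the exact value with Integer/Fraction arithmetic first..."
    exact := exponent >= 0
        ifTrue: [mantissa * (base raisedTo: exponent)]
        ifFalse: [mantissa / (base raisedTo: exponent negated)].
    "...then convert to a Float exactly once, at the very end."
    exact asFloat].

toFloat value: 1 value: -1 value: 10.     "0.1"
toFloat value: 15 value: -1 value: 10.    "1.5"
```

The point of the design is that no intermediate Float ever exists, so there is only one place where rounding can happen.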

                                    The second feature of interest in NumberParser is the ability to
                                    parse LargeIntegers efficiently by avoiding (10 * largeValue +
                                    digitValue) loops, and replacing them with a log(n) cost.
                                    This would be a simple thing to implement in a functional language.
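
One way to picture the alternative to the (10 * largeValue + digitValue) loop is a split-in-half recursion over the digit string. The sketch below is only an illustration under that assumption; the block name `digitsToInteger` and the 9-digit cutoff are choices made for this example, not something taken from NumberParser.

```smalltalk
"Illustrative sketch: convert a String of decimal digits to an Integer
 by splitting it in half, so that most of the work is done in a few
 large multiplications instead of one LargeInteger multiply per digit."
| digitsToInteger |
digitsToInteger := [:digits |
    digits size <= 9
        ifTrue: [
            "Up to 9 digits fits in a SmallInteger, so a plain loop is fine."
            digits inject: 0 into: [:acc :ch | acc * 10 + ch digitValue]]
        ifFalse: [
            | mid front rear |
            mid := digits size // 2.
            front := digitsToInteger value: (digits copyFrom: 1 to: mid).
            rear := digitsToInteger value: (digits copyFrom: mid + 1 to: digits size).
            "Shift the front half left by the number of digits in the rear half."
            front * (10 raisedTo: digits size - mid) + rear]].

digitsToInteger value: '1234567890'.    "1234567890"
```

Whether this actually beats the simple loop depends on the size of the input and on how fast LargeInteger multiplication is in the VM at hand, so it is worth measuring before adopting it.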



                              Hopefully this won't offend your sensibilities too much :). It does,
                              in fact, use 10* loops - I wrote an experimental "front half * rear
                              half" recursion, which was slower in my benchmarks.

                              This version has the grammar and parser doing no string->number
                              conversion at all. PPSmalltalkNumberMaker supplies a number of utility
                              methods designed to stop one from making malformed numbers. It also
                              supplies a builder interface that the parser uses to construct
                              numbers.

                              frank

                                    Nicolas

                                                Lukas


                                                On 3 September 2011 17:18, Frank Shearar <frank.shea...@gmail.com> wrote:

                                                      On 3 September 2011 15:56, Lukas Renggli <reng...@gmail.com> wrote:

                                                            On 3 September 2011 16:51, Frank Shearar <frank.shea...@gmail.com> wrote:

                                                                  Hi Lukas,

                                                                  I haven't :) mainly because I'm unsure where to put it -
                                                                  is there perhaps a PP Inbox, or shall I just post the
                                                                  merged version, or what's your preference? (How about
                                                                  an mcd between my merge and PP's head?)



                                                            Just put the .mcz at some public URL (dropbox, squeak
                                                            source, ...) or attach it to a mail.



                                                      Ah, great - here it is. You'll see I've written the grammar
                                                      as a separate class. That was really more to make what I'd
                                                      done more obvious and to minimise the change to
                                                      PPSmalltalkGrammar, but perhaps it's not a bad idea anyway:
                                                      it's easy to see the number literal subgrammar.

                                                      frank

                                                            Lukas

                                                            --
                                                            Lukas Renggli
                                                            www.lukas-renggli.ch






                                                --
                                                Lukas Renggli
                                                www.lukas-renggli.ch














--
best, Eliot
