Re: [Pharo-project] PetitSmalltalk: now with a self-contained grammar for number literals

Stéphane Ducasse Wed, 25 Apr 2012 10:28:12 -0700

frank 

did you publish in the petitparser repo?
because we should not lose that.


Stef

>>>> 
>>> Didn't check the code, just the tally, and I think that
>>> PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>digitsBase: is begging
>>> for optimization. It's probably also the cause of the high amount of garbage
>>> which causes significant amount of time spent with garbage collection.
>>> It's also interesting is that the finalization process does so much work,
>>> there may be something wrong with your image.
>> 
>> Thanks for taking a look, Levente.
>> 
>> I'd expect digitsBase: to dominate the running costs, given that we're
>> parsing numbers.
>> 
>> I do make a large number of throwaway "immutable" values with a
>> Builder-like pattern... in PPSmalltalkNumberParser >>
>> #makeNumberFrom:base:. That, I would imagine, could explain the
>> garbage?
>> 
>> If I may, what do you look for when reading the MessageTally? How do
>> you tell, for instance, that there's excessive garbage production?
>> That the incremental GCs take 7ms? (I'm reading Andreas' comments on
>> http://wiki.squeak.org/squeak/4210 again.)
> 
> Levente, you're quite right: #digitsBase: has now been optimised even
> more, reducing the time taken to run my benchmark
> 
> MessageTally spyOn: [Time millisecondsToRun: [100000 timesRepeat:
> [PPSmalltalkNumberParser parse: '1234567890']]]
> 
> from ~32 seconds to ~16 seconds. (Memoising was the answer:
> #digitsBase: is effectively a higher-order production and, like OMeta,
> PPCompositeParser doesn't memoise those. A simple class var dictionary
> solves that problem.
> 
> frank
> 
>> frank
>> 
>>> Levente
>>> 
>>> 
>>>> 
>>>> frank
>>>> 
>>>> On 14 September 2011 20:26, Frank Shearar <frank.shea...@gmail.com> wrote:
>>>>> 
>>>>> On 3 September 2011 19:35, Nicolas Cellier
>>>>> <nicolas.cellier.aka.n...@gmail.com> wrote:
>>>>>> 
>>>>>> 2011/9/3 Frank Shearar <frank.shea...@gmail.com>:
>>>>>>> 
>>>>>>> On 3 September 2011 18:50, Lukas Renggli <reng...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> I think it is a good idea to have the number parser separate, after
>>>>>>>> all it might also make sense to use it separately.
>>>>>>>> 
>>>>>>>> It seems that the new Smalltalk grammar is significantly slower. The
>>>>>>>> benchmark PPSmalltalkClassesTests class>>#benchmark: that uses the
>>>>>>>> source code of the collection hierarchy and does not especially target
>>>>>>>> number literals runs 30% slower.
>>>>>>>> 
>>>>>>>> Also I see that "Number readFrom: ..." is still used within the
>>>>>>>> grammar. This seems to be a bit strange, no?
>>>>>>> 
>>>>>>> 
>>>>>>> Yes: it's a double-parse, which is a bit lame. First, we parse the
>>>>>>> literal with PPSmalltalkNumberParser, which ensures that the thing
>>>>>>> given to Number class >> #readFrom: is a well-formed token (so, in
>>>>>>> particular, Squeak's Number doesn't get to see anything other than a
>>>>>>> well-formed token).
>>>>>>> 
>>>>>>> It sounds like you're happy with the basic concept, so maybe I should
>>>>>>> remove the Number class >> #readFrom: stuff, see if I can't remove the
>>>>>>> performance issues, and resubmit the patch.
>>>>>>> 
>>>>>>> frank
>>>>>>> 
>>>>>> 
>>>>>> Yes, a NumberParser is essentially parsing, and this duplication sounds
>>>>>> useless.
>>>>>> The main feature of interest in NumberParser that I consider a
>>>>>> requirement and should find its equivalence in a PetitNumberParser is:
>>>>>> - round a decimal representation to nearest Float
>>>>>> It's simple, just convert a Fraction asFloat in a single final step to
>>>>>> avoid cumulating round off errors - see
>>>>>> #makeFloatFromMantissa:exponent:base:
>>>>>> 
>>>>>> The second feature of interest in NumberParser is the ability to
>>>>>> parser LargeInteger efficiently by avoiding (10 * largeValue +
>>>>>> digitValue) loops, and replacing them with a log(n) cost.
>>>>>> This would be a simple thing to implement in a functional language.
>>>>> 
>>>>> 
>>>>> Hopefully this won't offend your sensibilities too much :). It does,
>>>>> in fact, use 10* loops - I wrote an experimental "front half * rear
>>>>> half" recursion, which was slower in my benchmarks.
>>>>> 
>>>>> This version has the grammar and parser doing no string->number
>>>>> conversion at all. PPSmalltalkNumberMaker supplies a number of utility
>>>>> methods designed to stop one from making malformed numbers. It also
>>>>> supplies a builder interface that the parser uses to construct
>>>>> numbers.
>>>>> 
>>>>> frank
>>>>> 
>>>>>> Nicolas
>>>>>> 
>>>>>>>> Lukas
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 3 September 2011 17:18, Frank Shearar <frank.shea...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> On 3 September 2011 15:56, Lukas Renggli <reng...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> On 3 September 2011 16:51, Frank Shearar <frank.shea...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Lukas,
>>>>>>>>>>> 
>>>>>>>>>>> I haven't :) mainly because I'm unsure where to put it - is there
>>>>>>>>>>> perhaps a PP Inbox, or shall I just post the merged version, or
>>>>>>>>>>> what's
>>>>>>>>>>> your preference? (How about an mcd between my merge and PP's head?)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Just put the .mcz at some public URL (dropbox, squeak source, ...)
>>>>>>>>>> or
>>>>>>>>>> attach it to a mail.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Ah, great - here it is. You'll see I've written the grammar as a
>>>>>>>>> separate class. That was really more to make what I'd done more
>>>>>>>>> obvious and to minimise the change to PPSmalltalkGrammar, but perhaps
>>>>>>>>> it's not a bad idea anyway: it's easy to see the number literal
>>>>>>>>> subgrammar.
>>>>>>>>> 
>>>>>>>>> frank
>>>>>>>>> 
>>>>>>>>>> Lukas
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Lukas Renggli
>>>>>>>>>> www.lukas-renggli.ch
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Lukas Renggli
>>>>>>>> www.lukas-renggli.ch
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>> 
> <PetitSmalltalk-fbs.63.mcz>

Re: [Pharo-project] PetitSmalltalk: now with a self-contained grammar for number literals

Reply via email to