On Wed, 25 Apr 2012, Frank Shearar wrote:

On 20 April 2012 15:15, Frank Shearar <frank.shea...@gmail.com> wrote:
On 20 April 2012 03:51, Levente Uzonyi <le...@elte.hu> wrote:
On Thu, 19 Apr 2012, Frank Shearar wrote:

I found a serious bug in parsing numbers with negative exponents. It
was completely broken, in fact, parsing 1e-1 as 10, not 1 / 10. Anyway.
This version fixes that, and adds a bunch of tests demonstrating that
number parsing will return rationals if it can.
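
A minimal sketch of the exponent handling described above, assuming the mantissa and exponent have already been read as Integers. The block name `scale` is hypothetical and is not part of PPSmalltalkNumberParser; it only illustrates that a negative exponent should divide, which in Squeak yields an exact Fraction.

```smalltalk
| scale |
"scale is a hypothetical helper, not a method of the actual parser."
scale := [:mantissa :exponent |
    exponent >= 0
        ifTrue: [mantissa * (10 raisedTo: exponent)]
        ifFalse: [mantissa / (10 raisedTo: exponent negated)]].

scale value: 1 value: 1.    "10"
scale value: 1 value: -1.   "(1/10), an exact Fraction rather than 10"
```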

It's significantly slower than Squeak's SqNumberParser:

Time millisecondsToRun: [100000 timesRepeat: [SqNumberParser parse:
'1234567890']] => 466

Time millisecondsToRun: [100000 timesRepeat: [PPSmalltalkNumberParser
parse: '1234567890']] => 32082

I've attached a MessageTally spying on the latter: I've not much skill
in reading these, but nothing leaps out at me as being obviously
awful.


Didn't check the code, just the tally, and I think that
PPSmalltalkNumberParser(PPSmalltalkNumberGrammar)>>digitsBase: is begging
for optimization. It's probably also the cause of the high amount of garbage,
which causes a significant amount of time to be spent in garbage collection.
It's also interesting that the finalization process does so much work;
there may be something wrong with your image.

Thanks for taking a look, Levente.

I'd expect digitsBase: to dominate the running costs, given that we're
parsing numbers.

I do make a large number of throwaway "immutable" values with a
Builder-like pattern... in PPSmalltalkNumberParser >>
#makeNumberFrom:base:. That, I would imagine, could explain the
garbage?

If I may, what do you look for when reading the MessageTally? How do
you tell, for instance, that there's excessive garbage production?
That the incremental GCs take 7ms? (I'm reading Andreas' comments on
http://wiki.squeak.org/squeak/4210 again.)

Levente, you're quite right: #digitsBase: has now been optimised even
more, reducing the time taken to run my benchmark

MessageTally spyOn: [Time millisecondsToRun: [100000 timesRepeat:
[PPSmalltalkNumberParser parse: '1234567890']]]

from ~32 seconds to ~16 seconds. (Memoising was the answer:
#digitsBase: is effectively a higher-order production and, like OMeta,
PPCompositeParser doesn't memoise those. A simple class var dictionary
solves that problem.)
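
A rough sketch of the memoisation idea, assuming a Dictionary keyed by base. The names `cache`, `digitsFor` and `buildCount` are hypothetical, and the collection of digit characters is only a stand-in for the parser that #digitsBase: would really build; in the real class the Dictionary would live in a class variable.

```smalltalk
| cache buildCount digitsFor |
cache := Dictionary new.    "stand-in for the class variable"
buildCount := 0.            "counts how often the expensive path runs"
digitsFor := [:base |
    cache at: base ifAbsentPut: [
        buildCount := buildCount + 1.
        "Placeholder for constructing the real digit parser for this base."
        (0 to: base - 1) collect: [:d | Character digitValue: d]]].

digitsFor value: 10.
digitsFor value: 10.
buildCount   "1: the second call was answered from the cache"
```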

Creating a new parser (PPSmalltalkNumberParser) takes quite a while. If you move the parser creation out of the loop, you'll get closer to the performance limit of PetitParser:

| p |
p := PPSmalltalkNumberParser new.
[ Time millisecondsToRun: [100000 timesRepeat: [p parse: '1234567890']] ] timeProfile. "==> 2594"

There are still places where you create parsers on the fly, like PPSmalltalkNumberGrammar >> #number. These should be avoided.


Levente



frank

frank

Levente



frank

On 14 September 2011 20:26, Frank Shearar <frank.shea...@gmail.com> wrote:

On 3 September 2011 19:35, Nicolas Cellier
<nicolas.cellier.aka.n...@gmail.com> wrote:

2011/9/3 Frank Shearar <frank.shea...@gmail.com>:

On 3 September 2011 18:50, Lukas Renggli <reng...@gmail.com> wrote:

I think it is a good idea to have the number parser separate, after
all it might also make sense to use it separately.

It seems that the new Smalltalk grammar is significantly slower. The
benchmark PPSmalltalkClassesTests class>>#benchmark: that uses the
source code of the collection hierarchy and does not especially target
number literals runs 30% slower.

Also I see that "Number readFrom: ..." is still used within the
grammar. This seems to be a bit strange, no?


Yes: it's a double-parse, which is a bit lame. First, we parse the
literal with PPSmalltalkNumberParser, which guarantees that it is a
well-formed token. Then we hand that token to Number class >>
#readFrom:, so Squeak's Number never gets to see anything other than a
well-formed token.

It sounds like you're happy with the basic concept, so maybe I should
remove the Number class >> #readFrom: stuff, see if I can't remove the
performance issues, and resubmit the patch.

frank


Yes, a NumberParser is essentially parsing, and this duplication sounds
useless.
The main feature of interest in NumberParser that I consider a
requirement and should find its equivalent in a PetitNumberParser is:
- round a decimal representation to nearest Float
It's simple, just convert a Fraction asFloat in a single final step to
avoid accumulating round-off errors - see
#makeFloatFromMantissa:exponent:base:
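
As an illustration of that single final step (a sketch only, not the body of #makeFloatFromMantissa:exponent:base:), the value is kept exact until the very end and rounded once. The temporaries `mantissa`, `exponent` and `exact` are hypothetical names.

```smalltalk
| mantissa exponent exact |
mantissa := 12345.   "digits of 12.345 with the decimal point removed"
exponent := -3.      "power of ten implied by the three fraction digits"
exact := exponent >= 0
    ifTrue: [mantissa * (10 raisedTo: exponent)]
    ifFalse: [mantissa / (10 raisedTo: exponent negated)].
"exact is the Fraction (2469/200); no rounding has happened yet."
exact asFloat        "the only rounding step"
```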

The second feature of interest in NumberParser is the ability to
parse LargeInteger efficiently by avoiding (10 * largeValue +
digitValue) loops, and replacing them with a log(n) cost.
This would be a simple thing to implement in a functional language.


Hopefully this won't offend your sensibilities too much :). It does,
in fact, use 10* loops - I wrote an experimental "front half * rear
half" recursion, which was slower in my benchmarks.
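
For reference, a "front half * rear half" recursion might look roughly like the following. This is a sketch under assumptions, not the experimental code mentioned above: the block name `parseDigits` and the cut-off of 9 digits are hypothetical choices.

```smalltalk
| parseDigits |
parseDigits := nil.
parseDigits := [:digits :base | | half front rear |
    digits size <= 9
        ifTrue: [
            "Short runs: the plain (base * value + digit) loop is fine."
            digits inject: 0 into: [:acc :ch | acc * base + ch digitValue]]
        ifFalse: [
            "Long runs: split, parse each half, then combine once."
            half := digits size // 2.
            front := parseDigits value: (digits first: digits size - half) value: base.
            rear := parseDigits value: (digits last: half) value: base.
            front * (base raisedTo: half) + rear]].

parseDigits value: '123456789012345678901234567890' value: 10
```

The recursion depth grows with the logarithm of the digit count, but whether it beats the simple loop depends on how large the inputs are, which is consistent with it being slower in a benchmark on short literals.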

This version has the grammar and parser doing no string->number
conversion at all. PPSmalltalkNumberMaker supplies a number of utility
methods designed to stop one from making malformed numbers. It also
supplies a builder interface that the parser uses to construct
numbers.

frank

Nicolas

Lukas


On 3 September 2011 17:18, Frank Shearar <frank.shea...@gmail.com>
wrote:

On 3 September 2011 15:56, Lukas Renggli <reng...@gmail.com> wrote:

On 3 September 2011 16:51, Frank Shearar <frank.shea...@gmail.com>
wrote:

Hi Lukas,

I haven't :) mainly because I'm unsure where to put it - is there
perhaps a PP Inbox, or shall I just post the merged version, or what's
your preference? (How about an mcd between my merge and PP's head?)


Just put the .mcz at some public URL (dropbox, squeak source, ...) or
attach it to a mail.


Ah, great - here it is. You'll see I've written the grammar as a
separate class. That was really more to make what I'd done more
obvious and to minimise the change to PPSmalltalkGrammar, but perhaps
it's not a bad idea anyway: it's easy to see the number literal
subgrammar.

frank

Lukas

--
Lukas Renggli
www.lukas-renggli.ch





