Re: [Bug-apl] Fun With Benchmarks

fred Sun, 23 Aug 2015 06:44:23 -0700

Elias

Very interesting. Thanks for your data gathering!


It strikes me that memoizing to_value may be useful. On second thought,
it wouldn't be that useful. But, to_value should be optimized. I'll
have a look (when I get cycles).

The time spent in the garbage collection destructor is a bit
disturbing. Mark&sweep "spanked" this -- because its time was
proportional to live storage, and not to garbage (the destructor is
called every time garbage is produced).
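
To make the comparison concrete, here is a rough Python sketch of the two
cost models, using only figures quoted in this thread. The function names
are made up for illustration, and the model is deliberately crude: it
assumes one unit of work per destructor call and one unit per live object
traced in each collection. The two runs are different programs on
different interpreters, so the shares are not a like-for-like benchmark.

```python
def refcount_work(garbage_objects):
    # Assumption: one destructor call for every object that becomes garbage.
    return garbage_objects


def mark_sweep_work(collections, live_objects):
    # Assumption: each collection traces the live set once; dead objects
    # are never visited individually.
    return collections * live_objects


# Figures quoted in this thread.
destructor_calls = 395_899_364   # Value_P::~Value_P() calls (GNU APL run)
destructor_share = 16.20         # percent of GNU APL run time
snobol_collections = 338         # "Regenerations of dynamic storage"
snobol_gc_ms = 1680.975          # "Execution time in GC"
snobol_total_ms = 20315.416      # "Execution time"

# Share of the SNOBOL4 run spent in mark&sweep, and the cost per collection.
snobol_gc_share = 100 * snobol_gc_ms / snobol_total_ms
ms_per_collection = snobol_gc_ms / snobol_collections

print(f"SNOBOL4 GC share: {snobol_gc_share:.1f}% of run time")
print(f"GNU APL destructor share: {destructor_share:.1f}% of run time")
print(f"SNOBOL4 GC cost per collection: {ms_per_collection:.2f} ms")
```

In this model the destructor cost grows with every temporary produced,
while the mark&sweep cost grows only with the number of collections and
the size of the live set.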

FredW

On Sun, 2015-08-23 at 19:14 +0800, Elias Mårtenson wrote:
> Another thing of note: The destructor Value_P::~Value_P() was called
> 395899364 times, and used 16.20% time.
> 
> On 23 August 2015 at 19:04, Elias Mårtenson <[email protected]>
> wrote:
> 
> > Well, I've run the test and I have some results. They were somewhat
> > unexpected, as the time spent in Value::clone() is much less than it
> > was for other tests. That said, the clone issue is mostly visible when
> > manipulating very large arrays, which this test case does not do. In
> > this case, 9.26% of the time was spent in Value::clone() and its
> > descendants.
> > 
> > The main consumer of CPU time in this test case is the reduction on +
> > in the following command:
> > 
> >     Z←((¯1+⍴Z)⌊+/∧\'0'=Z)↓Z←D[⌽Z]
> > 
> > The +/ reduction uses 70% of the CPU time. This includes 28% performing
> > the addition operation (Bif_F12_PLUS::eval_AB()). Another huge
> > contributor was the call to Cell::to_value(), which contributed 29.21%.
> > Note that the 28% time spent in the addition and the almost 30% in
> > to_value() are separate.
> > 
> > In other words, the addition and the value conversion together consume
> > roughly 60% of the total time, and both are part of the reduction
> > operation (70%).
> > 
> > Regards,
> > Elias
> > 
> > On 23 August 2015 at 05:21, fred <[email protected]> wrote:
> > 
> > > Mike Duvos
> > > 
> > > Thank you for the correction. I have timed your code:
> > > 
> > >       ⎕IO←0
> > > 
> > >       ∇TIME X;TS
> > > 
> > >       ∇Z←SHOW X;I
> > > 
> > >       ∇Z←X TIMES Y;D;I;C
> > > 
> > >       ∇Z←FACTORIAL N;I
> > > 
> > >       TIME 'SHOW FACTORIAL 300'
> > > 30605751221644063603537046129726862938858880417357
> > > 69994167767412594765331767168674655152914224775733
> > > 49939147888701726368864263907759003154226842927906
> > > 97455984122547693027195460400801221577625217685425
> > > 59653569035067887252643218962642993652045764488303
> > > 88909753943489625436053225980776521270822437639449
> > > 12012867867536830571229368194364995646049816645022
> > > 77165001851765464693401122260347297240663332585835
> > > 06870150169794168850353752137554910289126407157154
> > > 83028228493795263658014523523315693648223343679925
> > > 45940952768206080622328123873838808170496000000000
> > > 00000000000000000000000000000000000000000000000000
> > > 000000000000000
> > > 22.977 Seconds.
> > > 
> > > Now, the code under GNU APL runs in comparable time to the
> > > implementation in SNOBOL4, at least.
> > > 
> > > This is still not good. Maybe not horrible, though.
> > > 
> > > I rather suspect that the data copying that Elias Mårtenson alludes
> > > to is dominant in execution. The SNOBOL4 code has (probably)
> > > considerably more interpretation overhead, and is forced to copy the
> > > numeric string on each modification (strings are immutable). It
> > > hashes each string into a global hash on each such modification. If
> > > the APL code is forced into the same contortions (essentially,
> > > copying each vector), it should perform at a similar speed. Given
> > > that it does execute at the "speed of SNOBOL", I suspect that is
> > > what is going on.
> > > 
> > > Eagerly awaiting Elias' results.
> > > 
> > > FredW
> > > 
> > > On Sat, 2015-08-22 at 12:20 -0400, fred wrote:
> > > > Ok, so infinite precision integer arithmetic takes over 50
> > > > seconds with GNU APL to compute 300!
> > > > 
> > > > Um... not good. Actually, this is horrific.
> > > > 
> > > > I will attempt to put this into perspective. I use the interpretive
> > > > SNOBOL4 implementation from Griswold. This is code that implements
> > > > a SNOBOL4 interpreter. The code implementing the interpreter (which
> > > > was written in the 1960s) is macro-expanded into a C program, which
> > > > is then compiled and run to actually interpret the SNOBOL4 language
> > > > source.
> > > > 
> > > > Ok? This is the SLOWEST SNOBOL4 implementation that I know...
> > > > 
> > > > I used an infinite precision arithmetic package written 40 years
> > > > ago (specifically, for education -- not for performance). Now, one
> > > > of the reasons this package is slow is that it REDEFINES '+', '-',
> > > > '*', '/' operators AT RUN TIME... Not only are the data types
> > > > dynamic, the actual functions are also dynamic, and have been
> > > > redefined.
> > > > 
> > > > Now, if you are still with me -- an interpreter that is
> > > > macro-expanded to C, running a run-time binding operator
> > > > redefinition program in a language where strings are immutable and
> > > > must be completely hashed/copied on each change... and which has
> > > > complete mark/sweep garbage collection -- again, implemented in the
> > > > macro-expanded interpreter.
> > > > 
> > > > Let us look at the code:
> > > > 
> > > > -include 'INFINIP.INC'
> > > >     infinip_start() ;* redefine basic math functions to work on strings
> > > >     x = '1'
> > > >     i = 1
> > > >     l = 300
> > > >     t = time()
> > > > top gt(i, l) :s(btm)
> > > >     x = x * i
> > > >     i = i + 1 :(top)
> > > > btm t = time() - t
> > > >     output = x
> > > >     output = t ' milliseconds'
> > > > end
> > > > 
> > > > Now, I had a problem running the Duvos code (but I haven't
> > > > attempted debugging - 52 seconds seemed extreme) -- but I assume 52
> > > > seconds is.. um.. normal. I will run this code on a 1.5 GHz Intel
> > > > i5 (this is my Linux tablet, a three year old Acer Iconia tablet):
> > > > 
> > > > $ snobol4 -s ifact
> > > > 30605751221644063603537046129726862938858880417357699941677674125947653
> > > > 31767168674655152914224775733499391478887017263688642639077590031542268
> > > > 42927906974559841225476930271954604008012215776252176854255965356903506
> > > > 78872526432189626429936520457644883038890975394348962543605322598077652
> > > > 12708224376394491201286786753683057122936819436499564604981664502277165
> > > > 00185176546469340112226034729724066333258583506870150169794168850353752
> > > > 13755491028912640715715483028228493795263658014523523315693648223343679
> > > > 92545940952768206080622328123873838808170496000000000000000000000000000
> > > > 00000000000000000000000000000000000000000000000
> > > > 20315.096404 milliseconds
> > > > SNOBOL4 statistics summary-
> > > >           1.080 ms. Compilation time
> > > >       20315.416 ms. Execution time
> > > >        36453943 Statements executed, 19353232 failed
> > > >         1834289 Arithmetic operations performed
> > > >         7899675 Pattern matches performed
> > > >             338 Regenerations of dynamic storage
> > > >        1680.975 ms. Execution time in GC
> > > >               0 Reads performed
> > > >               2 Writes performed
> > > >         557.290 ns. Average per statement executed
> > > >        1794.398 Thousand statements per second
> > > > $
> > > > 
> > > > 20.3 seconds total. Since this was running with ONLY 8MB memory
> > > > (default), and EVERY string change needed a new copy, 338 garbage
> > > > collections were needed. That is mark&sweep. GC took 1.7 seconds
> > > > (of that 20.3 seconds total).
> > > > 
> > > > FredW
> > > > 
> > > 
> > > 
> >
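
As a cross-check on the output quoted above, the SNOBOL4 loop can be
approximated in Python with digit strings. This is a minimal sketch, not a
port of INFINIP.INC, whose actual algorithm is not shown in this thread. The
helper names times_small and factorial_digits are invented for the example.
Python strings are immutable like SNOBOL4 strings, so every multiplication
builds a fresh string.

```python
import math


def times_small(digits, n):
    """Multiply a decimal digit string by a positive int; return a new string."""
    carry = 0
    out = []
    for ch in reversed(digits):               # least significant digit first
        carry, d = divmod(int(ch) * n + carry, 10)
        out.append(str(d))
    while carry:                              # flush any remaining carry digits
        carry, d = divmod(carry, 10)
        out.append(str(d))
    return "".join(reversed(out))


def factorial_digits(limit):
    x = "1"
    for i in range(1, limit + 1):
        x = times_small(x, i)                 # x = x * i, with a new string each pass
    return x


result = factorial_digits(300)
print(len(result), "digits")
print(result == str(math.factorial(300)))
```

The result has 615 digits and ends in 74 zeros, which matches both printouts
in this thread.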
