Re: [Gnucap-devel] floating point optimization

al davis Fri, 08 Dec 2006 00:14:15 -0800

On Thursday 07 December 2006 04:44, al davis wrote:
> Two circuit files, one with 147000 nodes, other with 590000
> nodes.  The larger circuit swapped unaccepably on the small
> machine so I tested only the smaller circuit there.  These
> were used to compare speed.
>
>             AMD, large,  AMD small, intel-small
> std                39 sec    9.5 sec       11.2 sec
> -ffast-math:       39 sec     9.5 sec       11.2 sec
> -ffloat-store:   50 sec    12 sec       13 sec
>
> The "small" circuit takes 30 minutes to run on ng-spice, on
> the AMD, with equivalent results.  Note that the time is 9.5
> SECONDS on gnucap, 30 MINUTES in ng-spice.  The algorithms
> are different.


Adding more data ...

The "large" circuit takes almost 8 hours on ng-spice.

This is very significant ..

The "large" circuit is 4 times as big as the "small" circuit, 
repeating the "small" circuit 4 times.

With gnucap, it takes 4 times as long to run, or "linear time".  
Run time is linearly proportional to the size of the circuit.  
A simple loop runs in linear time.

With ng-spice (and spice 3f5 from which it was derived) it takes 
about 16 times as long to run.  16 is 4 squared, 
indicating "quadratic time".  Run time is proportional to the 
square of the circuit size.  A classic quadratic time code 
block would be a pair of nested loops.

I think the bottleneck in Spice is matrix ordering.  I think it 
uses the Markowitz algorithm, which is typically quadratic 
time.  It takes a long time to set up, then runs reasonably 
fast after it is set up.  For small circuits, ordering time is 
insignificant.  It is masked my matrix solution time for linear 
circuits, or model evaluation time for nonlinear circuits.  A 
quadratic algorithm will eventually dominate if the size gets 
big enough.

Gnucap uses a modified block-depth-first-search ordering.  It 
only orders within blocks, taming the quadratic effect.  So, it 
turns out at worst to be the sum of (size_of_block)^2.  The 
biggest block in this test is 37 nodes.  When a subcircuit is 
duplicated, it is only ordered once.  This test circuit is all 
repetition, so the size the ordering algorithm sees is 37 
nodes, once.  That's even better than linear.

On the other hand ...  Gnucap takes about twice as much memory 
as ng-spice.  The matrix is stored twice, components carry 
extra data to support mixed-mode, nodes carry extra data.  
There is really more space overhead than 2x, but it benefits 
some from sharing in a hierarchy.  Taming this memory usage is 
high on the to-do list.  I want to be able to say that it will 
handle a million nodes.  It is not there yet, unless you have 
about 4 gig of real memory.

Classic LU decomposition for a dense matrix takes cubic time.    
Sparse matrix techniques improve this, based on the assumption 
that as the matrix grows the number of connections remains 
roughly constant.  If this is so, you can typically get linear 
time.  Expansion by minors takes factorial time, which explains 
why nobody competent does it that way.


_______________________________________________
Gnucap-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnucap-devel

Re: [Gnucap-devel] floating point optimization

Reply via email to