On Thursday, 11 July 2013 at 10:25:40 UTC, Joseph Rushton
Wakeling wrote:
On 07/11/2013 02:24 AM, Joseph Rushton Wakeling wrote:
I know, and it's coming. :-) The main memory-related issues
will probably not
show up in a situation like this where all we're doing is
storing the graph
data, but in the case where algorithms are being performed on
the data.
For comparison, I performed the same tests but with a 10_000
node graph. Here
we see similar memory use, but igraph outperforms dgraph by a
factor of nearly
10 even with the insert of nodes one at a time.
Profiling shows that the time difference is accounted for by
the sorting
algorithm used in the indexEdges() method:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
93.12 110.01 110.01 20000 0.01 0.01
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165Z11SortedRange561__T13quickSortImplS5066dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ1!
1__lambda165
Z11SortedRange11__lambda168TS3std5range14__T3ZipTAmTAmZ3ZipZ13quickSortImplMFS3std5range14__T3ZipTAmTAmZ3ZipZv
3.88 114.59 4.58 20000 0.00 0.00
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165Z11SortedRange561__T13quickSortImplS5066dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ1!
1__lambda165
Z11SortedRange11__lambda168TS3std5range14__T3ZipTAmTAmZ3ZipZ13quickSortImplMFS3std5range14__T3ZipTAmTAmZ3ZipZv
1.81 116.73 2.14 1124043790 0.00 0.00
_D3std5range14__T3ZipTAmTAmZ3Zip7opIndexMFNaNbNfmZS3std8typecons14__T5TupleTmTmZ5Tuple
0.59 117.42 0.70 203164131 0.00 0.00
_D3std9algorithm43__T6swapAtTS3std5range14__T3ZipTAmTAmZ3ZipZ6swapAtFNaNbNfS3std5range14__T3ZipTAmTAmZ3ZipmmZv
0.42 117.92 0.50 20000 0.00 0.01
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv
By default, schwartzSort uses SwapStrategy.unstable, which
means quicksort is
used as the sorting mechanism. If we replace this with
SwapStrategy.stable or
SwapStrategy.semistable, TimSort is used instead, and this
dramatically cuts the
running time -- from almost 2 minutes to under 3 seconds
(compared to 13 seconds
for igraph with one-by-one edge addition), and under 2 if ldmd2
is used for
compiling.
This comes at a cost of increasing memory usage from about 3.7
MB (almost
identical to igraph) to 5.6.
Probably the best way to secure optimal performance without the
memory hit is to
use an insertion sort (or maybe a smoothsort?). I guess that
Timsort would be
best to use in the case of adding multiple edges in one go,
unless no edges at
all have been added before, in which case quicksort would
probably be optimal;
though quicksort would probably remain best if memory
management is the priority.
So, the new D code is still competitive with igraph, but needs
some smoothing
around the edges (quite literally!:-).
This is very promising. Great work!