On Thursday, 11 July 2013 at 10:25:40 UTC, Joseph Rushton Wakeling wrote:
On 07/11/2013 02:24 AM, Joseph Rushton Wakeling wrote:
I know, and it's coming. :-) The main memory-related issues will probably not show up in a situation like this where all we're doing is storing the graph data, but in the case where algorithms are being performed on the data.

For comparison, I performed the same tests but with a 10_000 node graph. Here we see similar memory use, but igraph outperforms dgraph by a factor of nearly
10 even with the insert of nodes one at a time.

Profiling shows that the time difference is accounted for by the sorting
algorithm used in the indexEdges() method:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 93.12    110.01   110.01    20000     0.01     0.01
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165Z11SortedRange561__T13quickSortImplS5066dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ1!
1__lambda165
Z11SortedRange11__lambda168TS3std5range14__T3ZipTAmTAmZ3ZipZ13quickSortImplMFS3std5range14__T3ZipTAmTAmZ3ZipZv
  3.88    114.59     4.58    20000     0.00     0.00
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165Z11SortedRange561__T13quickSortImplS5066dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ1!
1__lambda165
Z11SortedRange11__lambda168TS3std5range14__T3ZipTAmTAmZ3ZipZ13quickSortImplMFS3std5range14__T3ZipTAmTAmZ3ZipZv
  1.81    116.73     2.14 1124043790     0.00     0.00
_D3std5range14__T3ZipTAmTAmZ3Zip7opIndexMFNaNbNfmZS3std8typecons14__T5TupleTmTmZ5Tuple
  0.59    117.42     0.70 203164131     0.00     0.00
_D3std9algorithm43__T6swapAtTS3std5range14__T3ZipTAmTAmZ3ZipZ6swapAtFNaNbNfS3std5range14__T3ZipTAmTAmZ3ZipmmZv
  0.42    117.92     0.50    20000     0.00     0.01
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv

By default, schwartzSort uses SwapStrategy.unstable, which means quicksort is used as the sorting mechanism. If we replace this with SwapStrategy.stable or SwapStrategy.semistable, TimSort is used instead, and this dramatically cuts the running time -- from almost 2 minutes to under 3 seconds (compared to 13 seconds for igraph with one-by-one edge addition), and under 2 if ldmd2 is used for
compiling.

This comes at a cost of increasing memory usage from about 3.7 MB (almost
identical to igraph) to 5.6.

Probably the best way to secure optimal performance without the memory hit is to use an insertion sort (or maybe a smoothsort?). I guess that Timsort would be best to use in the case of adding multiple edges in one go, unless no edges at all have been added before, in which case quicksort would probably be optimal; though quicksort would probably remain best if memory management is the priority.

So, the new D code is still competitive with igraph, but needs some smoothing
around the edges (quite literally!:-).

This is very promising. Great work!

Reply via email to