[
https://issues.apache.org/jira/browse/GIRAPH-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dionysios Logothetis resolved GIRAPH-436.
-----------------------------------------
Resolution: Abandoned
> Improvement of partition selection for out-of-core graph
> --------------------------------------------------------
>
> Key: GIRAPH-436
> URL: https://issues.apache.org/jira/browse/GIRAPH-436
> Project: Giraph
> Issue Type: Improvement
> Components: graph
> Affects Versions: 1.0.0
> Reporter: Claudio Martella
> Priority: Minor
>
> Currently, the OOC graph component needs a parameter K that describes how
> many partitions should be kept in memory out of N (the number of partitions
> assigned to that worker). Hence N - K partitions are constantly scanned and
> from/to disk.
> Now, as discussed on GIRAPH-249, for algorithms were the all the vertices are
> NOT always active, e.g. SSSP, this can have an impact on performance, as we
> might get to (a) scan a partition on disk with very few active vertices (b)
> scan a partition on disk with NO active vertices (c) scan a partition on disk
> with all active vertices while we have some partitions with NO active
> vertices in memory. The impact of OOC graph on other algorithms, e.g. PR-like
> algorithms, is less important, as the cost of hitting the disk is completely
> amortized (each IO byte is actually computed as all the vertices are active)
> and efficient (scan-based).
> For the affected class of algorithms, two contributions are possible:
> 1) add a header to the partition file with the stats (a boolean hasActive
> field) about active vertices in the partition. If that partition hasn't
> received messages AND all the vertices are inactive, it can be skipped
> completely. Solves for problem (a). Cost: one random IO to the header at the
> end of each OOC partition computation.
> 2) greedy heuristic that tries to keep in memory only K partitions, when
> possible, that have hasActive set to true. Once an inactive partition is
> spilled to disk, and it stays to disk, it won't be scanned/read back to
> memory unless necessary (works with contribution (1)). This accounts for
> problem (c).
> As far as (b) is concerned, there's not much one can do without breaking
> linear scanning.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)