[ 
https://issues.apache.org/jira/browse/GIRAPH-436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538266#comment-13538266
 ] 

Alessandro Presta commented on GIRAPH-436:
------------------------------------------

There is no such counter at partition level, you would have to add it and keep 
it updated in GraphMapper.
I have a doubt, though: with partitioning being random, isn't it likely that 
active vertices will be evenly distributed across partitions?
If that's the case, even in SSSP almost all partitions will be active until 
there are less active vertices than partitions. It would be useful to see how 
active vertices change across supersteps in graphs with different degree 
distributions.
                
> Improvement of partition selection for out-of-core graph
> --------------------------------------------------------
>
>                 Key: GIRAPH-436
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-436
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.2.0
>            Reporter: Claudio Martella
>            Priority: Minor
>
> Currently, the OOC graph component needs a parameter K that describes how 
> many partitions should be kept in memory out of N (the number of partitions 
> assigned to that worker). Hence N - K partitions are constantly scanned and 
> from/to disk.
> Now, as discussed on GIRAPH-249, for algorithms were the all the vertices are 
> NOT always active, e.g. SSSP, this can have an impact on performance, as we 
> might get to (a) scan a partition on disk with very few active vertices (b) 
> scan a partition on disk with NO active vertices (c) scan a partition on disk 
> with all active vertices while we have some partitions with NO active 
> vertices in memory. The impact of OOC graph on other algorithms, e.g. PR-like 
> algorithms, is less important, as the cost of hitting the disk is completely 
> amortized (each IO byte is actually computed as all the vertices are active) 
> and efficient (scan-based).
> For the affected class of algorithms, two contributions are possible:
> 1) add a header to the partition file with the stats (a boolean hasActive 
> field) about active vertices in the partition. If that partition hasn't 
> received messages AND all the vertices are inactive, it can be skipped 
> completely. Solves for problem (a). Cost: one random IO to the header at the 
> end of each OOC partition computation.
> 2) greedy heuristic that tries to keep in memory only K partitions, when 
> possible, that have hasActive set to true. Once an inactive partition is 
> spilled to disk, and it stays to disk, it won't be scanned/read back to 
> memory unless necessary (works with contribution (1)). This accounts for 
> problem (c).
> As far as (b) is concerned, there's not much one can do without breaking 
> linear scanning.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to