[ https://issues.apache.org/jira/browse/GIRAPH-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dionysios Logothetis resolved GIRAPH-308.
-----------------------------------------
Resolution: Abandoned
> Giraph consistently creates 10% more InputSplits than one would expect
> ----------------------------------------------------------------------
>
> Key: GIRAPH-308
> URL: https://issues.apache.org/jira/browse/GIRAPH-308
> Project: Giraph
> Issue Type: Bug
> Components: graph
> Affects Versions: 1.0.0
> Reporter: Eli Reisman
> Priority: Minor
> Labels: documentation
> Fix For: 1.0.0
>
>
> As I have been doing a lot of instrumented runs for scale out, and to test
> 246 and 301 (among other patches), I have seen that the calculation:
> (# of MB in input files) / (giraph.splitmb setting) == # of InputSplits to
> expect
> is not arriving at the number of splits one would expect. I would think there
> would be an extra now and then to round off fractional amounts in a
> calculation such as the one stated above, but I'm consistently seeing more
> than that, roughly 10% more than one would expect and this is consistent over
> runs with many different size data loads.
> If there is some simple explanation, perhaps I'll find it in the code but
> either way I wanted to post a JIRA because this is somewhat counterintuitive
> and suggests we should alter the behavior of giraph.splitmb to ensure users
> get what they expect in terms of input splits. In memory-scarce use cases,
> I am finding that if a given worker reads just one split too many on a given
> data load, it will overload and fail. Knowing how many workers to allocate
> for a given data load with some precision has been the key to scale out under
> scarce resources here. Seeing these numbers now as I test 301 (which is meant
> to help ensure the split-reading load is spread out evenly among workers) I
> see this has fooled me at times in the past when setting -w and
> -Dgiraph.splitmb options carefully.
> At the very least, it would be nice to hear from someone who knows what's
> going on here, so there is a definitive posting on this
> matter that folks can refer to for information in the future when exploring a
> use case like mine. Many users here will be in the same boat as me, of course
> :)
> Thanks in advance.
>
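For reference, the expectation stated in the description can be sketched as a small calculation. This is only an illustration of the arithmetic, not Giraph code: the function names `expected_splits` and `max_splits_per_worker` are hypothetical, the input size and worker count are made-up numbers, and the 1.10 factor simply models the "roughly 10% more" observation from the report without explaining its cause.

```python
import math


def expected_splits(input_mb: float, split_mb: float) -> int:
    """Naive expectation: total input size divided by giraph.splitmb, rounded up."""
    if split_mb <= 0:
        raise ValueError("split_mb must be positive")
    return math.ceil(input_mb / split_mb)


def max_splits_per_worker(num_splits: int, num_workers: int) -> int:
    """Largest number of splits any one worker reads if splits are spread evenly."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return math.ceil(num_splits / num_workers)


# Hypothetical data load: 10,000 MB of input with -Dgiraph.splitmb=64.
naive = expected_splits(10_000, 64)

# Assumption for illustration only: model the reported ~10% overshoot.
observed = math.ceil(naive * 1.10)

# With -w set to the naive split count, the overshoot forces some worker
# to read a second split, which is the overload scenario described above.
workers = naive
print(naive, max_splits_per_worker(naive, workers))
print(observed, max_splits_per_worker(observed, workers))
```

If the overshoot holds, sizing -w from the naive count leaves at least one worker with one split more than planned, so under tight memory it may be safer to budget workers against the observed split count rather than the computed one.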
--
This message was sent by Atlassian Jira
(v8.3.4#803005)