[jira] [Resolved] (GIRAPH-308) Giraph consistently creates 10% more InputSplits than one would expect

Dionysios Logothetis (Jira) Tue, 12 May 2020 11:28:19 -0700


     [ 
https://issues.apache.org/jira/browse/GIRAPH-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dionysios Logothetis resolved GIRAPH-308.
-----------------------------------------
    Resolution: Abandoned

> Giraph consistently creates 10% more InputSplits than one would expect
> ----------------------------------------------------------------------
>
>                 Key: GIRAPH-308
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-308
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>    Affects Versions: 1.0.0
>            Reporter: Eli Reisman
>            Priority: Minor
>              Labels: documentation
>             Fix For: 1.0.0
>
>
> As I have been doing a lot of instrumented runs for scale out, and to test 
> 246 and 301 (among other patches), I have seen that the calculation:
> (# of MB in input files) / (giraph.splitmb setting) == # of InputSplits to 
> expect
> is not arriving at the number of splits one would expect. I would think there 
> would be an extra now and then to round off fractional amounts in a 
> calculation such as the one stated above, but I'm consistently seeing more 
> than that, roughly 10% more than one would expect and this is consistent over 
> runs with many different size data loads. 
> If there is some simple explanation, perhaps I'll find it in the code but 
> either way I wanted to post a JIRA because this is somewhat counterintuitive 
> and suggests we should alter the behavior of giraph.splitmb to ensure users 
> get what they expect in terms of input splits. In memory scarcity use cases, 
> I am finding that if a given worker reads just one split too many on a given 
> data load, it will overload and fail. Knowing how many workers to allocate 
> for a given data load with some precision has been the key to scale out under 
> scarce resources here. Seeing these numbers now as I test 301 (which is meant 
> to help ensure the split-reading load is spread out evenly among workers) I 
> see this has fooled me at times in the past when setting -w and 
> -Dgiraph.splitmb options carefully.
> At the very least, it would be nice to hear from someone who knows what's 
> going on here, so there is a definitive posting on this matter that folks 
> can refer to in the future when exploring a use case like mine. Many users 
> here will be in the same boat as me, of course :)
> Thanks in advance.
>  
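
For reference, the expectation described in the report can be written out as
a small calculation. This is only a sketch of the reporter's arithmetic, not
Giraph's actual split logic: the class and method names are hypothetical, the
input size (1000 MB), split size (64 MB), and worker count (4) are made-up
figures, and the 10% overshoot is simply applied as a factor to show how a
slightly higher split count can push one worker to read an extra split.

```java
public class ExpectedSplits {
    // Naive expectation from the report: input size divided by the
    // giraph.splitmb setting, rounded up to count a trailing partial split.
    static long expectedSplits(long inputMb, long splitMb) {
        if (inputMb < 0 || splitMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (inputMb + splitMb - 1) / splitMb;
    }

    // Largest number of splits any single worker reads when the splits
    // are spread as evenly as possible across the workers.
    static long maxSplitsPerWorker(long splits, int workers) {
        if (splits < 0 || workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        return (splits + workers - 1) / workers;
    }

    public static void main(String[] args) {
        long inputMb = 1000; // hypothetical total input size
        long splitMb = 64;   // hypothetical giraph.splitmb value
        int workers = 4;     // hypothetical -w value

        long expected = expectedSplits(inputMb, splitMb);
        // Assumed overshoot of roughly 10%, as observed in the report.
        long observed = expected + Math.round(expected * 0.10);

        System.out.println("expected splits: " + expected
            + ", max per worker: " + maxSplitsPerWorker(expected, workers));
        System.out.println("observed splits: " + observed
            + ", max per worker: " + maxSplitsPerWorker(observed, workers));
    }
}
```

With these made-up figures the per-worker maximum rises by one split, which
is the situation the report describes as enough to overload a worker when
memory is scarce.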



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
