[ https://issues.apache.org/jira/browse/GIRAPH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550370#comment-13550370 ]
Alessandro Presta commented on GIRAPH-477:
------------------------------------------

Thanks for the review Eli.
The main problem here is not that it's more data to store, but that (at least at our scale) that loop is a real bottleneck.
We also need to take another look at our split reservation strategy, because I suspect we could be a lot smarter in terms of not having all workers (and threads) fetching the same data from ZooKeeper over and over. We could even consider having the master take care of this instead of ZK.
But at least making this code path optional fixes the issue.

> Fetching locality info in InputSplitPathOrganizer causes jobs to hang
> ---------------------------------------------------------------------
>
>                 Key: GIRAPH-477
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-477
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-477.patch
>
>
> In the presence of many input splits (>6000 in our case) and input split
> threads (3000), the loop that fetches locality info for all splits from
> ZooKeeper becomes a bottleneck. A few workers aren't able to even iterate
> once over the list, run into increased GC pauses, and eventually time out.
> Furthermore, depending on the cluster configuration, it's not always
> possible/useful to exploit locality.
> We should add a flag so that the feature can be optionally disabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
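
For illustration only, the flag proposed in the issue description might gate the locality loop roughly as follows. This is a minimal sketch and not the contents of GIRAPH-477.patch: the class name, the method, the `useLocality` parameter, and the `fetchLocations` function (a stand-in for the per-split ZooKeeper read) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch only: not the GIRAPH-477 patch. All names here are hypothetical. */
final class LocalityFlagSketch {

  /**
   * Returns the order in which a worker would try to reserve splits.
   *
   * @param splitPaths     paths identifying each input split
   * @param fetchLocations stand-in for the per-split ZooKeeper read
   * @param localHost      hostname of this worker
   * @param useLocality    hypothetical flag; false skips every lookup
   */
  static List<String> orderSplits(List<String> splitPaths,
      Function<String, List<String>> fetchLocations,
      String localHost, boolean useLocality) {
    if (!useLocality) {
      // Flag off: keep the given order and never touch the locality source.
      return new ArrayList<>(splitPaths);
    }
    List<String> local = new ArrayList<>();
    List<String> remote = new ArrayList<>();
    for (String path : splitPaths) {
      // One lookup per split: the loop described as the bottleneck.
      if (fetchLocations.apply(path).contains(localHost)) {
        local.add(path);
      } else {
        remote.add(path);
      }
    }
    // Splits hosted on this worker come first, the rest follow.
    local.addAll(remote);
    return local;
  }
}
```

With the flag off, the method performs no lookups at all, which is the property the issue asks for: a worker facing thousands of splits would skip the loop entirely rather than time out inside it.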