Jianlong Zhong created GIRAPH-1161:
--------------------------------------
Summary: implement random sampling for input splits
Key: GIRAPH-1161
URL: https://issues.apache.org/jira/browse/GIRAPH-1161
Project: Giraph
Issue Type: Improvement
Reporter: Jianlong Zhong
Priority: Minor
Currently, if we are reading vertex/edge data from multiple tables and only
want to read a fraction of the data (with the giraph.inputSplitSamplePercent
conf option), we always get the first inputSplitSamplePercent percent of the
input splits. We should instead use a random sample of the input splits, so
that a test run on a sample of the data looks closer to an actual run on the
full data.
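
A minimal sketch of the difference between the two strategies, for
illustration only. It is not Giraph code: the class SplitSampler, its method
names, the seed parameter, and the round-down rule for the sample size are all
assumptions made for this example. Only the idea of keeping
inputSplitSamplePercent percent of the splits comes from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of Giraph. */
final class SplitSampler {
  private SplitSampler() {}

  /** Behavior described in this issue: keep the leading splits only. */
  static <T> List<T> firstPercent(List<T> splits, float samplePercent) {
    int keep = sampleSize(splits.size(), samplePercent);
    return new ArrayList<>(splits.subList(0, keep));
  }

  /** Proposed behavior: keep a uniformly random subset of the same size. */
  static <T> List<T> randomPercent(List<T> splits, float samplePercent,
                                   long seed) {
    int keep = sampleSize(splits.size(), samplePercent);
    // Shuffle a copy so the caller's list is left untouched.
    List<T> shuffled = new ArrayList<>(splits);
    // A fixed seed (an assumption here) makes the sample reproducible.
    Collections.shuffle(shuffled, new Random(seed));
    return new ArrayList<>(shuffled.subList(0, keep));
  }

  /** Number of splits to keep; rounding down is an assumption. */
  static int sampleSize(int total, float samplePercent) {
    if (samplePercent < 0f || samplePercent > 100f) {
      throw new IllegalArgumentException(
          "samplePercent must be in [0, 100]: " + samplePercent);
    }
    return (int) Math.floor(total * samplePercent / 100.0);
  }
}
```

If the splits are listed table by table, firstPercent would tend to return
splits from the first table(s) only, while randomPercent draws from all of
them. Using a seed is one possible design choice: it would let every worker or
repeated test run agree on the same sample.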
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)