Jianlong Zhong created GIRAPH-1161:
--------------------------------------

             Summary: implement random sampling for input splits
                 Key: GIRAPH-1161
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1161
             Project: Giraph
          Issue Type: Improvement
            Reporter: Jianlong Zhong
            Priority: Minor


Currently, if we are reading vertex/edge data from multiple tables and only 
want to read a fraction of the data (via the giraph.inputSplitSamplePercent 
conf option), we always get the first inputSplitSamplePercent percent of the 
input splits. We should instead use a random sample of the input splits, so 
that a test on a sample of the data looks closer to an actual full-data run.
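
A minimal sketch of the idea, not the actual Giraph change: shuffle the list 
of splits and keep the first inputSplitSamplePercent percent of the shuffled 
list, rather than of the original list. The class name SplitSampler, the 
method sampleSplits, the seed parameter and the round-up rule are all 
assumptions made for illustration. The real code would work on Hadoop 
InputSplit objects and may round differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper; not part of Giraph. */
class SplitSampler {
  /**
   * Returns a random subset holding about samplePercent percent of the
   * given splits. The input list is left unmodified.
   *
   * @param splits        all input splits, in their original order
   * @param samplePercent value of giraph.inputSplitSamplePercent (0..100)
   * @param seed          seed for the shuffle (assumption: a fixed seed
   *                      keeps the sample reproducible across runs)
   */
  static <T> List<T> sampleSplits(List<T> splits, float samplePercent,
      long seed) {
    if (samplePercent >= 100f) {
      // Nothing to sample: keep every split.
      return new ArrayList<>(splits);
    }
    // Shuffle a copy so the caller's list keeps its order.
    List<T> shuffled = new ArrayList<>(splits);
    Collections.shuffle(shuffled, new Random(seed));
    // Assumption: round up, so a small positive percent keeps at least
    // one split.
    int keep = (int) Math.ceil(shuffled.size() * samplePercent / 100.0);
    keep = Math.max(0, Math.min(keep, shuffled.size()));
    return new ArrayList<>(shuffled.subList(0, keep));
  }
}
```

With 20 splits and a sample percent of 25, this keeps 5 splits drawn from 
anywhere in the list, so every table has a chance of being represented 
instead of only the one whose splits come first.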



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
