sample() doesn't guarantee the exact sample size; the fraction is only
the expected proportion of elements kept. If you fix the random seed, it
will return the same result every time. -Xiangrui
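To make the two points above concrete, here is a minimal plain-Scala sketch (no Spark needed) of per-element Bernoulli sampling, which as far as I understand is the idea behind sample(withReplacement = false, ...). The helper bernoulliSample, the 10000-element Vector standing in for graph.vertices, the 0.01 fraction and the seeds are all made up for illustration; this is not Spark's actual code.

```scala
import scala.util.Random

object BernoulliSampleSketch {
  // Keep each element independently with probability `fraction`.
  // Hypothetical helper for illustration only, not Spark's implementation.
  def bernoulliSample[T](data: Seq[T], fraction: Double, seed: Long): Seq[T] = {
    val rng = new Random(seed)
    data.filter(_ => rng.nextDouble() < fraction)
  }

  def main(args: Array[String]): Unit = {
    val verts = (1 to 10000).toVector // stand-in for graph.vertices
    val fraction = 0.01               // expected sample size: 100

    // A new seed per call (like System.currentTimeMillis): the sizes
    // scatter around the expected value instead of hitting it exactly.
    val sizes = (1L to 5L).map(seed => bernoulliSample(verts, fraction, seed).size)
    println(s"sizes with different seeds: $sizes")

    // A fixed seed: the same elements come back on every call.
    val a = bernoulliSample(verts, fraction, 42L)
    val b = bernoulliSample(verts, fraction, 42L)
    println(s"fixed seed reproducible: ${a == b}") // prints "fixed seed reproducible: true"
  }
}
```

Since every element is an independent coin flip, the count follows a binomial distribution around fraction * size, which would explain the small +/- spread. Passing System.currentTimeMillis as the seed gives a new seed on each call, so the count moves each time; a constant seed should pin it down. If an exact count matters, RDD.takeSample(withReplacement, num, seed) is worth a look: as far as I know it returns exactly num elements as a local array on the driver, so it only suits samples that fit in memory.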
On Wed, May 21, 2014 at 2:05 PM, glxc r.ryan.mcc...@gmail.com wrote:
I have a graph and am trying to take a random sample of vertices without
replacement, using the RDD.sample() method.
verts are the vertices in the graph:
val verts = graph.vertices
Executing this multiple times in a row:
verts.sample(false, 1.toDouble/v1.count.toDouble,
System.currentTimeMillis).count
yields a different result nearly every time (albeit within +/- a small % of
the target).
Why does this happen? I looked at PartitionwiseSampledRDD but can't figure it
out.
Also, is there another method/technique to yield the same result each time?
My understanding is that grabbing random indices may not be the best use of
the RDD model.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Inconsistent-RDD-Sample-size-tp6197.html