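What's the bug? With sample(true, 0.5), the 0.5 is the expected number of
times each element is drawn, not an exact fraction of the output. The RDD
has 29 elements, so the expected size is about 14.5, but not all samples
would be that size.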
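
To illustrate the spread, here is a rough plain-Scala simulation (no Spark
involved). It assumes that sampling with replacement draws a Poisson-distributed
count with mean 0.5 for each element, which is my understanding of how Spark
does it; the names SampleSizeDemo, poisson and sampleSize are made up for this
sketch and are not Spark APIs.

```scala
import scala.util.Random

object SampleSizeDemo {
  // Knuth's method for drawing from a Poisson(mean) distribution.
  // Adequate for small means such as 0.5.
  def poisson(mean: Double, rng: Random): Int = {
    val limit = math.exp(-mean)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  // Size of one simulated "with replacement" sample:
  // each of the n elements is emitted Poisson(fraction) times.
  def sampleSize(n: Int, fraction: Double, rng: Random): Int =
    (1 to n).map(_ => poisson(fraction, rng)).sum

  def main(args: Array[String]): Unit = {
    val rng = new Random(42L)
    val sizes = Seq.fill(10000)(sampleSize(29, 0.5, rng))
    val mean = sizes.sum.toDouble / sizes.size
    val stdDev =
      math.sqrt(sizes.map(s => (s - mean) * (s - mean)).sum / sizes.size)

    // Under the Poisson assumption the total is Poisson(14.5), so the mean
    // should be near 14.5 and the standard deviation near sqrt(14.5).
    println(f"mean size = $mean%.2f, std dev = $stdDev%.2f")
    println(s"first few sizes: ${sizes.take(6).mkString(", ")}")
  }
}
```

Under that assumption the standard deviation is about 3.8, so counts like 8,
10, 11, 15, 16 and 18 are all within two standard deviations of 14.5. If an
exact sample size is needed, takeSample(true, 14) returns an array of exactly
14 elements instead.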
On Mar 17, 2015 12:12 AM, "Marko Bonaci (JIRA)" <j...@apache.org> wrote:

> Marko Bonaci created SPARK-6370:
> -----------------------------------
>
>              Summary: RDD sampling with replacement intermittently yields
> incorrect number of samples
>                  Key: SPARK-6370
>                  URL: https://issues.apache.org/jira/browse/SPARK-6370
>              Project: Spark
>           Issue Type: Bug
>           Components: Spark Core
>     Affects Versions: 1.2.1, 1.3.0
>          Environment: Ubuntu 14.04 64-bit, spark-1.3.0-bin-hadoop2.4
>             Reporter: Marko Bonaci
>
>
> Here's the repl output:
>
> {{code:java}}
> scala> uniqueIds.collect
> res10: Array[String] = Array(4, 8, 21, 80, 20, 98, 42, 15, 48, 36, 90, 46,
> 55, 16, 31, 71, 9, 50, 28, 61, 68, 85, 12, 94, 38, 77, 2, 11, 10)
>
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[22] at
> sample at <console>:27
>
> scala> swr.count
> res17: Long = 16
>
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[23] at
> sample at <console>:27
>
> scala> swr.count
> res18: Long = 8
>
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[24] at
> sample at <console>:27
>
> scala> swr.count
> res19: Long = 18
>
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[25] at
> sample at <console>:27
>
> scala> swr.count
> res20: Long = 15
>
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[26] at
> sample at <console>:27
>
> scala> swr.count
> res21: Long = 11
>
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[27] at
> sample at <console>:27
>
> scala> swr.count
> res22: Long = 10
> {{code}}
>
>
>