GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/21980
[SPARK-25010][SQL] Rand/Randn should produce different values for each execution in streaming query

## What changes were proposed in this pull request?

As with Uuid in SPARK-24896, the Rand and Randn expressions currently produce the same results for every execution of a streaming query. This does not make much sense for streaming queries. They should produce different results for each execution, as Uuid does.

In this change, similar to Uuid, we assign new random seeds to Rand/Randn when returning the optimized plan from `IncrementalExecution`.

Note: Unlike Uuid, Rand/Randn can be created with an initial seed. Because we replace this seed in `IncrementalExecution`, the initial seed is no longer used. For now this does not seem to me to be a big issue for streaming queries, but it needs to be confirmed with others.

cc @zsxwing @cloud-fan

## How was this patch tested?

Added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-25010

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21980

----
commit 1e0370ec1c5f3920a3ba59abb46446e255ecb55b
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-08-02T23:35:10Z

    Rand/Randn should produce different values for each execution in streaming query.

----

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
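
The seed replacement described in the pull request can be illustrated with a small toy model. This is only a sketch in plain Python and not the actual patch: the `Rand`, `Project`, `reseed`, and `run_batch` names are hypothetical stand-ins invented here, and none of them are Spark APIs. It assumes each execution (micro-batch) gets a rewritten copy of the plan in which every `Rand` carries a fresh seed, which also shows why a user-supplied initial seed is no longer used.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Rand:
    """Hypothetical stand-in for a Rand expression; it only holds a seed."""
    seed: int


@dataclass(frozen=True)
class Project:
    """Hypothetical stand-in for a plan node holding a tuple of expressions."""
    exprs: Tuple[Union[Rand, str], ...]


def reseed(plan: Project, rng: random.Random) -> Project:
    """Return a copy of the plan in which every Rand gets a fresh seed."""
    new_exprs = tuple(
        replace(e, seed=rng.getrandbits(63)) if isinstance(e, Rand) else e
        for e in plan.exprs
    )
    return Project(new_exprs)


def run_batch(plan: Project, rows: int) -> List[List[float]]:
    """Evaluate each Rand expression for `rows` rows, seeded from the plan."""
    out = []
    for e in plan.exprs:
        if isinstance(e, Rand):
            gen = random.Random(e.seed)
            out.append([gen.random() for _ in range(rows)])
    return out


# The user-supplied plan, with an initial seed of 42 (an arbitrary example value).
user_plan = Project((Rand(seed=42), "value"))

# Without reseeding, every execution evaluates the same plan, so values repeat.
assert run_batch(user_plan, 3) == run_batch(user_plan, 3)

# With reseeding per execution, each micro-batch sees a different seed,
# and the initial seed of 42 is replaced rather than reused.
seed_source = random.Random()
batch0 = reseed(user_plan, seed_source)
batch1 = reseed(user_plan, seed_source)
print(batch0.exprs[0].seed != batch1.exprs[0].seed)
```

In this sketch the rewrite happens on a copy, so the original plan and its initial seed are left untouched; only the per-execution copies carry the fresh seeds.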