Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16033#discussion_r89769503

    --- Diff: core/src/main/scala/org/apache/spark/partial/ApproximateActionListener.scala ---
    @@ -34,11 +34,13 @@ private[spark] class ApproximateActionListener[T, U, R](
         rdd: RDD[T],
         func: (TaskContext, Iterator[T]) => U,
         evaluator: ApproximateEvaluator[U, R],
    -    timeout: Long)
    +    timeout: Long = 1000*60*60*24*30*12,
    --- End diff --

    You won't be able to do this because it changes the binary API. It also mixes up two different semantics: waiting for an amount of time versus waiting for an amount of completion. I get the use case for both, but if this is intended to deal with skew, it's probably not the right solution in general; you need to deal with the skew more directly. It's not clear that ignoring one partition is the right thing to do, especially when it contains a lot of the data.
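The quoted default is also worth a closer look. `1000*60*60*24*30*12` is presumably intended as roughly a year in milliseconds, but in Scala the literals are `Int`s, so the expression is evaluated with `Int` arithmetic and silently overflows before the result is widened to `Long`. A minimal standalone sketch (not Spark code) showing the difference:

```scala
// Sketch: the proposed default overflows Int arithmetic.
// 1000*60*60*24 = 86,400,000; multiplying by 30 already exceeds
// Int.MaxValue (2,147,483,647), so the product wraps around before
// being widened to Long for the `timeout: Long` parameter.
object TimeoutDefaultCheck {
  def main(args: Array[String]): Unit = {
    // Forcing Long arithmetic from the start gives the intended value:
    val intended: Long = 1000L * 60 * 60 * 24 * 30 * 12
    // The expression as written in the diff wraps in Int arithmetic:
    val actual: Long = 1000 * 60 * 60 * 24 * 30 * 12

    println(s"intended = $intended ms") // 31104000000 (~360 days)
    println(s"actual   = $actual ms")   // 1039228928 (~12 days)
  }
}
```

So even setting aside the binary-compatibility and semantics concerns, the default as written would be about 12 days, not the ~360 days the expression suggests; making the first factor `1000L` would fix the arithmetic.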