Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16033#discussion_r89769503

    --- Diff: core/src/main/scala/org/apache/spark/partial/ApproximateActionListener.scala ---
    @@ -34,11 +34,13 @@ private[spark] class ApproximateActionListener[T, U, R](
         rdd: RDD[T],
         func: (TaskContext, Iterator[T]) => U,
         evaluator: ApproximateEvaluator[U, R],
    -    timeout: Long)
    +    timeout: Long = 1000*60*60*24*30*12,
    --- End diff --

    You won't be able to do this because it changes the binary API. It also mixes up two different semantics: waiting for an amount of time versus waiting for an amount of completion. I get the use case for both, but if this is intended to deal with skew, it's probably not the right solution in general; you need to deal with the skew more directly. It's not clear that ignoring one partition is the right thing to do, especially when it contains a lot of the data.
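The quoted default is also worth a closer look. `1000*60*60*24*30*12` is presumably intended as roughly a year in milliseconds, but in Scala the literals are `Int`s, so the expression is evaluated with `Int` arithmetic and silently overflows before the result is widened to `Long`. A minimal standalone sketch (not Spark code) showing the difference:

```scala
// Sketch: the proposed default overflows Int arithmetic.
// 1000*60*60*24 = 86,400,000; multiplying by 30 already exceeds
// Int.MaxValue (2,147,483,647), so the product wraps around before
// being widened to Long for the `timeout: Long` parameter.
object TimeoutDefaultCheck {
  def main(args: Array[String]): Unit = {
    // Forcing Long arithmetic from the start gives the intended value:
    val intended: Long = 1000L * 60 * 60 * 24 * 30 * 12
    // The expression as written in the diff wraps in Int arithmetic:
    val actual: Long = 1000 * 60 * 60 * 24 * 30 * 12

    println(s"intended = $intended ms") // 31104000000 (~360 days)
    println(s"actual   = $actual ms")   // 1039228928 (~12 days)
  }
}
```

So even setting aside the binary-compatibility and semantics concerns, the default as written would be about 12 days, not the ~360 days the expression suggests; making the first factor `1000L` would fix the arithmetic.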