[ https://issues.apache.org/jira/browse/SPARK-25976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680938#comment-16680938 ]
Yuval Yaari commented on SPARK-25976:
-------------------------------------

Correct; however, in Scala there is not much of a performance penalty in asking isEmpty. I suggest:

{code:scala}
> sc.emptyRDD[Double].reduce(_ + _)
java.lang.UnsupportedOperationException: empty collection

> sc.emptyRDD[Double].reduce(_ + _, () => Double.NaN)
Double.NaN
{code}

> Allow rdd.reduce on empty rdd by returning an Option[T]
> --------------------------------------------------------
>
>                 Key: SPARK-25976
>                 URL: https://issues.apache.org/jira/browse/SPARK-25976
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.2
>            Reporter: Yuval Yaari
>            Priority: Minor
>
> It is sometimes useful to let the user decide what value to return when reducing over an empty RDD.
> Currently, if there is no data to reduce, an UnsupportedOperationException is thrown.
> Although the user can catch that exception, it seems like a "shaky" solution, as an UnsupportedOperationException might be thrown from a different location.
> Instead, we can overload the reduce method by adding a new method:
> reduce(f: (T, T) => T, defaultIfEmpty: () => T): T
> The existing reduce API will not be affected, as it will simply call the second reduce method with a default function that throws an UnsupportedOperationException.
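
For illustration, here is a minimal user-side sketch of the proposed reduce-with-default semantics, built only on the current public RDD API. The object ReduceWithDefault, the implicit class RichRDD, and the method name reduceOrElse are names assumed for this example; they are not part of Spark or of the proposal above.

{code:scala}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper (all names here are assumptions, not Spark API):
// reduceOrElse mimics the proposed reduce(f, defaultIfEmpty) by reducing
// each partition to an Option and falling back to the default when empty.
object ReduceWithDefault {

  implicit class RichRDD[T](rdd: RDD[T]) {
    def reduceOrElse(f: (T, T) => T, defaultIfEmpty: () => T): T =
      rdd
        // reduce each partition locally; empty partitions contribute None
        .mapPartitions(it => Iterator(it.reduceOption(f)))
        // merge the per-partition results; an RDD with no partitions folds to the zero value None
        .fold(Option.empty[T]) {
          case (Some(a), Some(b)) => Some(f(a, b))
          case (left, None)       => left
          case (None, right)      => right
        }
        // return the caller-supplied default instead of throwing on an empty RDD
        .getOrElse(defaultIfEmpty())
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "reduce-or-else-demo")
    println(sc.parallelize(Seq(1.0, 2.0, 3.0)).reduceOrElse(_ + _, () => Double.NaN)) // 6.0
    println(sc.emptyRDD[Double].reduceOrElse(_ + _, () => Double.NaN))                // NaN
    sc.stop()
  }
}
{code}

Passing () => throw new UnsupportedOperationException("empty collection") as the default reproduces the current reduce behaviour, which is the compatibility property the description above relies on.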