peay created SPARK-21550:
----------------------------

Summary: approxQuantiles throws "next on empty iterator" on empty data
Key: SPARK-21550
URL: https://issues.apache.org/jira/browse/SPARK-21550
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.0
Reporter: peay

The documentation says:

{code}
null and NaN values will be removed from the numerical column before calculation. If the dataframe is empty or the column only contains null or NaN, an empty array is returned.
{code}

However, this small pyspark example

{code}
sql_context.range(10).filter(col("id") == 42).approxQuantile("id", [0.99], 0.001)
{code}

throws

{code}
Py4JJavaError: An error occurred while calling o3493.approxQuantile.
: java.util.NoSuchElementException: next on empty iterator
	at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
	at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
	at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
	at scala.collection.IterableLike$class.head(IterableLike.scala:107)
	at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:186)
	at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
	at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.last(TraversableLike.scala:431)
	at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$last(ArrayOps.scala:186)
	at scala.collection.IndexedSeqOptimized$class.last(IndexedSeqOptimized.scala:132)
	at scala.collection.mutable.ArrayOps$ofRef.last(ArrayOps.scala:186)
	at org.apache.spark.sql.catalyst.util.QuantileSummaries.query(QuantileSummaries.scala:207)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply$mcDD$sp(StatFunctions.scala:92)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
{code}
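
For reference, the documented contract quoted in this report can be sketched in plain Python. This is only an illustration of the expected behavior on empty input, not Spark's implementation: the function name `exact_quantiles` is hypothetical, and it computes exact nearest-rank quantiles rather than the approximate summaries Spark uses.

```python
import math


def exact_quantiles(values, probabilities):
    """Hypothetical stand-in illustrating the documented contract.

    Uses exact nearest-rank quantiles, not Spark's approximate algorithm.
    """
    # Remove null (None) and NaN values before calculation, as documented.
    cleaned = sorted(
        v for v in values if v is not None and not math.isnan(v)
    )
    # Documented behavior: nothing left to summarize means an empty result,
    # rather than an exception from reading the last element of an empty array.
    if not cleaned:
        return []
    n = len(cleaned)
    # Nearest-rank index, clamped to the valid range [0, n - 1].
    return [
        float(cleaned[min(n - 1, max(0, math.ceil(p * n) - 1))])
        for p in probabilities
    ]
```

With this sketch, an input that is empty after filtering (the situation produced by the `filter(col("id") == 42)` call in the report) yields `[]` instead of raising.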