peay created SPARK-21550:
----------------------------

             Summary: approxQuantiles throws "next on empty iterator" on empty data
                 Key: SPARK-21550
                 URL: https://issues.apache.org/jira/browse/SPARK-21550
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: peay


The documentation of {{approxQuantile}} says:
{code}
null and NaN values will be removed from the numerical column before calculation. If the dataframe is empty or the column only contains null or NaN, an empty array is returned.
{code}
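For reference, the documented contract can be sketched in plain Python. This is a minimal sketch under stated assumptions: {{documented_quantiles}} is a hypothetical helper, it computes exact nearest-rank quantiles by sorting (Spark's approxQuantile instead uses an approximate summary based on a variant of the Greenwald-Khanna algorithm), and Python's None stands in for SQL null.

```python
import math
from typing import Iterable, List, Optional


def documented_quantiles(values: Iterable[Optional[float]],
                         probabilities: List[float]) -> List[float]:
    """Hypothetical helper following the documented approxQuantile contract.

    Uses exact nearest-rank quantiles on sorted data, not Spark's summary.
    """
    # Drop null (None) and NaN values before any calculation.
    clean = sorted(v for v in values if v is not None and not math.isnan(v))
    # Empty input, or only null/NaN values: return an empty list, do not fail.
    if not clean:
        return []
    n = len(clean)
    result = []
    for p in probabilities:
        # Nearest-rank method: 1-based rank ceil(p * n), clamped to at least 1.
        # The small epsilon absorbs float error such as 0.7 * 10 == 7.000000000000001.
        rank = max(1, math.ceil(p * n - 1e-9))
        result.append(clean[rank - 1])
    return result
```

With this contract, the reproduction below would be expected to yield an empty list rather than an exception.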

However, this small PySpark example, which computes a quantile of an empty DataFrame,
{code}
from pyspark.sql.functions import col

sql_context.range(10).filter(col("id") == 42).approxQuantile("id", [0.99], 0.001)
{code}

throws the following exception instead of returning an empty list:

{code}
Py4JJavaError: An error occurred while calling o3493.approxQuantile.
: java.util.NoSuchElementException: next on empty iterator
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
        at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
        at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
        at scala.collection.IterableLike$class.head(IterableLike.scala:107)
        at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:186)
        at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
        at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.last(TraversableLike.scala:431)
        at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$last(ArrayOps.scala:186)
        at scala.collection.IndexedSeqOptimized$class.last(IndexedSeqOptimized.scala:132)
        at scala.collection.mutable.ArrayOps$ofRef.last(ArrayOps.scala:186)
        at org.apache.spark.sql.catalyst.util.QuantileSummaries.query(QuantileSummaries.scala:207)
        at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply$mcDD$sp(StatFunctions.scala:92)
        at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
        at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
{code}
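The trace points at the failure mechanism: {{QuantileSummaries.query}} calls {{last}} on an empty array, and the Scala collection fallback then fails with "next on empty iterator". The toy model below is a hypothetical illustration, not Spark's code: {{query_unguarded}} reads from a sorted sample without an emptiness check, and {{approx_quantiles_guarded}} shows one place an empty check could go (once, before any per-quantile query) so that empty input yields an empty list as documented.

```python
from typing import List


def query_unguarded(sampled: List[float], quantile: float) -> float:
    # Toy query: reads an element of the sorted sample with no emptiness
    # check, like the `.last` call in the trace. On an empty sample the
    # index becomes -1 and Python raises IndexError.
    idx = min(int(quantile * len(sampled)), len(sampled) - 1)
    return sampled[idx]


def approx_quantiles_guarded(sampled: List[float],
                             probabilities: List[float]) -> List[float]:
    # Caller-side guard: check for an empty sample once, before querying
    # any probability, so empty input returns [] instead of raising.
    if not sampled:
        return []
    return [query_unguarded(sampled, p) for p in probabilities]
```

Whether a real fix belongs inside {{QuantileSummaries.query}} or in its caller {{StatFunctions.multipleApproxQuantiles}} is a design choice; in this toy, checking once in the caller keeps the per-quantile query simple.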



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
