Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19325#discussion_r140852903
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala ---
    @@ -51,10 +51,12 @@ case class ArrowEvalPythonExec(udfs: Seq[PythonUDF], output: Seq[Attribute], chi
           outputIterator.map(new ArrowPayload(_)), context)
     
         // Verify that the output schema is correct
    -    val schemaOut = StructType.fromAttributes(output.drop(child.output.length).zipWithIndex
    -      .map { case (attr, i) => attr.withName(s"_$i") })
    -    assert(schemaOut.equals(outputRowIterator.schema),
    -      s"Invalid schema from pandas_udf: expected $schemaOut, got ${outputRowIterator.schema}")
    +    if (outputIterator.nonEmpty) {
    +      val schemaOut = StructType.fromAttributes(output.drop(child.output.length).zipWithIndex
    +        .map { case (attr, i) => attr.withName(s"_$i") })
    +      assert(schemaOut.equals(outputRowIterator.schema),
    +        s"Invalid schema from pandas_udf: expected $schemaOut, got ${outputRowIterator.schema}")
    --- End diff --
    
Yeah, I tried to make one, but since we are now casting the return Series with `astype` in `ArrowPandasSerializer.dumps`, I have not found a case that triggers it. I think it would still be good to keep this check, just in case there is some way it could happen. Also, if we upgrade to Arrow 0.7 we won't need the `astype` logic and this assertion will be used instead.
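
Just for context, here is a rough pandas/pyarrow sketch of the `astype` coercion I'm referring to (the variable names are made up; the real logic lives in `ArrowPandasSerializer.dumps`). It shows why a UDF that returns a Series with the "wrong" dtype usually never reaches the JVM-side schema assertion:

```python
import pandas as pd
import pyarrow as pa

# Suppose a pandas_udf is declared with returnType DoubleType(), but the
# user's function actually returns an integer-dtyped Series.
result = pd.Series([1, 2, 3])          # dtype: int64, not the declared float64

# Roughly what the astype step in the serializer does: cast to the declared
# type before building the Arrow data, so the schema sent back to the JVM
# already matches the expected output schema.
coerced = result.astype('float64')
arrow_array = pa.Array.from_pandas(coerced)

print(coerced.dtype)     # float64
print(arrow_array.type)  # double
```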


---
