Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19325#discussion_r140852903

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala ---
@@ -51,10 +51,12 @@ case class ArrowEvalPythonExec(udfs: Seq[PythonUDF], output: Seq[Attribute], chi
         outputIterator.map(new ArrowPayload(_)), context)
 
       // Verify that the output schema is correct
-      val schemaOut = StructType.fromAttributes(output.drop(child.output.length).zipWithIndex
-        .map { case (attr, i) => attr.withName(s"_$i") })
-      assert(schemaOut.equals(outputRowIterator.schema),
-        s"Invalid schema from pandas_udf: expected $schemaOut, got ${outputRowIterator.schema}")
+      if (outputIterator.nonEmpty) {
+        val schemaOut = StructType.fromAttributes(output.drop(child.output.length).zipWithIndex
+          .map { case (attr, i) => attr.withName(s"_$i") })
+        assert(schemaOut.equals(outputRowIterator.schema),
+          s"Invalid schema from pandas_udf: expected $schemaOut, got ${outputRowIterator.schema}")
--- End diff --

Yeah, I tried to make one, but since we are now casting the return Series in `ArrowPandasSerializer.dumps` with `astype`, I have not found a case that triggers it. I still think it would be good to keep this check, just in case there is some way it could happen; and if we upgrade to Arrow 0.7 we won't need the `astype` logic, so this assertion will be used instead.
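As a side note, the reason the assertion is hard to trigger can be sketched with a small pandas example (hypothetical values; the `declared_dtype` and `result` names are illustrative, not Spark's): casting the returned Series with `astype` silently coerces a dtype mismatch before the schema comparison ever runs.

```python
import pandas as pd

# Hypothetical: a pandas_udf declared to return long (int64),
# whose function body actually produces float64 values.
declared_dtype = "int64"
result = pd.Series([1.0, 2.0, 3.0])  # float64 coming out of the UDF

# Casting with astype (as the serializer currently does) coerces the
# values to the declared type, so the downstream schema check passes.
casted = result.astype(declared_dtype)

print(casted.dtype)    # int64
print(casted.tolist()) # [1, 2, 3]
```

If the `astype` step were removed (e.g. after an Arrow upgrade), the float64 schema would reach the comparison unchanged and the assertion would be the thing that catches it.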