Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/22140

@gatorsmile it seemed like a straightforward bug to me. Rows with extra values lead to incorrect output and exceptions when used in `DataFrames`, so it did not seem like there was any possibility this change would break existing code. For example:

```
In [1]: MyRow = Row('a', 'b')

In [2]: print(MyRow(1, 2, 3))
Row(a=1, b=2)

In [3]: spark.createDataFrame([MyRow(1, 2, 3)])
Out[3]: DataFrame[a: bigint, b: bigint]

In [4]: spark.createDataFrame([MyRow(1, 2, 3)]).show()
18/09/08 21:55:48 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
java.lang.IllegalStateException: Input row doesn't have expected number of values required by the schema. 2 fields are required while 3 values are provided.

In [5]: spark.createDataFrame([MyRow(1, 2, 3)], schema="x: int, y: int").show()
ValueError: Length of object (3) does not match with length of fields (2)
```

Maybe I was too hasty with backporting and this needed some discussion. Do you know of a use case that this change would break?
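To illustrate the kind of eager validation the fix implies, here is a minimal standalone sketch (not PySpark's actual `Row` implementation; `SimpleRow` and its error message are hypothetical): a row class built from field names should reject a call with more values than fields at construction time, rather than silently dropping the extras and failing later at executor time as in the session above.

```python
class SimpleRow(tuple):
    """Hypothetical stand-in for a Row class built from field names."""

    def __new__(cls, *fields):
        obj = tuple.__new__(cls, fields)
        obj._fields = fields
        return obj

    def __call__(self, *values):
        # Eager check: reject extra values up front instead of
        # silently truncating and producing a bad row downstream.
        if len(values) > len(self._fields):
            raise ValueError(
                "Cannot create row with fields %r: expected %d values "
                "but got %d" % (self._fields, len(self._fields), len(values)))
        return dict(zip(self._fields, values))


MyRow = SimpleRow('a', 'b')
print(MyRow(1, 2))        # a valid row: {'a': 1, 'b': 2}
try:
    MyRow(1, 2, 3)        # extra value -> rejected immediately
except ValueError as e:
    print("rejected:", e)
```

The design point is only that the mismatch should surface at the call site, where the stack trace points at the offending code, not inside a Spark task.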