Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/22140

@gatorsmile it seemed like a straightforward bug to me. Rows with extra values lead to incorrect output and exceptions when used in `DataFrames`, so it did not seem like there was any possibility this change would break existing code. For example:

```
In [1]: MyRow = Row('a', 'b')

In [2]: print(MyRow(1, 2, 3))
Row(a=1, b=2)

In [3]: spark.createDataFrame([MyRow(1, 2, 3)])
Out[3]: DataFrame[a: bigint, b: bigint]

In [4]: spark.createDataFrame([MyRow(1, 2, 3)]).show()
18/09/08 21:55:48 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
java.lang.IllegalStateException: Input row doesn't have expected number of values required by the schema. 2 fields are required while 3 values are provided.

In [5]: spark.createDataFrame([MyRow(1, 2, 3)], schema="x: int, y: int").show()
ValueError: Length of object (3) does not match with length of fields (2)
```

Maybe I was too hasty with backporting and this needed some discussion. Do you know of a use case that this change would break?
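To illustrate the kind of eager validation the fix implies, here is a minimal standalone sketch (not PySpark's actual `Row` implementation; `SimpleRow` and its error message are hypothetical): a row class built from field names should reject a call with more values than fields at construction time, rather than silently dropping the extras and failing later at executor time as in the session above.

```python
class SimpleRow(tuple):
    """Hypothetical stand-in for a Row class built from field names."""

    def __new__(cls, *fields):
        obj = tuple.__new__(cls, fields)
        obj._fields = fields
        return obj

    def __call__(self, *values):
        # Eager check: reject extra values up front instead of
        # silently truncating and producing a bad row downstream.
        if len(values) > len(self._fields):
            raise ValueError(
                "Cannot create row with fields %r: expected %d values "
                "but got %d" % (self._fields, len(self._fields), len(values)))
        return dict(zip(self._fields, values))


MyRow = SimpleRow('a', 'b')
print(MyRow(1, 2))        # a valid row: {'a': 1, 'b': 2}
try:
    MyRow(1, 2, 3)        # extra value -> rejected immediately
except ValueError as e:
    print("rejected:", e)
```

The design point is only that the mismatch should surface at the call site, where the stack trace points at the offending code, not inside a Spark task.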