[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

BryanCutler Wed, 18 Apr 2018 16:13:18 -0700

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/20280
  
    Also, this will be a breaking change when `Row`s are defined with kwargs and the schema changes the field names, like this:
    
    ```
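    from pyspark.sql import Row
    # self.sc is the SparkContext of the enclosing test case
    data = [Row(key=i, value=str(i)) for i in range(100)]
    rdd = self.sc.parallelize(data, 5)
    # the schema renames the fields: key -> a, value -> b
    df = rdd.toDF(" a: int, b: string ")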
    ```
    
    And this would still work, but it might be slower, depending on how complicated the schema is, because the field names are now searched for instead of just going by position:
    ```
    df = rdd.toDF(" key: int, value: string ")
    ```
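
To make the difference between the two lookups concrete, here is a rough sketch in plain Python (no Spark needed). The helpers `convert_by_position` and `convert_by_name` are made up for illustration and are not PySpark APIs; they only assume that the conversion either pairs values with schema fields in order, or searches for each schema field among the row's own field names.

```python
# Hypothetical helpers for illustration only; these are not PySpark APIs.

def convert_by_position(row_fields, row_values, schema_names):
    """Ignore the row's own field names and pair values with the schema in order."""
    if len(row_values) != len(schema_names):
        raise ValueError("row and schema have different lengths")
    return dict(zip(schema_names, row_values))


def convert_by_name(row_fields, row_values, schema_names):
    """Search for each schema field among the row's field names."""
    lookup = dict(zip(row_fields, row_values))
    missing = [name for name in schema_names if name not in lookup]
    if missing:
        raise ValueError("schema fields not found in row: %s" % missing)
    return {name: lookup[name] for name in schema_names}


row_fields = ["key", "value"]  # names passed as kwargs when building the row
row_values = [1, "1"]

# Schema that renames the fields: fine by position, an error by name.
by_position = convert_by_position(row_fields, row_values, ["a", "b"])
try:
    convert_by_name(row_fields, row_values, ["a", "b"])
    renamed_ok = True
except ValueError:
    renamed_ok = False

# Schema that keeps the names: by-name works, at the cost of a lookup per field.
by_name = convert_by_name(row_fields, row_values, ["key", "value"])
```

In this sketch the renaming schema only succeeds with the positional pairing, which is the kind of breakage described above, while the matching schema succeeds either way and just pays for the extra name lookup.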
    
    So if we go forward with this fix, I should probably add a note about it to the migration guide.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
