[GitHub] spark pull request #22332: [SPARK-25333][SQL] Ability add new columns in the...

wmellouli Tue, 04 Sep 2018 10:05:08 -0700

GitHub user wmellouli opened a pull request:

    https://github.com/apache/spark/pull/22332


    [SPARK-25333][SQL] Ability to add new columns in the beginning of Dataset

    ## What changes were proposed in this pull request?
    
    When new columns are added to a Dataset, they are always appended at the end.
    Users generally want to add new columns either at the end or at the beginning, depending on the use case.
    In my case, for example, we add technical columns at the beginning of a Dataset and business columns at the end.
    
    This pull request adds the ability to add new columns at the beginning of a Dataset, using an optional flag **atTheEnd**:
    - true (default behavior) adds the column at the end
    - false adds the column at the beginning
    
    The change is backward compatible with existing calls, so we can:
    
    1- add a new column without using the flag **atTheEnd** (default behavior):
    
    ```
    val newDf = df.withColumn("newColumn", col("value") + 1)
    newDf.printSchema
    
    root
     |-- value: integer (nullable = true)
     |-- newColumn: integer (nullable = true)
    ```
    
    2- add a new column using the flag **atTheEnd** set to **true**:
    
    ```
    val newDf = df.withColumn("newColumn", col("value") + 1, true)
    newDf.printSchema
    
    root
     |-- value: integer (nullable = true)
     |-- newColumn: integer (nullable = true)
    ```
    
    3- add a new column using the flag **atTheEnd** set to **false**:
    
    ```
    val newDf = df.withColumn("newColumn", col("value") + 1, false)
    newDf.printSchema
    
    root
     |-- newColumn: integer (nullable = true)
     |-- value: integer (nullable = true)
    ```
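    
    For comparison, the column order in case 3 can also be approximated with the existing public API by selecting the new column ahead of `*`. The following is a minimal sketch, not the implementation in this patch: `PrependColumnSketch` and `withColumnFirst` are hypothetical names, and unlike `withColumn` it does not replace an existing column of the same name (it would produce a duplicate instead).
    
    ```scala
    import org.apache.spark.sql.{Column, DataFrame, SparkSession}
    import org.apache.spark.sql.functions.col

    object PrependColumnSketch {
      // Hypothetical helper, not part of Spark or of this patch:
      // selects the new column first, then every existing column via "*".
      def withColumnFirst(df: DataFrame, name: String, expr: Column): DataFrame =
        df.select(expr.as(name), col("*"))

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("prepend-column-sketch")
          .getOrCreate()
        import spark.implicits._

        // Single-column input, mirroring the "value" column used in the examples.
        val df = Seq(1, 2, 3).toDF("value")
        val newDf = withColumnFirst(df, "newColumn", col("value") + 1)

        // Prints the column names in their resulting order.
        println(newDf.columns.mkString(", "))
        spark.stop()
      }
    }
    ```
    
    With a single-column `value` Dataset as in the examples above, the resulting column order is `newColumn`, `value`.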
    
    ## How was this patch tested?
    
    This patch is tested with unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wmellouli/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22332.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22332
    
----
commit f83afe5172086993756e750bd2c7e3bb05667f62
Author: Walid MELLOULI <walid_mellouli@...>
Date:   2018-09-04T16:30:32Z

    [SPARK-25333][SQL] Ability to add new columns in the beginning of Dataset

----


---
