[jira] [Updated] (SPARK-25333) Ability to add new columns in Dataset in a user-defined position

Walid Mellouli (JIRA) Tue, 04 Sep 2018 13:58:12 -0700


     [ https://issues.apache.org/jira/browse/SPARK-25333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Walid Mellouli updated SPARK-25333:
-----------------------------------
    Description: 
When we add new columns to a Dataset, they are automatically appended at the end 
of the Dataset.

Consider this DataFrame:
{code:java}
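// Assumes a session where `spark` (SparkSession) and `sc` (SparkContext) exist, e.g. spark-shell
import spark.implicits._
import org.apache.spark.sql.functions.col

val df = sc.parallelize(Seq(1, 2, 3)).toDF
df.printSchema
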
root
 |-- value: integer (nullable = true)
{code}
When we add a new column:
{code:java}
val newDf = df.withColumn("newColumn", col("value") + 1)
newDf.printSchema


root
 |-- value: integer (nullable = true)
 |-- newColumn: integer (nullable = true)
{code}
Depending on the use case, users may want to add new columns at the end, at the 
beginning, or at a specific position.
For example, in my case we add technical columns at the beginning of a Dataset 
and business columns at the end.
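For reference, a minimal sketch of a workaround with the existing API, assuming the goal is to place a single new column at a zero-based index: compute the desired column order and project it with select. The helper name withColumnAt, the object ColumnPositionSketch, and the batchId column are hypothetical illustrations, not part of Spark. The sketch also assumes column names contain no dots or backticks, since col() would interpret those as nested-field references.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object ColumnPositionSketch {

  /** Hypothetical helper (not a Spark API): adds `colName`, computed by `column`,
    * at zero-based position `index` instead of appending it at the end. */
  def withColumnAt(df: DataFrame, colName: String, column: Column, index: Int): DataFrame = {
    // If a column with the same name exists, it is replaced and moved to `index`.
    val existing = df.columns.toSeq.filterNot(_ == colName)
    require(index >= 0 && index <= existing.length,
      s"index $index is outside [0, ${existing.length}]")
    // Split the remaining columns around the target position and project in the new order.
    val (before, after) = existing.splitAt(index)
    val ordered: Seq[Column] =
      before.map(c => col(c)) ++ Seq(column.as(colName)) ++ after.map(c => col(c))
    df.select(ordered: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("column-position").getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2, 3).toDF("value")
    // Technical column first (hypothetical "batchId"), business column last.
    val withTechnical = withColumnAt(df, "batchId", lit("batch-1"), 0)
    val result = withColumnAt(withTechnical, "newColumn", col("value") + 1, withTechnical.columns.length)
    result.printSchema()
    spark.stop()
  }
}
```

Reordering with select only adds a projection to the logical plan; it does not eagerly rewrite the data. Calling the helper twice, as in main, covers the layout described above: a technical column at position 0 and a business column at the end.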

  was:
When we add new columns to a Dataset, they are automatically appended at the end 
of the Dataset.
{code:java}
val df = sc.parallelize(Seq(1, 2, 3)).toDF
df.printSchema


root
 |-- value: integer (nullable = true)
{code}

When we add a new column:

{code:java}
val newDf = df.withColumn("newColumn", col("value") + 1)
newDf.printSchema


root
 |-- value: integer (nullable = true)
 |-- newColumn: integer (nullable = true)
{code}

Depending on the use case, users may want to add new columns either at the end 
or at the beginning.
For example, in my case we add technical columns at the beginning of a Dataset 
and business columns at the end.

        Summary: Ability to add new columns in Dataset in a user-defined 
position  (was: Ability to add new columns in the beginning of a Dataset)

> Ability to add new columns in Dataset in a user-defined position
> ----------------------------------------------------------------
>
>                 Key: SPARK-25333
>                 URL: https://issues.apache.org/jira/browse/SPARK-25333
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Walid Mellouli
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
