[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-12-20 Thread Nicholas Chammas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462718#comment-17462718
 ] 

Nicholas Chammas commented on SPARK-24853:
--

I would expect something like that to yield an {{{}AnalysisException{}}}. Would 
that address your concern, or are you suggesting that it might be difficult to 
catch that sort of problem cleanly?

> Support Column type for withColumn and withColumnRenamed apis
> -
>
> Key: SPARK-24853
> URL: https://issues.apache.org/jira/browse/SPARK-24853
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.2, 3.2.0
>Reporter: nirav patel
>Priority: Minor
>
> Can we add overloaded version of withColumn or withColumnRenamed that accept 
> Column type instead of String? That way I can specify FQN in case when there 
> is duplicate column names. e.g. if I have 2 columns with same name as a 
> result of join and I want to rename one of the field I can do it with this 
> new API.
>  
> This would be similar to Drop api which supports both String and Column type.
>  
> def
> withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column 
> that has the same name.
>  
> def
> withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>  
>  
>  
> I think there should also be this one:
>  
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-12-14 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459607#comment-17459607
 ] 

Hyukjin Kwon commented on SPARK-24853:
--

Oh I meant that it will allow something like:

{{withColumnRenamed(from_json(col("a"), col("another_col")))}} where 
{{from_json(col("a")}} is not actually an existing column.

> Support Column type for withColumn and withColumnRenamed apis
> -
>
> Key: SPARK-24853
> URL: https://issues.apache.org/jira/browse/SPARK-24853
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.2, 3.2.0
>Reporter: nirav patel
>Priority: Minor
>
> Can we add overloaded version of withColumn or withColumnRenamed that accept 
> Column type instead of String? That way I can specify FQN in case when there 
> is duplicate column names. e.g. if I have 2 columns with same name as a 
> result of join and I want to rename one of the field I can do it with this 
> new API.
>  
> This would be similar to Drop api which supports both String and Column type.
>  
> def
> withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column 
> that has the same name.
>  
> def
> withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>  
>  
>  
> I think there should also be this one:
>  
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-12-14 Thread Nicholas Chammas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459601#comment-17459601
 ] 

Nicholas Chammas commented on SPARK-24853:
--

Assuming we are talking about the example I provided: Yes, {{col("count")}} 
would still be ambiguous.

I don't know if Spark would know to catch that problem. But note that the 
current behavior of {{.withColumnRenamed('count', ...)}} renames all columns 
named "count", which is just incorrect.

So allowing {{col("count")}} will either be just as incorrect as the current 
behavior, or it will be an improvement in that Spark may complain that the 
column reference is ambiguous. I'd have to try it to confirm the behavior.

Of course, the main improvement offered by {{Column}} references is that users 
can do something like {{.withColumnRenamed(left_counts['count'], ...)}} and get 
the correct behavior.

I didn't follow what you are getting at regarding {{{}from_json{}}}, but does 
that address your concern?

> Support Column type for withColumn and withColumnRenamed apis
> -
>
> Key: SPARK-24853
> URL: https://issues.apache.org/jira/browse/SPARK-24853
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.2, 3.2.0
>Reporter: nirav patel
>Priority: Minor
>
> Can we add overloaded version of withColumn or withColumnRenamed that accept 
> Column type instead of String? That way I can specify FQN in case when there 
> is duplicate column names. e.g. if I have 2 columns with same name as a 
> result of join and I want to rename one of the field I can do it with this 
> new API.
>  
> This would be similar to Drop api which supports both String and Column type.
>  
> def
> withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column 
> that has the same name.
>  
> def
> withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>  
>  
>  
> I think there should also be this one:
>  
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-12-14 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459579#comment-17459579
 ] 

Hyukjin Kwon commented on SPARK-24853:
--

[~nchammas], actually I don't think this can truly handle the duplicate column 
names ...  allowing column will expose a possibility such as col("count") or 
other expressions such as from_json, etc.

> Support Column type for withColumn and withColumnRenamed apis
> -
>
> Key: SPARK-24853
> URL: https://issues.apache.org/jira/browse/SPARK-24853
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.2, 3.2.0
>Reporter: nirav patel
>Priority: Minor
>
> Can we add overloaded version of withColumn or withColumnRenamed that accept 
> Column type instead of String? That way I can specify FQN in case when there 
> is duplicate column names. e.g. if I have 2 columns with same name as a 
> result of join and I want to rename one of the field I can do it with this 
> new API.
>  
> This would be similar to Drop api which supports both String and Column type.
>  
> def
> withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column 
> that has the same name.
>  
> def
> withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>  
>  
>  
> I think there should also be this one:
>  
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-12-14 Thread Nicholas Chammas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459467#comment-17459467
 ] 

Nicholas Chammas commented on SPARK-24853:
--

[~hyukjin.kwon] - Are you still opposed to this proposed improvement? If not, 
I'd like to work on it.

> Support Column type for withColumn and withColumnRenamed apis
> -
>
> Key: SPARK-24853
> URL: https://issues.apache.org/jira/browse/SPARK-24853
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.2, 3.2.0
>Reporter: nirav patel
>Priority: Minor
>
> Can we add overloaded version of withColumn or withColumnRenamed that accept 
> Column type instead of String? That way I can specify FQN in case when there 
> is duplicate column names. e.g. if I have 2 columns with same name as a 
> result of join and I want to rename one of the field I can do it with this 
> new API.
>  
> This would be similar to Drop api which supports both String and Column type.
>  
> def
> withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column 
> that has the same name.
>  
> def
> withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>  
>  
>  
> I think there should also be this one:
>  
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-11-02 Thread Nicholas Chammas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437396#comment-17437396
 ] 

Nicholas Chammas commented on SPARK-24853:
--

The [contributing guide|https://spark.apache.org/contributing.html] isn't clear 
on how to populate "Affects Version" for improvements, so I've just tagged the 
latest release.

> Support Column type for withColumn and withColumnRenamed apis
> -
>
> Key: SPARK-24853
> URL: https://issues.apache.org/jira/browse/SPARK-24853
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.2, 3.2.0
>Reporter: nirav patel
>Priority: Minor
>
> Can we add overloaded version of withColumn or withColumnRenamed that accept 
> Column type instead of String? That way I can specify FQN in case when there 
> is duplicate column names. e.g. if I have 2 columns with same name as a 
> result of join and I want to rename one of the field I can do it with this 
> new API.
>  
> This would be similar to Drop api which supports both String and Column type.
>  
> def
> withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column 
> that has the same name.
>  
> def
> withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>  
>  
>  
> I think there should also be this one:
>  
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-11-02 Thread Nicholas Chammas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437394#comment-17437394
 ] 

Nicholas Chammas commented on SPARK-24853:
--

[~hyukjin.kwon] - It's not just for consistency. As noted in the description, 
this is useful when you are trying to rename a column with an ambiguous name.

For example, imagine two tables {{left}} and {{right}}, each with a column 
called {{count}}:
{code:java}
(
  left_counts.alias('left')
  .join(right_counts.alias('right'), on='join_key')
  .withColumn(
'total_count',
left_counts['count'] + right_counts['count']
  )
  .withColumnRenamed('left.count', 'left_count')  # no-op; alias doesn't work
  .withColumnRenamed('count', 'left_count')  # incorrect; it renames both count 
columns
  .withColumnRenamed(left_counts['count'], 'left_count')  # what, ideally, 
users want to do here
  .show()
){code}
If you don't mind, I'm going to reopen this issue.

> Support Column type for withColumn and withColumnRenamed apis
> -
>
> Key: SPARK-24853
> URL: https://issues.apache.org/jira/browse/SPARK-24853
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.2
>Reporter: nirav patel
>Priority: Major
>
> Can we add overloaded version of withColumn or withColumnRenamed that accept 
> Column type instead of String? That way I can specify FQN in case when there 
> is duplicate column names. e.g. if I have 2 columns with same name as a 
> result of join and I want to rename one of the field I can do it with this 
> new API.
>  
> This would be similar to Drop api which supports both String and Column type.
>  
> def
> withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column 
> that has the same name.
>  
> def
> withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>  
>  
>  
> I think there should also be this one:
>  
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2018-07-18 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548659#comment-16548659
 ] 

Hyukjin Kwon commented on SPARK-24853:
--

I don't think we need an API just for consistency.

> Support Column type for withColumn and withColumnRenamed apis
> -
>
> Key: SPARK-24853
> URL: https://issues.apache.org/jira/browse/SPARK-24853
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.2
>Reporter: nirav patel
>Priority: Major
>
> Can we add overloaded version of withColumn or withColumnRenamed that accept 
> Column type instead of String? That way I can specify FQN in case when there 
> is duplicate column names. e.g. if I have 2 columns with same name as a 
> result of join and I want to rename one of the field I can do it with this 
> new API.
>  
> This would be similar to Drop api which supports both String and Column type.
>  
> def
> withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column 
> that has the same name.
>  
> def
> withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>  
>  
>  
> I think there should also be this one:
>  
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org