[jira] [Created] (SPARK-37799) Support of 'melt' function in spark

2022-01-01 Thread Daniel Davies (Jira)
Daniel Davies created SPARK-37799:
-

 Summary: Support of 'melt' function in spark
 Key: SPARK-37799
 URL: https://issues.apache.org/jira/browse/SPARK-37799
 Project: Spark
  Issue Type: Question
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Daniel Davies


Hello,

Un-pivoting a DataFrame is supported in pandas through the 'melt' 
function, but no equivalent is available in Spark. It's easy enough to build 
this from the functions module (e.g. the melt function in 
pandas-on-pyspark 
[here|https://github.com/apache/spark/blob/c92bd5cafe62ca5226176446735171cc877e805a/python/pyspark/pandas/frame.py#L9651]),
 but I was wondering whether a more native solution had been considered. It 
would make end-user code more lightweight at the very least, and I wonder 
whether it could be made more efficient than using the stack 
function or the struct-array-explode approach.

I'm happy to try and make a PR if this is something that might be useful within 
Spark. No worries if not, the methods above work fine.
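
For readers unfamiliar with the operation, the reshaping that 'melt' performs can be sketched on plain Scala collections. This is an illustration only, not Spark code: MeltSketch, melt, and the parameter names are hypothetical, and the default output column names "variable" and "value" simply mirror the pandas defaults.

```scala
// Hypothetical helper for illustration; not part of Spark or pandas.
object MeltSketch {
  type Row = Map[String, Any]

  /** Emits one output row per (input row, value column) pair. */
  def melt(
      rows: Seq[Row],
      idCols: Seq[String],
      valueCols: Seq[String],
      varName: String = "variable",
      valueName: String = "value"): Seq[Row] =
    for {
      row <- rows
      col <- valueCols
    } yield {
      // Keep the identifier columns unchanged on every emitted row.
      val ids: Row = idCols.map(c => c -> row(c)).toMap
      // Add the name of the melted column and the value it held.
      ids + (varName -> col) + (valueName -> row(col))
    }

  def main(args: Array[String]): Unit = {
    val wide: Seq[Row] = Seq(
      Map("id" -> 1, "a" -> 10, "b" -> 20),
      Map("id" -> 2, "a" -> 30, "b" -> 40))
    // Two input rows and two value columns give four long-format rows.
    melt(wide, Seq("id"), Seq("a", "b")).foreach(println)
  }
}
```

In Spark itself, the stack-based approach mentioned above is typically written along the lines of df.selectExpr("id", "stack(2, 'a', a, 'b', b) as (variable, value)"), which requires the stacked columns to share a compatible type.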



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37800) TreeNode.argString incorrectly formats arguments of type Set[_]

2022-01-01 Thread Simeon Simeonov (Jira)
Simeon Simeonov created SPARK-37800:
---

 Summary: TreeNode.argString incorrectly formats arguments of type 
Set[_]
 Key: SPARK-37800
 URL: https://issues.apache.org/jira/browse/SPARK-37800
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Simeon Simeonov


The implementation of {{argString}} uses the following pattern for sets:

{code:java}
case set: Set[_] =>
  // Sort elements for deterministic behaviours
  val sortedSeq = set.toSeq.map(formatArg(_, maxFields).sorted)
  truncatedString(sortedSeq, "{", ", ", "}", maxFields) :: Nil
{code}
Because {{.sorted}} sits inside the {{map}} call, the implementation sorts the 
characters of each string that {{formatArg}} returns instead of sorting the 
elements of the set.

The fix is simply to move the closing parenthesis so that {{.sorted}} applies 
to the mapped sequence:
{code:java}
  val sortedSeq = set.toSeq.map(formatArg(_, maxFields)).sorted
{code}
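
The difference between the two parenthesis placements can be reproduced outside Spark. The following is a minimal sketch under stated assumptions: {{ArgStringSortDemo}}, {{buggy}}, and {{fixed}} are hypothetical names, {{formatArg}} is replaced by a {{toString}} stand-in, and {{mkString}} stands in for Spark's {{truncatedString}}.

```scala
object ArgStringSortDemo {
  // Stand-in for TreeNode.formatArg (assumption): the real method is not reproduced here.
  def formatArg(arg: Any, maxFields: Int): String = arg.toString

  // Pattern from the report: .sorted is called on each formatted String,
  // so it reorders the characters inside every element.
  def buggy(set: Set[_], maxFields: Int): Seq[String] =
    set.toSeq.map(formatArg(_, maxFields).sorted)

  // Proposed fix: format every element first, then sort the resulting Seq[String].
  def fixed(set: Set[_], maxFields: Int): Seq[String] =
    set.toSeq.map(formatArg(_, maxFields)).sorted

  def main(args: Array[String]): Unit = {
    val cols = Set("beta", "alpha")
    // Elements come out as "abet" and "aahlp", in set iteration order.
    println(buggy(cols, 25).mkString("{", ", ", "}"))
    println(fixed(cols, 25).mkString("{", ", ", "}")) // prints {alpha, beta}
  }
}
```

Both variants compile because {{String}} also offers {{.sorted}} through its collection-style extension methods, which is why the misplaced parenthesis is not caught by the type checker.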
 






[jira] [Commented] (SPARK-37799) Support of 'melt' function in spark

2022-01-01 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467523#comment-17467523
 ] 

Hyukjin Kwon commented on SPARK-37799:
--

[~ddavies1], this sounds like a valid question, but let's discuss it on the dev (or user) 
mailing list first, before filing it as a JIRA ticket here. I think mailing 
lists are better places to collect feedback and investigate the need.

> Support of 'melt' function in spark
> ---
>
> Key: SPARK-37799
> URL: https://issues.apache.org/jira/browse/SPARK-37799
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Daniel Davies
>Priority: Minor
>
> Hello,
> Un-pivoting a dataframe is currently supported in Pandas with the 'melt' 
> function, but isn't available in spark. It's easy enough to produce this 
> functionality from the functions module (e.g. such as the melt function in 
> pandas-on-pyspark 
> [here|https://github.com/apache/spark/blob/c92bd5cafe62ca5226176446735171cc877e805a/python/pyspark/pandas/frame.py#L9651]),
>  but I was wondering whether a more native solution had been considered? It 
> would make end-user code more lightweight at the very least; and I wonder 
> whether it could be made more efficient than using the stack 
> function/struct-array-explode functions.
> I'm happy to try and make a PR if this is something that might be useful 
> within spark. No worries if not, the methods above work fine.






[jira] [Resolved] (SPARK-37799) Support of 'melt' function in spark

2022-01-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37799.
--
Resolution: Invalid




[jira] [Comment Edited] (SPARK-37772) [PYSPARK] Publish ApacheSparkGitHubActionImage arm64 docker image

2022-01-01 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467524#comment-17467524
 ] 

Hyukjin Kwon edited comment on SPARK-37772 at 1/2/22, 12:45 AM:


Actually, we have been working on publishing Docker images for releases, and 
Holden has been working on that.

cc [~holden], [~dongjoon], [~gengliang] FYI


was (Author: hyukjin.kwon):
cc [~holden] FYI.

> [PYSPARK] Publish ApacheSparkGitHubActionImage arm64 docker image
> -
>
> Key: SPARK-37772
> URL: https://issues.apache.org/jira/browse/SPARK-37772
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Commented] (SPARK-37772) [PYSPARK] Publish ApacheSparkGitHubActionImage arm64 docker image

2022-01-01 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467524#comment-17467524
 ] 

Hyukjin Kwon commented on SPARK-37772:
--

cc [~holden] FYI.




[jira] [Commented] (SPARK-37800) TreeNode.argString incorrectly formats arguments of type Set[_]

2022-01-01 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467525#comment-17467525
 ] 

Hyukjin Kwon commented on SPARK-37800:
--

Yeah, from a cursory look, what you said here sounds right. Are you interested in 
creating a PR?

> TreeNode.argString incorrectly formats arguments of type Set[_]
> ---
>
> Key: SPARK-37800
> URL: https://issues.apache.org/jira/browse/SPARK-37800
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Simeon Simeonov
>Priority: Minor
>
> The implementation of {{argString}} uses the following pattern for sets:
>
> {code:java}
> case set: Set[_] =>
>   // Sort elements for deterministic behaviours
>   val sortedSeq = set.toSeq.map(formatArg(_, maxFields).sorted)
>   truncatedString(sortedSeq, "{", ", ", "}", maxFields) :: Nil
> {code}
> Instead of sorting the elements of the set, the implementation sorts the 
> characters of the strings that {{formatArg}} returns.
> The fix is simply to move the closing parenthesis to the correct location:
> {code:java}
>   val sortedSeq = set.toSeq.map(formatArg(_, maxFields)).sorted
> {code}






[jira] [Updated] (SPARK-37800) TreeNode.argString incorrectly formats arguments of type Set[_]

2022-01-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37800:
-
Component/s: SQL
 (was: Spark Core)







[jira] [Commented] (SPARK-37800) TreeNode.argString incorrectly formats arguments of type Set[_]

2022-01-01 Thread Simeon Simeonov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467526#comment-17467526
 ] 

Simeon Simeonov commented on SPARK-37800:
-

[~hyukjin.kwon] Done: https://github.com/apache/spark/pull/35084




[jira] [Commented] (SPARK-37800) TreeNode.argString incorrectly formats arguments of type Set[_]

2022-01-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467529#comment-17467529
 ] 

Apache Spark commented on SPARK-37800:
--

User 'ssimeonov' has created a pull request for this issue:
https://github.com/apache/spark/pull/35084




[jira] [Assigned] (SPARK-37800) TreeNode.argString incorrectly formats arguments of type Set[_]

2022-01-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37800:


Assignee: (was: Apache Spark)




[jira] [Assigned] (SPARK-37800) TreeNode.argString incorrectly formats arguments of type Set[_]

2022-01-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37800:


Assignee: Apache Spark



