[ https://issues.apache.org/jira/browse/SPARK-37723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468096#comment-17468096 ]
Kanthi Subramanian commented on SPARK-37723:
--------------------------------------------

Hi [~itholic], is someone working on this? I can take this on; some direction would be extremely helpful. Thanks.

> Support tuple for non-MultiIndex column name.
> ---------------------------------------------
>
>                 Key: SPARK-37723
>                 URL: https://issues.apache.org/jira/browse/SPARK-37723
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Haejoon Lee
>            Priority: Major
>
> pandas API on Spark doesn't support a tuple as a column name for a
> non-MultiIndex column.
> {code:java}
> >>> psdf = ps.DataFrame({"A": [1, 2, 3]})
> >>> psdf
>    A
> 0  1
> 1  2
> 2  3
> >>> psdf[('a', 'b')] = [4, 5, 6]
> Traceback (most recent call last):
> ...
> KeyError: 'Key length (2) exceeds index depth (1)'
> {code}
> As pandas supports this, we should follow its behavior.
> {code:java}
> >>> pdf = pd.DataFrame({"A": [1, 2, 3]})
> >>> pdf[('a', 'b')] = [4, 5, 6]
> >>> pdf
>    A  (a, b)
> 0  1       4
> 1  2       5
> 2  3       6{code}
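
As a starting point for anyone picking this up, the pandas behavior that the issue asks pandas API on Spark to match can be checked with plain pandas alone. The snippet below is only a minimal sketch of that reference behavior (it does not use pyspark, and the variable names are illustrative): assigning with a tuple key on a frame whose columns are a flat Index adds a single column labeled by the tuple, and the columns stay one level deep.

```python
import pandas as pd

# A frame with ordinary (non-MultiIndex) columns, as in the issue description.
pdf = pd.DataFrame({"A": [1, 2, 3]})

# Assign with a tuple key; pandas treats the tuple as one column label.
pdf[("a", "b")] = [4, 5, 6]

# The columns remain a flat, single-level Index containing the tuple label.
column_labels = pdf.columns.tolist()
column_depth = pdf.columns.nlevels
new_column_values = pdf[("a", "b")].tolist()

print(column_labels)
print(column_depth)
print(new_column_values)
```

A fix on the Spark side would presumably need to accept a tuple key whose length exceeds the column index depth when the columns are not a MultiIndex, rather than raising the KeyError shown above; how that is best done internally is left open here.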