Haejoon Lee created SPARK-43282:
-----------------------------------

             Summary: Investigate DataFrame.sort_values with pandas behavior.
                 Key: SPARK-43282
                 URL: https://issues.apache.org/jira/browse/SPARK-43282
             Project: Spark
          Issue Type: Sub-task
          Components: Pandas API on Spark
    Affects Versions: 3.5.0
            Reporter: Haejoon Lee

{code:java}
import pandas as pd

pdf = pd.DataFrame(
    {
        "a": pd.Categorical([1, 2, 3, 1, 2, 3]),
        "b": pd.Categorical(
            ["b", "a", "c", "c", "b", "a"], categories=["c", "b", "d", "a"]
        ),
    },
)
pdf.groupby("a").apply(lambda x: x).sort_values(["a"])
Traceback (most recent call last):
...
ValueError: 'a' is both an index level and a column label, which is ambiguous.
{code}

We should investigate whether this is intended behavior or a bug in pandas.
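
As a starting point for the investigation, the error message can be reproduced without groupby at all. The sketch below is not taken from the report: it builds a small hypothetical frame whose index level and column are both named "a", which is one possible (unconfirmed) explanation for what groupby("a").apply(...) returns in the pandas version used above. The frame contents and the reset_index workaround are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical minimal frame (not from the report): the index level and a
# column deliberately share the name "a".
df = pd.DataFrame(
    {"a": [3, 1, 2], "b": ["x", "y", "z"]},
    index=pd.Index([10, 20, 30], name="a"),
)

# sort_values cannot tell whether "a" means the index level or the column.
try:
    df.sort_values(["a"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Dropping the index removes the ambiguity, so "a" can only mean the column.
sorted_df = df.reset_index(drop=True).sort_values(["a"])
print(error_message)
print(sorted_df)
```

If the groupby result in the report has the same shape, checking its index.names against its columns would confirm it, and the question becomes whether pandas intends apply to add the group key as an index level here.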