[ https://issues.apache.org/jira/browse/SPARK-36707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-36707:
---------------------------------
    Labels: release-notes  (was: )

> Support to specify index type and name in pandas API on Spark
> -------------------------------------------------------------
>
>                 Key: SPARK-36707
>                 URL: https://issues.apache.org/jira/browse/SPARK-36707
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: release-notes
>             Fix For: 3.3.0
>
>
> See https://koalas.readthedocs.io/en/latest/user_guide/typehints.html.
> In pandas API on Spark, there is currently no way to specify the index type
> and name in the output when you apply an arbitrary function, which forces
> the default index to be created:
> {code}
> >>> def transform(pdf) -> pd.DataFrame["id": int, "A": int]:
> ...     pdf['A'] = pdf.id + 1
> ...     return pdf
> ...
> >>> ps.range(5).koalas.apply_batch(transform)
> {code}
> {code}
>    id  A
> 0   0  1
> 1   1  2
> 2   2  3
> 3   3  4
> 4   4  5
> {code}
> We should have a way to specify the index.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)