[ https://issues.apache.org/jira/browse/SPARK-32098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-32098:
------------------------------------

    Assignee: Apache Spark

> Use iloc for positional slicing instead of direct slicing in createDataFrame with Arrow
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-32098
>                 URL: https://issues.apache.org/jira/browse/SPARK-32098
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.6, 3.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Apache Spark
>            Priority: Critical
>              Labels: correctness
>
> When a pandas DataFrame has a float index, createDataFrame with Arrow produces wrong results:
> {code}
> >>> import pandas as pd
> >>> spark.createDataFrame(pd.DataFrame({'a': [1,2,3]}, index=[2., 3., 4.])).show()
> +---+
> |  a|
> +---+
> |  1|
> |  1|
> |  2|
> +---+
> {code}
> This is because direct slicing treats the slice bounds as index labels, not as positions, when the index contains floats:
> {code}
> >>> pd.DataFrame({'a': [1,2,3]}, index=[2., 3., 4.])[2:]
>      a
> 2.0  1
> 3.0  2
> 4.0  3
> >>> pd.DataFrame({'a': [1,2,3]}, index=[2., 3., 4.]).iloc[2:]
>      a
> 4.0  3
> >>> pd.DataFrame({'a': [1,2,3]}, index=[2, 3, 4])[2:]
>    a
> 4  3
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
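
The wrong output reported in the issue can be reproduced without Spark. The sketch below is only an illustration, not the PySpark implementation: it assumes the conversion splits the pandas DataFrame into fixed-size row chunks, and chunk_values is a hypothetical helper introduced here. Label-based slicing is written explicitly as .loc so that the result does not depend on how a given pandas version interprets direct slicing on a float index.

```python
import pandas as pd


def chunk_values(pdf, step, positional=True):
    """Hypothetical helper: split pdf into chunks of `step` rows and
    return the values of column 'a' for each chunk."""
    chunks = []
    for start in range(0, len(pdf), step):
        stop = start + step
        if positional:
            # .iloc always slices by row position, whatever the index holds.
            part = pdf.iloc[start:stop]
        else:
            # .loc slices by index label (both bounds inclusive), which is
            # how the issue describes direct slicing on a float index.
            part = pdf.loc[start:stop]
        chunks.append(part["a"].tolist())
    return chunks


pdf = pd.DataFrame({"a": [1, 2, 3]}, index=[2.0, 3.0, 4.0])

positional_chunks = chunk_values(pdf, step=1, positional=True)
label_chunks = chunk_values(pdf, step=1, positional=False)

print(positional_chunks)  # [[1], [2], [3]]
print(label_chunks)       # [[], [1], [1, 2]]
```

Flattening the label-based chunks gives 1, 1, 2, the same rows as the incorrect show() output quoted in the issue, while the positional chunks return each row exactly once.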