[ https://issues.apache.org/jira/browse/SPARK-36348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yikun Jiang updated SPARK-36348:
--------------------------------
    Description:
{code:python}
pidx = pd.Index([10, 20, 15, 30, 45, None], name="x")
psidx = ps.Index(pidx)
self.assert_eq(psidx.astype(str), pidx.astype(str))
{code}
[left pandas on spark]: Index(['10.0', '20.0', '15.0', '30.0', '45.0', 'nan'], dtype='object', name='x')
[right pandas]: Index(['10', '20', '15', '30', '45', 'None'], dtype='object', name='x')

In pandas on Spark the index is loaded as float64, so follow-up steps such as astype(str) produce results that differ from pandas.

  was:
{code:python}
pidx = pd.Index([10, 20, 15, 30, 45, None], name="x")
psidx = ps.Index(pidx)
self.assert_eq(psidx.astype(str), pidx.astype(str))
{code}
[left pandas on spark]: Index(['10.0', '20.0', '15.0', '30.0', '45.0', 'nan'], dtype='object', name='x')
[right pandas]: Index(['10', '20', '15', '30', '45', 'None'], dtype='object', name='x')

The index is loaded as float64

> unexpected Index loaded: pd.Index([10, 20, None], name="x")
> -----------------------------------------------------------
>
>                 Key: SPARK-36348
>                 URL: https://issues.apache.org/jira/browse/SPARK-36348
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Yikun Jiang
>            Priority: Major
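
The mismatch reported above can be illustrated without pandas or Spark. The sketch below is only an approximation in plain Python: `coerce_to_float64` is a hypothetical helper, not part of any library, that stands in for the float64 loading the report describes. It shows why converting to strings after that coercion yields '10.0' and 'nan' rather than '10' and 'None'.

```python
# Values from the report: integers plus one missing entry.
values = [10, 20, 15, 30, 45, None]


def coerce_to_float64(items):
    """Hypothetical stand-in for loading the index as float64.

    Every integer becomes a float and the missing value becomes NaN.
    This is an assumption used for illustration, not Spark's actual code path.
    """
    return [float("nan") if item is None else float(item) for item in items]


# Converting the original objects to strings keeps '10' and 'None'.
object_strings = [str(item) for item in values]

# Converting after the float coercion gives '10.0' and 'nan' instead.
float_strings = [str(item) for item in coerce_to_float64(values)]

print(object_strings)
print(float_strings)
```

The two printed lists correspond to the [right pandas] and [left pandas on spark] outputs in the report, which suggests the difference comes from the dtype chosen at load time rather than from astype itself.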