zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1012742215
##########
python/pyspark/sql/connect/client.py:
##########
@@ -251,6 +263,13 @@ def _execute_and_fetch(self, req: pb2.Request) -> typing.Optional[pandas.DataFra
         if len(result_dfs) > 0:
             df = pd.concat(result_dfs)
+
+            # pd.concat generates non-consecutive index like:
+            # Int64Index([0, 1, 0, 1, 2, 0, 1, 0, 1, 2], dtype='int64')
+            # set it to RangeIndex to be consistent with pyspark
+            n = len(df)
+            df = df.set_index(pd.RangeIndex(start=0, stop=n, step=1))

Review Comment:
   I made this change because some tests fail without it. Those tests only generate a single JSON batch, so they worked with JSON.
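
For readers following the thread, the index behavior described in the diff comment can be reproduced with a small standalone sketch. The two batches below are made-up data standing in for `result_dfs`; the names `batch_a`, `batch_b`, `via_reset`, and `via_concat` are illustrative and do not appear in the PR. The last two lines show common pandas alternatives that should give the same index; they are not what the PR uses.

```python
import pandas as pd

# Hypothetical result batches, standing in for result_dfs in the diff.
batch_a = pd.DataFrame({"id": [0, 1]})
batch_b = pd.DataFrame({"id": [2, 3, 4]})

# pd.concat keeps each batch's own 0-based labels, so labels repeat.
concatenated = pd.concat([batch_a, batch_b])
repeated_labels = concatenated.index.tolist()

# Same approach as the diff: replace the index with a fresh 0..n-1 range.
n = len(concatenated)
fixed = concatenated.set_index(pd.RangeIndex(start=0, stop=n, step=1))

# Alternatives (assumption: not used in the PR) that also renumber the rows.
via_reset = concatenated.reset_index(drop=True)
via_concat = pd.concat([batch_a, batch_b], ignore_index=True)

print(repeated_labels)
print(fixed.index.tolist())
```

With a single batch the labels are already consecutive, which matches the observation that tests producing only one batch were unaffected.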