Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]
HyukjinKwon closed pull request #46391: [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False
URL: https://github.com/apache/spark/pull/46391

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]
HyukjinKwon commented on PR #46391:
URL: https://github.com/apache/spark/pull/46391#issuecomment-2099529188

Merged to master.
Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]
itholic commented on code in PR #46391:
URL: https://github.com/apache/spark/pull/46391#discussion_r1593236433

## python/pyspark/pandas/groupby.py: ##
@@ -308,6 +308,7 @@ def aggregate(
 )
 if not self._as_index:
+    index_cols = list(psdf.columns)

Review Comment:
Sounds good. Thanks for addressing.
Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]
sinaiamonkar-sai commented on code in PR #46391:
URL: https://github.com/apache/spark/pull/46391#discussion_r1592518229

## python/pyspark/pandas/groupby.py: ##
@@ -322,8 +323,12 @@ def aggregate(
 psdf = psdf.reset_index(level=should_drop_index, drop=drop)
 if len(should_drop_index) < len(self._groupkeys):
     psdf = psdf.reset_index()
+index_cols = [c for c in list(psdf.columns) if c not in index_cols]

Review Comment:
Updated this as well, using `psdf._internal.column_labels`.
Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]
sinaiamonkar-sai commented on code in PR #46391:
URL: https://github.com/apache/spark/pull/46391#discussion_r1592517040

## python/pyspark/pandas/groupby.py: ##
@@ -308,6 +308,7 @@ def aggregate(
 )
 if not self._as_index:
+    index_cols = list(psdf.columns)

Review Comment:
Hello, @itholic. Thank you for the input! I have updated the code, but used `psdf._internal.column_labels` instead, as it gave the desired list of tuples. I hope that is fine. Kindly check.
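
Editor's note: the comment above prefers labels that come back as a list of tuples. The plain-Python sketch below shows one reason a consistent representation matters for the before/after comparison in this PR. It does not use pyspark; the label values and the `added_labels` helper are hypothetical and serve only as an illustration.

```python
def added_labels(before, after):
    """Return labels present in `after` but not in `before`, keeping order."""
    return [label for label in after if label not in before]


# Hypothetical two-level column labels, written as tuples.
tuple_before = [("b", "max"), ("b", "min")]
tuple_after = [("a", ""), ("b", "max"), ("b", "min")]

# Tuples compared with tuples: only the newly added label is reported.
print(added_labels(tuple_before, tuple_after))

# Hypothetical flat string names compared with tuples: nothing matches,
# so every label is (wrongly) reported as new.
flat_before = ["b_max", "b_min"]
print(added_labels(flat_before, tuple_after))
```

If the snapshot and the later comparison use different representations, the membership test never matches, which is why both sides of the comparison should come from the same accessor.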
Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]
itholic commented on code in PR #46391:
URL: https://github.com/apache/spark/pull/46391#discussion_r1591930032

## python/pyspark/pandas/groupby.py: ##
@@ -308,6 +308,7 @@ def aggregate(
 )
 if not self._as_index:
+    index_cols = list(psdf.columns)

Review Comment:
I recommend using `psdf._internal.data_spark_column_names` here.

## python/pyspark/pandas/groupby.py: ##
@@ -322,8 +323,12 @@ def aggregate(
 psdf = psdf.reset_index(level=should_drop_index, drop=drop)
 if len(should_drop_index) < len(self._groupkeys):
     psdf = psdf.reset_index()
+index_cols = [c for c in list(psdf.columns) if c not in index_cols]

Review Comment:
Ditto: `index_cols = [c for c in psdf._internal.data_spark_column_names if c not in index_cols]`
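
Editor's note: read together, the two quoted hunks appear to take a snapshot of the columns before `reset_index()` and then keep only the columns that show up afterwards, which would be the former group keys. Below is a minimal plain-Python sketch of that before/after idea. Ordinary lists stand in for the DataFrame columns; the column names and the `new_columns` helper are hypothetical and are not part of the PR.

```python
def new_columns(before, after):
    """Return entries of `after` that were not in `before`, preserving order."""
    seen = set(before)
    return [c for c in after if c not in seen]


# Hypothetical columns before reset_index(): only the aggregated columns.
before_reset = ["b_max", "b_min"]

# Hypothetical columns after reset_index(): the group key "a" is now a column.
after_reset = ["a", "b_max", "b_min"]

index_cols = new_columns(before_reset, after_reset)
print(index_cols)  # ['a']
```

The set is only a lookup speed-up; the list comprehension in the hunk performs the same filtering directly against the snapshot list.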
Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]
sinaiamonkar-sai commented on PR #46391:
URL: https://github.com/apache/spark/pull/46391#issuecomment-2095018053

Thank you @dongjoon-hyun! Sure, let me add that.
Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]
HyukjinKwon commented on PR #46391:
URL: https://github.com/apache/spark/pull/46391#issuecomment-2095016743

cc @itholic
Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]
sinaiamonkar-sai commented on PR #46391:
URL: https://github.com/apache/spark/pull/46391#issuecomment-2094885603

Hello, @holdenk! This is my first Spark PR. Can you please review it?