Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-07 Thread via GitHub


HyukjinKwon closed pull request #46391: [SPARK-48045][PYTHON] Pandas API 
groupby with multi-agg-relabel ignores as_index=False
URL: https://github.com/apache/spark/pull/46391


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-07 Thread via GitHub


HyukjinKwon commented on PR #46391:
URL: https://github.com/apache/spark/pull/46391#issuecomment-2099529188

   Merged to master





Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-07 Thread via GitHub


itholic commented on code in PR #46391:
URL: https://github.com/apache/spark/pull/46391#discussion_r1593236433


##
python/pyspark/pandas/groupby.py:
##
@@ -308,6 +308,7 @@ def aggregate(
         )
 
         if not self._as_index:
+            index_cols = list(psdf.columns)

Review Comment:
   Sounds good. Thanks for addressing this.






Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-07 Thread via GitHub


sinaiamonkar-sai commented on code in PR #46391:
URL: https://github.com/apache/spark/pull/46391#discussion_r1592518229


##
python/pyspark/pandas/groupby.py:
##
@@ -322,8 +323,12 @@ def aggregate(
                 psdf = psdf.reset_index(level=should_drop_index, drop=drop)
             if len(should_drop_index) < len(self._groupkeys):
                 psdf = psdf.reset_index()
+            index_cols = [c for c in list(psdf.columns) if c not in index_cols]

Review Comment:
   Updated this as well, using `psdf._internal.column_labels`.






Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-07 Thread via GitHub


sinaiamonkar-sai commented on code in PR #46391:
URL: https://github.com/apache/spark/pull/46391#discussion_r1592517040


##
python/pyspark/pandas/groupby.py:
##
@@ -308,6 +308,7 @@ def aggregate(
         )
 
         if not self._as_index:
+            index_cols = list(psdf.columns)

Review Comment:
   Hello, @itholic. Thank you for the input!
   I have updated the code, but used `psdf._internal.column_labels` instead, as it
   gave the desired list of tuples. I hope that is fine. Kindly check.






Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-07 Thread via GitHub


itholic commented on code in PR #46391:
URL: https://github.com/apache/spark/pull/46391#discussion_r1591930032


##
python/pyspark/pandas/groupby.py:
##
@@ -308,6 +308,7 @@ def aggregate(
         )
 
         if not self._as_index:
+            index_cols = list(psdf.columns)

Review Comment:
   I recommend using `psdf._internal.data_spark_column_names` here.



##
python/pyspark/pandas/groupby.py:
##
@@ -322,8 +323,12 @@ def aggregate(
                 psdf = psdf.reset_index(level=should_drop_index, drop=drop)
             if len(should_drop_index) < len(self._groupkeys):
                 psdf = psdf.reset_index()
+            index_cols = [c for c in list(psdf.columns) if c not in index_cols]

Review Comment:
   Ditto. `index_cols = [c for c in psdf._internal.data_spark_column_names if c not in index_cols]`
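
   The pattern discussed in these review comments can be sketched in plain Python, without Spark: take a snapshot of the column labels before `reset_index`, then keep only the labels that appear afterwards, in their new order. The helper name `new_labels` and the sample labels below are hypothetical and chosen only for illustration; this is a sketch of the idea, not the code merged in the PR.

```python
from typing import List, Tuple

# Column labels are modeled as tuples here, as an assumption for illustration.
Label = Tuple[str, ...]


def new_labels(before: List[Label], after: List[Label]) -> List[Label]:
    """Return labels present in `after` but not in `before`, keeping `after` order."""
    seen = set(before)
    return [label for label in after if label not in seen]


# Hypothetical labels: aggregated columns before the group key is moved
# out of the index, and all columns afterwards.
before = [("b_max",), ("c_min",)]
after = [("a",), ("b_max",), ("c_min",)]

index_cols = new_labels(before, after)
print(index_cols)  # [('a',)]
```

   Comparing against a snapshot taken beforehand avoids hard-coding which group keys ended up as columns, which is what makes the same expression usable regardless of how many index levels were dropped.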






Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-05 Thread via GitHub


sinaiamonkar-sai commented on PR #46391:
URL: https://github.com/apache/spark/pull/46391#issuecomment-2095018053

   Thank you @dongjoon-hyun! Sure, let me add that.





Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-05 Thread via GitHub


HyukjinKwon commented on PR #46391:
URL: https://github.com/apache/spark/pull/46391#issuecomment-2095016743

   cc @itholic 





Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-05 Thread via GitHub


sinaiamonkar-sai commented on PR #46391:
URL: https://github.com/apache/spark/pull/46391#issuecomment-2094885603

   Hello, @holdenk! This is my first Spark PR. Could you please review it?

