Re: [PR] [SPARK-46812][SQL][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

via GitHub Tue, 23 Jan 2024 19:38:13 -0800


wbo4958 commented on PR #44852:
URL: https://github.com/apache/spark/pull/44852#issuecomment-1907300220


   I tried to add unit tests that check whether the ResourceProfile is correctly 
applied to the underlying RDD generated in MapInPandasExec. Here is my test 
code:
   
   ``` python
           df1 = df.mapInPandas(lambda iter: iter, "id long")
           assert df1.rdd.getResourceProfile() is None
   
           treqs = TaskResourceRequests().cpus(2)
           expected_rp = ResourceProfileBuilder().require(treqs).build
   
           df2 = df.mapInPandas(lambda iter: iter, "id long", False, expected_rp)
           assert df2.rdd.getResourceProfile() is not None
   ```
   
   But `df2.rdd.getResourceProfile()` returns None. The reason is that 
`df2.rdd` adds extra MapPartitionsRDDs on top that don't have the 
ResourceProfile attached. 
   
   I also tried to reach the correct parent RDD through the JVM RDD with the 
code below:
   
   ``` python
   df2.rdd._jrdd.firstParent()
   ```
   
   or
   
   ``` python
   df2.rdd._jrdd.parent(0)
   ```
   
   But neither of them worked; they failed with the following error messages:
   
   ``` console
   py4j.protocol.Py4JError: An error occurred while calling o45.parent. Trace:
   py4j.Py4JException: Method parent([class java.lang.Integer]) does not exist
   
   py4j.protocol.Py4JError: An error occurred while calling o45.firstParent. Trace:
   py4j.Py4JException: Method firstParent([]) does not exist
   ```
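
A possible way around these errors, offered as an untested sketch rather than a verified fix: `_jrdd` is a `JavaRDD` wrapper, so instead of calling `firstParent` / `parent` on it, one could unwrap the Scala RDD with `rdd()` and walk its `dependencies()` until an RDD with a profile attached is found. The helper name `find_resource_profile` and the `Fake*` classes are hypothetical; the fakes only mimic the JVM method names (`getResourceProfile`, `dependencies`, `size`, `apply`, `rdd`) so the sketch runs without a JVM. Whether Py4J resolves those calls on a real Scala `Seq` is an assumption here.

```python
from typing import Any, Optional


def find_resource_profile(scala_rdd: Any, max_depth: int = 10) -> Optional[Any]:
    """Depth-first search of an RDD lineage for the first attached profile.

    Hypothetical helper: `scala_rdd` is assumed to behave like a Py4J handle
    to a Scala RDD, offering getResourceProfile() and dependencies().
    """
    profile = scala_rdd.getResourceProfile()
    if profile is not None:
        return profile
    if max_depth == 0:
        return None
    deps = scala_rdd.dependencies()
    for i in range(deps.size()):
        found = find_resource_profile(deps.apply(i).rdd(), max_depth - 1)
        if found is not None:
            return found
    return None


# Stand-ins (not Spark classes) that mimic the JVM method names used above.
class FakeSeq:
    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def apply(self, i):
        return self._items[i]


class FakeDependency:
    def __init__(self, rdd):
        self._rdd = rdd

    def rdd(self):
        return self._rdd


class FakeRDD:
    def __init__(self, profile=None, parents=()):
        self._profile = profile
        self._parents = list(parents)

    def getResourceProfile(self):
        return self._profile

    def dependencies(self):
        return FakeSeq(FakeDependency(p) for p in self._parents)


# Lineage shaped like the situation described: the profile sits on an inner
# RDD, and wrapper RDDs without a profile are stacked on top of it.
source = FakeRDD()
map_in_pandas = FakeRDD(profile="rp-with-2-cpus", parents=[source])
wrapper = FakeRDD(parents=[FakeRDD(parents=[map_in_pandas])])

assert wrapper.getResourceProfile() is None
assert find_resource_profile(wrapper) == "rp-with-2-cpus"
```

Against a real session the call might look like `find_resource_profile(df2.rdd._jrdd.rdd())`. This has not been verified against a running Spark session, and the number of wrapper RDDs in the lineage may differ.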
   
   I don't know how to add unit tests for this PR, but I will perform 
manual tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org
