wbo4958 commented on PR #44852: URL: https://github.com/apache/spark/pull/44852#issuecomment-1907300220
I was trying to add unit tests to check whether the ResourceProfile is correctly applied to the underlying RDD generated in MapInPandasExec. Here is my test code:

``` python
from pyspark.resource import ResourceProfileBuilder, TaskResourceRequests

# df is an existing DataFrame (not shown)
df1 = df.mapInPandas(lambda iter: iter, "id long")
assert df1.rdd.getResourceProfile() is None

treqs = TaskResourceRequests().cpus(2)
expected_rp = ResourceProfileBuilder().require(treqs).build
df2 = df.mapInPandas(lambda iter: iter, "id long", False, expected_rp)
assert df2.rdd.getResourceProfile() is not None
```

But the ResourceProfile returned by `df2.rdd.getResourceProfile()` is None. The reason is that `df2.rdd` adds extra `MapPartitionsRDD`s on top that don't have a ResourceProfile attached.

I also tried to reach the correct parent RDD through the JVM RDD with the code below,

``` python
df2.rdd._jrdd.firstParent()
```

or

``` python
df2.rdd._jrdd.parent(0)
```

But neither of them worked. They failed with the following error messages:

``` console
py4j.protocol.Py4JError: An error occurred while calling o45.parent. Trace:
py4j.Py4JException: Method parent([class java.lang.Integer]) does not exist

py4j.protocol.Py4JError: An error occurred while calling o45.firstParent. Trace:
py4j.Py4JException: Method firstParent([]) does not exist
```

I don't know how to add unit tests for this PR, but I will perform manual tests.
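
The lineage problem described above can be illustrated without a cluster. The sketch below is a toy model and not Spark's API: `ToyRDD`, `with_resources`, `map_partitions`, and `find_profile` are hypothetical names invented for illustration. The only behavior it assumes is the one reported in the comment, namely that RDDs layered on top of the profiled RDD do not carry the profile themselves.

``` python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRDD:
    """Hypothetical stand-in for one node in an RDD lineage (not a Spark class)."""
    name: str
    parent: Optional["ToyRDD"] = None
    profile: Optional[str] = None  # stands in for a ResourceProfile

    def with_resources(self, profile: str) -> "ToyRDD":
        # Attach a profile to this node only.
        self.profile = profile
        return self

    def map_partitions(self, name: str) -> "ToyRDD":
        # Assumption: a derived node starts without a profile of its own.
        return ToyRDD(name=name, parent=self)


def find_profile(rdd: Optional[ToyRDD]) -> Optional[str]:
    """Walk from the outermost node toward the source; return the first profile found."""
    node = rdd
    while node is not None:
        if node.profile is not None:
            return node.profile
        node = node.parent
    return None


source = ToyRDD("source")
profiled = source.map_partitions("mapInPandas").with_resources("cpus=2")
# Extra wrappers, analogous to the RDDs that `df2.rdd` adds on top.
outer = profiled.map_partitions("wrapper1").map_partitions("wrapper2")

print(outer.profile)        # None: the outermost node has no profile
print(find_profile(outer))  # cpus=2: found by walking the parents
```

In this model a test has to walk the parents rather than inspect the outermost node. Whether an equivalent walk is reachable from PySpark through py4j is not established in this thread.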