wbo4958 commented on PR #45232: URL: https://github.com/apache/spark/pull/45232#issuecomment-1963263080
## With dynamic allocation enabled

``` bash
start-connect-server.sh --master spark://192.168.0.106:7077 \
  --jars jars/spark-connect_2.13-4.0.0-SNAPSHOT.jar \
  --conf spark.executor.cores=4 \
  --conf spark.task.cpus=1 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=1
```

The command above enables dynamic allocation and caps the number of executors at 1 for testing purposes. Then launch the Spark Connect PySpark client with

``` bash
pyspark --remote "sc://localhost"
```

### TaskResourceProfile without any specific executor request information

Test code:

```python
from pyspark.resource import TaskResourceRequests, ResourceProfileBuilder

def filter_func(iterator):
    for pdf in iterator:
        yield pdf

df = spark.range(0, 100, 1, 4)
treqs = TaskResourceRequests().cpus(3)
rp = ResourceProfileBuilder().require(treqs).build
df.repartition(3).mapInArrow(filter_func, df.schema, False, rp).collect()
```

Here `rp` refers to a TaskResourceProfile without any specific executor request information, so the executor requirements fall back to the default ResourceProfile (`executor.cores=4`). The code above requests an extra executor with the same `executor.cores/memory` as the default ResourceProfile.
![0](https://github.com/apache/spark/assets/1320706/abd5a1ad-8564-490f-abc5-a45621496040)
![1](https://github.com/apache/spark/assets/1320706/340954c5-0dd8-42f3-9368-5dccdd3d7b62)
![2](https://github.com/apache/spark/assets/1320706/7280b9da-6346-4f81-853a-81e37aaecb9d)

### Different executor request information

```python
from pyspark.resource import ExecutorResourceRequests, TaskResourceRequests, ResourceProfileBuilder

def filter_func(iterator):
    for pdf in iterator:
        yield pdf

df = spark.range(0, 100, 1, 4)
ereqs = ExecutorResourceRequests().cores(6)
treqs = TaskResourceRequests().cpus(5)
rp = ResourceProfileBuilder().require(treqs).require(ereqs).build
df.repartition(3).mapInArrow(filter_func, df.schema, False, rp).collect()
```

![5](https://github.com/apache/spark/assets/1320706/eb104194-c156-4319-ac48-f1d199304501)
![6](https://github.com/apache/spark/assets/1320706/a514f1a7-4ffd-497d-897f-6b5f786ea86b)
![7](https://github.com/apache/spark/assets/1320706/7ade7aec-992e-44ed-ae3f-70b25c208243)