wbo4958 commented on PR #45232: URL: https://github.com/apache/spark/pull/45232#issuecomment-1963263080
## With dynamic allocation enabled

``` bash
start-connect-server.sh --master spark://192.168.0.106:7077 \
  --jars jars/spark-connect_2.13-4.0.0-SNAPSHOT.jar \
  --conf spark.executor.cores=4 \
  --conf spark.task.cpus=1 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=1
```

The command above enables dynamic allocation and caps the number of executors at 1 for testing purposes. Then launch the Spark Connect PySpark client with

``` bash
pyspark --remote "sc://localhost"
```

### TaskResourceProfile without any specific executor request information

Test code:

```python
from pyspark.resource import TaskResourceRequests, ResourceProfileBuilder

def filter_func(iterator):
    for pdf in iterator:
        yield pdf

df = spark.range(0, 100, 1, 4)
treqs = TaskResourceRequests().cpus(3)
rp = ResourceProfileBuilder().require(treqs).build
df.repartition(3).mapInArrow(filter_func, df.schema, False, rp).collect()
```

Here `rp` refers to a TaskResourceProfile without any specific executor request information, so the executor requirements fall back to the default ResourceProfile (`executor.cores=4`). The code above requests an extra executor with the same `executor.cores/memory` as the default ResourceProfile.
![0](https://github.com/apache/spark/assets/1320706/abd5a1ad-8564-490f-abc5-a45621496040)
![1](https://github.com/apache/spark/assets/1320706/340954c5-0dd8-42f3-9368-5dccdd3d7b62)
![2](https://github.com/apache/spark/assets/1320706/7280b9da-6346-4f81-853a-81e37aaecb9d)

### Different executor request information

```python
from pyspark.resource import ExecutorResourceRequests, TaskResourceRequests, ResourceProfileBuilder

def filter_func(iterator):
    for pdf in iterator:
        yield pdf

df = spark.range(0, 100, 1, 4)
ereqs = ExecutorResourceRequests().cores(6)
treqs = TaskResourceRequests().cpus(5)
rp = ResourceProfileBuilder().require(treqs).require(ereqs).build
df.repartition(3).mapInArrow(filter_func, df.schema, False, rp).collect()
```

![5](https://github.com/apache/spark/assets/1320706/eb104194-c156-4319-ac48-f1d199304501)
![6](https://github.com/apache/spark/assets/1320706/a514f1a7-4ffd-497d-897f-6b5f786ea86b)
![7](https://github.com/apache/spark/assets/1320706/7ade7aec-992e-44ed-ae3f-70b25c208243)