[ https://issues.apache.org/jira/browse/SPARK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687034#comment-16687034 ]

Ramandeep Singh commented on SPARK-25982:
-----------------------------------------

Sure,

a) The scheduler is set to fair scheduling mode:

--conf 'spark.scheduler.mode'='FAIR'
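
As a side note, the same mode can also be set programmatically when the session is built; a minimal sketch follows (the pool name "writePool" is only an illustrative assumption, not part of the actual job):

```
import org.apache.spark.sql.SparkSession

// Equivalent to passing --conf 'spark.scheduler.mode'='FAIR'
val spark = SparkSession.builder()
  .appName("fair-scheduling-example")
  .config("spark.scheduler.mode", "FAIR")
  .getOrCreate()

// Jobs submitted from this thread go into a named scheduler pool
// ("writePool" is a hypothetical name, not from this report)
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "writePool")
```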

b) Independent jobs at one stage are scheduled. This is fine; all of them are 
expected to block until their DataFrame writes complete. For example:

```
import java.util.concurrent.TimeUnit
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// One Future per step; each is expected to block until its DataFrame write completes
val futures = steps.map(stepId => Future {
  processWrite(stepsMap(stepId))
})
// Wait for every write to finish before later stages run
futures.foreach(Await.result(_, Duration.create(timeout, TimeUnit.MINUTES)))
```

Here, processWrite issues the write operations in parallel and each Future is 
awaited, but the persist/write operation returns before all partitions of the 
DataFrame have been written, so jobs from a later stage end up running while 
the writes are still in progress.
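
For context, a minimal sketch of what such a processWrite helper might look like; the Step case class, output path, and Parquet format are hypothetical assumptions, the only point being that each call should block until the write finishes:

```
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical description of one step: a DataFrame and where to write it
case class Step(df: DataFrame, outputPath: String)

// Expected to block until every partition has been written; per this report,
// it appears to return before that happens when FAIR scheduling is enabled
def processWrite(step: Step): Unit = {
  step.df.write
    .mode(SaveMode.Overwrite)
    .parquet(step.outputPath)
}
```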

 

> Dataframe write is non blocking in fair scheduling mode
> -------------------------------------------------------
>
>                 Key: SPARK-25982
>                 URL: https://issues.apache.org/jira/browse/SPARK-25982
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Ramandeep Singh
>            Priority: Major
>
> Hi,
> I have noticed that the expected blocking behavior of a DataFrame write 
> operation does not hold in fair scheduling mode.
> Ideally, while a DataFrame write is in progress and a future is blocking on 
> Await.result, no other job should be started, but this is not the case. I have 
> observed that other jobs are started while the partitions are being written.  
>  
> Regards,
> Ramandeep Singh
>  
>  



