Romain Manni-Bucau created SPARK-55536:
------------------------------------------

             Summary: Reusable PVC are not usable
                 Key: SPARK-55536
                 URL: https://issues.apache.org/jira/browse/SPARK-55536
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 4.0.2
            Reporter: Romain Manni-Bucau


Side note: I tested on 4.0.1 and 4.0.2. I did not test 4.1.1, but from the 
code on main I assume it is affected as well.

Background: 
https://issues.apache.org/jira/browse/SPARK-35416?focusedCommentId=18058245&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-18058245

Long story short: the same PVC can be assigned to two different executors 
(or to the driver and an executor, if the size limit and storage class are 
the same), which leaves pods unschedulable.

I assume the in-memory copy of the PVC state can lag behind the cluster when 
events arrive late.
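
To illustrate the suspected lag, here is a minimal toy model in Python. It is 
not Spark code: the dictionary layout and the names (pvc-0, exec-1, 
pick_from_snapshot) are hypothetical and only show how a stale snapshot can 
hand out the same PVC twice.

```python
def pick_from_snapshot(snapshot):
    """Return the first PVC that the (possibly stale) snapshot shows as free."""
    for pvc, pod in snapshot.items():
        if pod is None:
            return pvc
    return None


# Hypothetical cluster state: PVC name -> pod mounting it (None means free).
cluster = {"pvc-0": None}

# In-memory snapshot taken before any executor is requested.
snapshot = dict(cluster)

# exec-1 is given pvc-0, and the real cluster state changes.
first = pick_from_snapshot(snapshot)
cluster[first] = "exec-1"

# The snapshot has not been refreshed yet, so exec-2 is given pvc-0 as well.
second = pick_from_snapshot(snapshot)

print(first, second)
```

In this model both picks return the same PVC because the decision is made 
against the snapshot rather than the current cluster state.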

Concretely, I'm using the Spark Thrift Server with a Spark application, and 
95% of the time the driver PVC is assigned to the "next" executor. The 
application then never runs because that pod is not schedulable. I also saw 
some cases where two executors had the same PVC.

I suspect these are two different issues:
 # a wrong assumption about the current state
 # the driver PVC not being handled specifically

My proposal would be to always fetch the PVCs and check their status: if one 
is bound, just use another one. In addition, for executors that are submitted 
but not yet scheduled, also check the status of their PVC, if any, to recover 
from the remaining edge cases.
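
A rough sketch of that proposal, again as a toy Python model and not as Spark 
or Kubernetes client code. The callable fetch_live_pvcs, the reserved set and 
the pending mapping are all assumptions made for illustration, and "in use" 
is modelled simply as "mounted by a pod".

```python
def choose_reusable_pvc(fetch_live_pvcs, reserved):
    """Return a PVC that is free according to a fresh lookup, or None.

    fetch_live_pvcs: hypothetical callable returning {pvc_name: pod_or_None}
                     as seen at call time, not from a cache.
    reserved:        PVC names already given to submitted, unscheduled pods.
    """
    for pvc, pod in fetch_live_pvcs().items():
        if pod is None and pvc not in reserved:
            reserved.add(pvc)
            return pvc
    return None


def conflicting_pending_pods(pending, live):
    """Pending pods whose PVC is mounted by a different pod in the live view."""
    return [pod for pod, pvc in pending.items()
            if live.get(pvc) not in (None, pod)]


# Hypothetical live state: the driver already mounts its own PVC.
live = {"driver-pvc": "driver", "pvc-0": None, "pvc-1": None}
reserved = set()

a = choose_reusable_pvc(lambda: dict(live), reserved)
b = choose_reusable_pvc(lambda: dict(live), reserved)
c = choose_reusable_pvc(lambda: dict(live), reserved)

# Recovery check for an executor that was wrongly given the driver PVC.
stuck = conflicting_pending_pods({"exec-3": "driver-pvc"}, live)

print(a, b, c, stuck)
```

The reserved set covers the window in which the live view has not caught up 
yet, and the second function is the recovery check for pods that were 
submitted with a PVC another pod already holds.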

 

 


