I think we want to change the value of spark.local.dir to point to where your 
PVC is mounted. Can you give that a try and let us know if that moves the 
spills as expected?
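
A minimal sketch of what that might look like on the spark-submit command line, assuming a PVC named "spark-spill-pvc" mounted at /mnt/spill (both names are placeholders for your own claim and mount path):

```shell
# Mount the existing PVC into both driver and executor pods,
# then point spark.local.dir at the mount so shuffle/spill files
# land on the PVC instead of the size-limited emptyDir.
spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spill.options.claimName=spark-spill-pvc \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spill.mount.path=/mnt/spill \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spill.options.claimName=spark-spill-pvc \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spill.mount.path=/mnt/spill \
  --conf spark.local.dir=/mnt/spill \
  ...
```

Note the volume-mount properties above require Spark 2.4+, and each executor pod will need the PVC's access mode to allow it to be mounted (e.g. ReadWriteMany if executors share one claim).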

 

-Matt Cheah

 

From: Tomasz Krol <patric...@gmail.com>
Date: Wednesday, February 27, 2019 at 3:41 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Spark on k8s - map persistentStorage for data spilling

 

Hey Guys,

 

I hope someone will be able to help me, as I've been stuck on this for a while :) 
Basically I am running some jobs on Kubernetes as per the documentation

 

https://spark.apache.org/docs/latest/running-on-kubernetes.html

 

All works fine, however if I run queries on a bigger data volume, the jobs fail 
because there is not enough space in the /var/data/spark-1xxx directory.

 

Obviously the reason for this is that the mounted emptyDir doesn't have enough space.

 

I also mounted a PVC to the driver and executor pods, which I can see during the 
runtime. I am wondering if someone knows how to configure Spark so that data is 
spilled to a different directory (i.e. my persistent storage directory) instead of 
the emptyDir with its limited space. Or whether I can back the emptyDir somehow 
with my PVC. Basically at the moment I can't run any jobs, as they are failing due 
to insufficient space in that /var/data directory.

 

Thanks

-- 

Tomasz Krol
patric...@gmail.com
