Hi,

I'm noticing that a 30-minute job that was initially IO-bound may no
longer be IO-bound on subsequent runs.  Is there some kind of
between-job caching in Spark or in Linux that outlives a job and might
be making subsequent runs faster?  If so, is there a way to avoid that
caching so I can get a better sense of the worst-case scenario?
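
For what it's worth, the one mechanism I'm aware of is the Linux page
cache.  Assuming that's the culprit, I'd expect something like the
following (run as root on each worker node before a run) to clear it,
though I don't know whether it covers anything Spark itself might keep
around between jobs:

    import subprocess

    # Flush dirty pages to disk first so dropping the cache is safe.
    subprocess.run(["sync"], check=True)

    # Ask the kernel to drop the page cache, dentries, and inodes.
    # Writing "3" to this file requires root.
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")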

(It's also possible that I've simply changed something that made things
faster.)

Eric
