On Sat, Mar 14, 2020 at 5:56 PM Andrew Melo <andrew.m...@gmail.com> wrote:
> Sorry, I'm from a completely different field, so I've inherited a completely 
> different vocabulary. So thanks for bearing with me :)
>
> I think from reading your response, maybe the confusion is that HTCondor is a 
> completely different resource acquisition model than what industry is 
> familiar with. Unlike AWS that gives you a whole VM or k8s that gives you a 
> whole container, condor (and most other batch schedulers) split up a single 
> bare machine that your job shares with whatever else is on that machine. You 
> don't get your own machine or even the illusion you have your own machine 
> (via containerization).
>
> Using these schedulers it's not that you ask for N workers when there's only 
> M machines, you request N x 8core slots when there are M cores available, and 
> the scheduler packs them wherever there's free resources.

Actually, that's exactly what a Spark standalone worker or YARN
NodeManager does. It allocates resources on a shared machine, without
virtualization. If there were Spark <> HTCondor integration, you'd
really just submit apps to the HTCondor cluster and let it allocate
_executors_ for the app for you.
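
To make that concrete, here's roughly what that looks like today
against a standalone master (or YARN): the app just declares the
shape of its executors, and the cluster manager decides which shared
machines they land on. The master URL and sizes below are made up for
illustration.

    from pyspark.sql import SparkSession

    # Hypothetical master URL and executor sizes, just for illustration.
    spark = (
        SparkSession.builder
        .master("spark://master-host:7077")
        .appName("example-app")
        # Declare the shape of the executors; the cluster manager decides
        # which shared machines they actually land on.
        .config("spark.executor.cores", "8")
        .config("spark.executor.memory", "16g")
        .getOrCreate()
    )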

Indeed, you would not generally expect a resource manager to
guarantee which machines the resources come from. So it's possible
and normal to have multiple executors for the same app allocated by
the resource manager on one machine.

It's not so normal to allocate multiple workers (resource manager
daemons) on the same set of physical resources; it needlessly chops
them up, or worse, risks them both thinking they're in charge of the
same resources. So, in Spark standalone, where you control where
workers run, you wouldn't normally run multiple workers per machine.
You'd let one worker manage whatever share of the hardware the Spark
cluster should take (e.g. via SPARK_WORKER_CORES and
SPARK_WORKER_MEMORY). Likewise, YARN has one NodeManager per machine.

Here, you have the extra step of allocating a resource manager
(Spark standalone) within your resource manager (HTCondor) because
there is no direct integration, and I think that's the issue:
HTCondor isn't necessarily allocating resources in a way that makes
sense for a second-level resource manager.


> If you're talking about the 2nd half, let's say I'm running two pyspark 
> notebooks connected to the system above, and batch scheduler gives each of 
> them 2 cores of slaves. Each notebook will have their own set (which I called 
> a pool earlier) of slaves, so when you're working in one notebook, the other 
> notebook of slaves is idle. My comment was about the resources being idle and 
> the desire to increase utilization.

I think you are saying each job spins up a whole new Spark cluster,
and every Spark cluster runs just one app. That's not crazy at all,
though normally you would also have the possibility of one cluster
running N apps and sharing its resources better. But it sounds like
that's the way you have to do it.
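
For what it's worth, the "one cluster, N apps" version of this would
just be both notebooks attaching to the same master and capping what
each can hold, something like the following (made-up URL and
numbers):

    from pyspark.sql import SparkSession

    # Notebook A; notebook B does the same against the same master with
    # its own app name. spark.cores.max caps how much of the shared pool
    # each app can hold at once.
    spark = (
        SparkSession.builder
        .master("spark://master-host:7077")
        .appName("notebook-a")
        .config("spark.executor.cores", "2")
        .config("spark.cores.max", "8")
        .getOrCreate()
    )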

Well, I can see a few possible options:

1) Can you not use HTCondor? Allocate a long-lived Spark standalone
cluster on resources managed only by the Spark cluster, and submit
apps to it (see the sketch after this list). The price is no sharing
of resources with other non-Spark applications.
2) Can HTCondor be convinced to allocate chunks of resources on
distinct machines? That would do it too.
3) HTCondor can't be convinced to do any isolation of the processes
themselves, right? If the workers aren't sharing the same 'virtual'
machine or space, it all works out, which is why all this works fine
on K8s.
4) Or just keep this functionality in Spark as a sort of generic
resource-manager bridge for cases like this. We may have identified
the perhaps niche but real use case for it beyond testing.
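
For option 1, a rough sketch of the notebook side: enable dynamic
allocation so an idle notebook hands its executors back to the shared
cluster rather than sitting on them. On standalone this assumes the
external shuffle service is enabled on the workers, and the URL and
numbers below are made up.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("spark://master-host:7077")   # hypothetical long-lived master
        .appName("notebook-a")
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "0")
        .config("spark.dynamicAllocation.maxExecutors", "4")
        .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
        # Requires spark.shuffle.service.enabled=true on the workers in
        # standalone mode, so executors can be released safely.
        .getOrCreate()
    )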
