Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Sean Owen
Do you really need a new cluster per user? And if so, why specify N
workers > M machines? I am not seeing a need for that. I don't even
think 2 workers on the same host make sense, as they are both
managing the same resources; it only exists for test purposes AFAICT.

What you are trying to do sounds like one cluster, not many. JVMs
can't be shared across users; JVM = executor. But that's a good thing,
or else there would be all kinds of collisions.

What pools are you referring to?

Sean

On Fri, Mar 13, 2020 at 6:33 PM Andrew Melo  wrote:
>
> Hi Xingbo, Sean,
>
> On Fri, Mar 13, 2020 at 12:31 PM Xingbo Jiang  wrote:
>>
>> Andrew, could you provide more context on your use case, please? Is it that 
>> you deploy homogeneous containers on hosts with available resources, and 
>> each container launches one worker? Or do you deploy workers directly on 
>> hosts, so that you could have multiple workers from the same application on 
>> the same host?
>
>
> Sure, I describe the actual workload in a bit more detail below [*], but 
> the short version is that our computing resources/infrastructure are all 
> built around batch submission into (usually) the HTCondor scheduler, and 
> we've got a PoC using pyspark to replace the interactive portion of data 
> analysis. To run pyspark on our main resources, we use some scripts around 
> the standalone mode to spin up N slaves per user**, which may or may not end 
> up on the same host. I understood Xingbo's original mail to mean that 
> wouldn't be allowed in the future, but from Sean's response, it seems like 
> I'm incorrect.
>
> That being said, our use case is very bursty, and it would be very good if 
> there were a way we could have one pool of JVMs shared between N 
> different concurrent users instead of having N different pools of JVMs that 
> each serve one person. We're already resource-constrained, and we're 
> expecting our data rates to increase 10x in 2026, so the less idle CPU, the 
> better for us.
>
> Andrew
>
> * I work for one of the LHC experiments at CERN (https://cms.cern/) and 
> there are two main "phases" of our data pipeline: production and analysis. The 
> analysis half is currently implemented by having users write some software, 
> split the input dataset(s) into N parts, and then submit those jobs to 
> the batch system (combining the outputs in a manual postprocessing step). In 
> terms of scale, there are currently ~100 users running ~900 tasks over ~50k 
> CPUs. The use case relevant to this context is the terminal analysis phase, 
> which involves calculating some additional columns, applying calibrations, 
> filtering out the 'interesting' events, and extracting histograms describing 
> those events. Conceptually, it's an iterative process of "extract plots, look 
> at plots, change parameters", but running in a batch system means the latency 
> is bad, so it can take a long time to converge to the right set of params.
>
> ** though we have much smaller, dedicated k8s/mesos/yarn clusters we use for 
> prototyping
>
>>
>> Thanks,
>>
>> Xingbo
>>
>> On Fri, Mar 13, 2020 at 10:23 AM Sean Owen  wrote:
>>>
>>> You have multiple workers in one Spark (standalone) app? This wouldn't
>>> prevent N apps from each having a worker on a machine.
>>>
>>> On Fri, Mar 13, 2020 at 11:51 AM Andrew Melo  wrote:
>>> >
>>> > Hello,
>>> >
>>> > On Fri, Feb 28, 2020 at 13:21 Xingbo Jiang  wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> Based on my experience, there is no scenario that necessarily requires 
>>> >> deploying multiple Workers on the same node with the Standalone backend. A 
>>> >> worker should book all the resources reserved to Spark on the host it is 
>>> >> launched on; it can then allocate those resources to one or more executors 
>>> >> launched by this worker. Since each executor runs in a separate JVM, we 
>>> >> can limit the memory of each executor to avoid long GC pauses.
>>> >>
>>> >> The remaining concern is that local-cluster mode is implemented by 
>>> >> launching multiple workers on the local host, so we might need to 
>>> >> re-implement LocalSparkCluster to launch only one Worker and multiple 
>>> >> executors. It should be fine because local-cluster mode is only used for 
>>> >> running Spark unit tests, so end users should not be affected by 
>>> >> this change.
>>> >>
>>> >> Removing support for multiple workers on the same host could simplify the 
>>> >> deploy model of the Standalone backend, and also reduce the burden of 
>>> >> supporting this legacy deploy pattern in future feature development. (There 
>>> >> is an example in https://issues.apache.org/jira/browse/SPARK-27371 , 
>>> >> where we designed a complex approach to coordinate resource requirements 
>>> >> from different workers launched on the same host.)
>>> >>
>>> >> The proposal is to update the documentation to deprecate support for the 
>>> >> system environment variable `SPARK_WORKER_INSTANCES` in Spark 3.0, and remove 
>>> >> the support in the next major version (Spark 3.1).

Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Andrew Melo
Hi Xingbo, Sean,

On Fri, Mar 13, 2020 at 12:31 PM Xingbo Jiang  wrote:

> Andrew, could you provide more context on your use case, please? Is it that
> you deploy homogeneous containers on hosts with available resources, and
> each container launches one worker? Or do you deploy workers directly on
> hosts, so that you could have multiple workers from the same application on
> the same host?
>

Sure, I describe the actual workload in a bit more detail below [*], but
the short version is that our computing resources/infrastructure are all
built around batch submission into (usually) the HTCondor scheduler, and
we've got a PoC using pyspark to replace the interactive portion of data
analysis. To run pyspark on our main resources, we use some scripts around
the standalone mode to spin up N slaves per user**, which may or may not
end up on the same host. I understood Xingbo's original mail to mean that
wouldn't be allowed in the future, but from Sean's response, it seems like
I'm incorrect.

That being said, our use case is very bursty, and it would be very good if
there were a way we could have one pool of JVMs shared between
N different concurrent users instead of having N different pools of JVMs
that each serve one person. We're already resource-constrained, and we're
expecting our data rates to increase 10x in 2026, so the less idle CPU, the
better for us.

Andrew

* I work for one of the LHC experiments at CERN (https://cms.cern/)
and there are two main "phases" of our data pipeline: production and
analysis. The analysis half is currently implemented by having users
write some software, split the input dataset(s) into N parts, and then
submit those jobs to the batch system (combining the outputs in a
manual postprocessing step). In terms of scale, there are currently ~100
users running ~900 tasks over ~50k CPUs. The use case relevant to this
context is the terminal analysis phase, which involves calculating some
additional columns, applying calibrations, filtering out the 'interesting'
events, and extracting histograms describing those events. Conceptually,
it's an iterative process of "extract plots, look at plots, change
parameters", but running in a batch system means the latency is bad, so it
can take a long time to converge to the right set of params.

** though we have much smaller, dedicated k8s/mesos/yarn clusters we use
for prototyping


> Thanks,
>
> Xingbo
>
> On Fri, Mar 13, 2020 at 10:23 AM Sean Owen  wrote:
>
>> You have multiple workers in one Spark (standalone) app? This wouldn't
>> prevent N apps from each having a worker on a machine.
>>
>> On Fri, Mar 13, 2020 at 11:51 AM Andrew Melo 
>> wrote:
>> >
>> > Hello,
>> >
>> > On Fri, Feb 28, 2020 at 13:21 Xingbo Jiang 
>> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> Based on my experience, there is no scenario that necessarily requires
>> deploying multiple Workers on the same node with the Standalone backend. A
>> worker should book all the resources reserved to Spark on the host it is
>> launched on; it can then allocate those resources to one or more executors
>> launched by this worker. Since each executor runs in a separate JVM, we
>> can limit the memory of each executor to avoid long GC pauses.
>> >>
>> >> The remaining concern is that local-cluster mode is implemented by
>> launching multiple workers on the local host, so we might need to re-implement
>> LocalSparkCluster to launch only one Worker and multiple executors. It
>> should be fine because local-cluster mode is only used for running Spark
>> unit tests, so end users should not be affected by this change.
>> >>
>> >> Removing support for multiple workers on the same host could simplify the
>> deploy model of the Standalone backend, and also reduce the burden of supporting
>> this legacy deploy pattern in future feature development. (There is an
>> example in https://issues.apache.org/jira/browse/SPARK-27371 , where we
>> designed a complex approach to coordinate resource requirements from
>> different workers launched on the same host.)
>> >>
>> >> The proposal is to update the documentation to deprecate support for the
>> system environment variable `SPARK_WORKER_INSTANCES` in Spark 3.0, and remove the
>> support in the next major version (Spark 3.1).
>> >>
>> >> Please kindly let me know if you have use cases relying on this
>> feature.
>> >
>> >
>> > When deploying Spark on batch systems (by wrapping the standalone
>> deployment in scripts that can be consumed by the batch scheduler), we
>> typically end up with >1 worker per host. If I understand correctly, this
>> proposal would make our use case unsupported.
>> >
>> > Thanks,
>> > Andrew
>> >
>> >
>> >
>> >>
>> >> Thanks!
>> >>
>> >> Xingbo
>> >
>> > --
>> > It's dark in this basement.
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


[SPARK-25299] A Discussion About Shuffle Metadata Tracking

2020-03-13 Thread Matt Cheah
Hi everyone,

A working group in the community has been having ongoing discussions regarding 
how we can allow for flexible shuffle-data storage solutions that are 
compatible with containerized systems, more resilient to node failures, and 
able to support disaggregated storage architectures.

One of the core challenges we have been trying to overcome is navigating the 
space of shuffle metadata tracking, and reasoning about how we approach 
recomputing lost shuffle blocks in the case when the shuffle file storage 
system is not resilient.

I have written a design document on the subject, along with a proposed set of 
APIs to address it. These should be considered part of the APIs for 
SPARK-25299. Once we have reached some common conclusion on the proper APIs to 
build, I can modify the original SPARK-25299 SPIP to reflect the choices we’ve 
made. But I wanted to write more extensively on this topic separately to 
encourage focused discussion on this subset of the problem space.

You can find the design document here: 
https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit?usp=sharing

If you would like to catch up on the discussions we have had so far, I give 
some background to the subject matter and have linked to other relevant design 
documents and discussion threads in this document.

Feedback is definitely appreciated – I acknowledge that this is a fairly 
complex space with lots of potentially viable options, so I’m looking forward 
to engaging in the dialogue moving forward.

Thanks!

-Matt Cheah


Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Xingbo Jiang
Andrew, could you provide more context on your use case, please? Is it that
you deploy homogeneous containers on hosts with available resources, and
each container launches one worker? Or do you deploy workers directly on
hosts, so that you could have multiple workers from the same application on
the same host?

Thanks,

Xingbo

On Fri, Mar 13, 2020 at 10:23 AM Sean Owen  wrote:

> You have multiple workers in one Spark (standalone) app? This wouldn't
> prevent N apps from each having a worker on a machine.
>
> On Fri, Mar 13, 2020 at 11:51 AM Andrew Melo 
> wrote:
> >
> > Hello,
> >
> > On Fri, Feb 28, 2020 at 13:21 Xingbo Jiang 
> wrote:
> >>
> >> Hi all,
> >>
> >> Based on my experience, there is no scenario that necessarily requires
> deploying multiple Workers on the same node with the Standalone backend. A
> worker should book all the resources reserved to Spark on the host it is
> launched on; it can then allocate those resources to one or more executors
> launched by this worker. Since each executor runs in a separate JVM, we
> can limit the memory of each executor to avoid long GC pauses.
> >>
> >> The remaining concern is that local-cluster mode is implemented by
> launching multiple workers on the local host, so we might need to re-implement
> LocalSparkCluster to launch only one Worker and multiple executors. It
> should be fine because local-cluster mode is only used for running Spark
> unit tests, so end users should not be affected by this change.
> >>
> >> Removing support for multiple workers on the same host could simplify the
> deploy model of the Standalone backend, and also reduce the burden of supporting
> this legacy deploy pattern in future feature development. (There is an
> example in https://issues.apache.org/jira/browse/SPARK-27371 , where we
> designed a complex approach to coordinate resource requirements from
> different workers launched on the same host.)
> >>
> >> The proposal is to update the documentation to deprecate support for the
> system environment variable `SPARK_WORKER_INSTANCES` in Spark 3.0, and remove the
> support in the next major version (Spark 3.1).
> >>
> >> Please kindly let me know if you have use cases relying on this feature.
> >
> >
> > When deploying Spark on batch systems (by wrapping the standalone
> deployment in scripts that can be consumed by the batch scheduler), we
> typically end up with >1 worker per host. If I understand correctly, this
> proposal would make our use case unsupported.
> >
> > Thanks,
> > Andrew
> >
> >
> >
> >>
> >> Thanks!
> >>
> >> Xingbo
> >
> > --
> > It's dark in this basement.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Sean Owen
You have multiple workers in one Spark (standalone) app? This wouldn't
prevent N apps from each having a worker on a machine.

On Fri, Mar 13, 2020 at 11:51 AM Andrew Melo  wrote:
>
> Hello,
>
> On Fri, Feb 28, 2020 at 13:21 Xingbo Jiang  wrote:
>>
>> Hi all,
>>
>> Based on my experience, there is no scenario that necessarily requires 
>> deploying multiple Workers on the same node with the Standalone backend. A 
>> worker should book all the resources reserved to Spark on the host it is 
>> launched on; it can then allocate those resources to one or more executors 
>> launched by this worker. Since each executor runs in a separate JVM, we can 
>> limit the memory of each executor to avoid long GC pauses.
>>
>> The remaining concern is that local-cluster mode is implemented by launching 
>> multiple workers on the local host, so we might need to re-implement 
>> LocalSparkCluster to launch only one Worker and multiple executors. It 
>> should be fine because local-cluster mode is only used for running Spark 
>> unit tests, so end users should not be affected by this change.
>>
>> Removing support for multiple workers on the same host could simplify the deploy 
>> model of the Standalone backend, and also reduce the burden of supporting this 
>> legacy deploy pattern in future feature development. (There is an example in 
>> https://issues.apache.org/jira/browse/SPARK-27371 , where we designed a 
>> complex approach to coordinate resource requirements from different workers 
>> launched on the same host.)
>>
>> The proposal is to update the documentation to deprecate support for the system 
>> environment variable `SPARK_WORKER_INSTANCES` in Spark 3.0, and remove the 
>> support in the next major version (Spark 3.1).
>>
>> Please kindly let me know if you have use cases relying on this feature.
>
>
> When deploying Spark on batch systems (by wrapping the standalone deployment 
> in scripts that can be consumed by the batch scheduler), we typically end up 
> with >1 worker per host. If I understand correctly, this proposal would make 
> our use case unsupported.
>
> Thanks,
> Andrew
>
>
>
>>
>> Thanks!
>>
>> Xingbo
>
> --
> It's dark in this basement.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Andrew Melo
Hello,

On Fri, Feb 28, 2020 at 13:21 Xingbo Jiang  wrote:

> Hi all,
>
> Based on my experience, there is no scenario that necessarily requires
> deploying multiple Workers on the same node with the Standalone backend. A
> worker should book all the resources reserved to Spark on the host it is
> launched on; it can then allocate those resources to one or more executors
> launched by this worker. Since each executor runs in a separate JVM, we
> can limit the memory of each executor to avoid long GC pauses.
>
> The remaining concern is that local-cluster mode is implemented by
> launching multiple workers on the local host, so we might need to re-implement
> LocalSparkCluster to launch only one Worker and multiple executors. It
> should be fine because local-cluster mode is only used for running Spark
> unit tests, so end users should not be affected by this change.
>
> Removing support for multiple workers on the same host could simplify the
> deploy model of the Standalone backend, and also reduce the burden of
> supporting this legacy deploy pattern in future feature development. (There is
> an example in https://issues.apache.org/jira/browse/SPARK-27371 , where we
> designed a complex approach to coordinate resource requirements from
> different workers launched on the same host.)
>
> The proposal is to update the documentation to deprecate support for the system
> environment variable `SPARK_WORKER_INSTANCES` in Spark 3.0, and remove the
> support in the next major version (Spark 3.1).
>
> Please kindly let me know if you have use cases relying on this feature.
>

When deploying Spark on batch systems (by wrapping the standalone
deployment in scripts that can be consumed by the batch scheduler), we
typically end up with >1 worker per host. If I understand correctly, this
proposal would make our use case unsupported.

Thanks,
Andrew




> Thanks!
>
> Xingbo
>
-- 
It's dark in this basement.
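
[Editorial illustration, for this archived thread only: a minimal sketch of the single-worker layout the proposal above assumes. The worker books the whole host (e.g. via SPARK_WORKER_CORES / SPARK_WORKER_MEMORY in spark-env.sh, with no SPARK_WORKER_INSTANCES), and the application requests small executors so that the one worker still launches several separate JVMs with bounded heaps. The host size and master URL below are hypothetical and not taken from the thread.]

```scala
import org.apache.spark.SparkConf

// Hypothetical host: 16 cores / 64g booked by a single worker, e.g. via
// spark-env.sh:  SPARK_WORKER_CORES=16  SPARK_WORKER_MEMORY=64g
// (and no SPARK_WORKER_INSTANCES, the variable the proposal deprecates).
//
// The application then asks for small executors, so the single worker still
// launches several executor JVMs, each with a bounded heap to keep GC
// pauses short.
val conf = new SparkConf()
  .setMaster("spark://master:7077")     // placeholder standalone master URL
  .set("spark.executor.cores", "2")     // 16 cores / 2 cores each -> up to 8 executors
  .set("spark.executor.memory", "7g")   // 8 executors x 7g fits within 64g
```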


[DISCUSS] Null-handling of primitive-type of untyped Scala UDF in Scala 2.12

2020-03-13 Thread wuyi
Hi all, I'd like to raise a discussion here about the null handling of
primitive-type parameters of the untyped Scala UDF [ udf(f: AnyRef, dataType: DataType) ].

After the switch to Scala 2.12 in 3.0, the untyped Scala UDF is broken
because we can no longer use reflection to get the parameter types of the
Scala lambda.
This leads to silent result changes: for example, with a UDF defined as `val
f = udf((x: Int) => x, IntegerType)`, the query `select f($"x")` behaves
differently in 2.4 and 3.0 when the input value of column x is null.

Spark 2.4:  null
Spark 3.0:  0
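
[A minimal, self-contained sketch of the behavior described above; the session setup is illustrative, and on a 3.0 build the deprecated untyped variant may additionally require opting in.]

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

object UntypedUdfNullDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("untyped-udf-null").getOrCreate()
    import spark.implicits._

    // Untyped variant: in Scala 2.12 the Int parameter type can no longer be
    // recovered by reflection, so a null input reaches the lambda as the
    // primitive default (0) instead of short-circuiting the result to null.
    val f = udf((x: Int) => x, IntegerType)

    // A single row whose column x is null.
    val df = Seq[Option[Int]](None).toDF("x")

    // Spark 2.4 shows null; Spark 3.0 shows 0, as described above.
    df.select(f($"x")).show()
  }
}
```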

Because of this, we deprecate the untyped Scala UDF in 3.0 and recommend
that users use the typed ones. However, I recently identified several valid
use cases, e.g., `val f = (r: Row) => Row(r.getAs[Int](0) * 2)`, where the
schema cannot be inferred by the typed Scala UDF [ udf[RT: TypeTag, A1:
TypeTag](f: Function1[A1, RT]) ].
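
[A short sketch of that use case with the untyped variant, e.g. in spark-shell; the result schema and the column in the commented usage are illustrative assumptions, not from the original mail.]

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{struct, udf}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// The schema of a Row cannot be derived from its type alone, which is why the
// typed udf[RT, A1] variant cannot express this; the untyped variant lets the
// caller supply the result DataType explicitly.
val outSchema = StructType(Seq(StructField("doubled", IntegerType)))

val doubleFirstField = udf((r: Row) => Row(r.getAs[Int](0) * 2), outSchema)

// Illustrative usage on a hypothetical DataFrame `df` with an int column "x":
// df.select(doubleFirstField(struct($"x")))
```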

There are 3 solutions:
1. find a way to get Scala lambda parameter types by reflection (I tried it
very hard but had no luck; the Java SAM type is so dynamic)
2. support case classes as the input of the typed Scala UDF, so at least people
can still handle struct-type input columns with a UDF
3. add a new variant of the untyped Scala UDF in which users can specify the
input types (a purely hypothetical sketch of that shape follows below)
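
[For concreteness only, one hypothetical shape option 3 could take. Neither the signature nor the helper name exists in Spark today; the wrapper is just an illustration of how knowing the input types would let null inputs be handled explicitly again.]

```scala
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.{DataType, IntegerType}

// HYPOTHETICAL helper, not an existing Spark API: an untyped-udf variant where
// the caller also supplies the input types. Since the primitive parameters are
// then known, a null input can be short-circuited to a null result instead of
// silently becoming 0.
def udfWithInputTypes(f: Any => Any,
                      dataType: DataType,
                      inputTypes: Seq[DataType]): UserDefinedFunction = {
  // For this single-argument sketch a plain null check is enough; a real
  // implementation would consult inputTypes per parameter.
  val wrapped: Any => Any = x => if (x == null) null else f(x)
  udf(wrapped, dataType)
}

// Hypothetical usage, mirroring the example from the mail:
// val f = udfWithInputTypes(x => x.asInstanceOf[Int], IntegerType, IntegerType :: Nil)
```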

I'd like to see more feedback or ideas about how to move forward.

Thanks,
Yi Wu



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org