That makes a lot of sense. I can see other runners following suit, with a
packaged-up version for different scenarios / backend cluster runtimes.

Should this be a separate Maven module in Apache Beam, a sub-module inside an
existing module, or something else?
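
Purely as a sketch of the "separate module" option: a hypothetical
beam-runners-spark-bundled module (the module name, Beam coordinates and
Spark versions below are assumptions, not anything that exists in the repo)
could simply depend on the existing runner plus Spark in compile scope, so
new adopters get a self-contained dependency:

  <!-- Hypothetical pom for a "spark-included" module; coordinates and
       versions are illustrative only. -->
  <project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-spark-bundled</artifactId>
    <version>0.2.0-incubating-SNAPSHOT</version>

    <dependencies>
      <!-- The Spark runner as it is today, unchanged. -->
      <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-spark</artifactId>
        <version>${project.version}</version>
      </dependency>
      <!-- Spark itself, in compile scope instead of provided. -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.2</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.6.2</version>
      </dependency>
    </dependencies>
  </project>

An alternative would be shading everything into an uber jar with the
maven-shade-plugin, which trades a fatter artifact for fewer dependency
conflicts on the user's side.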

On Thu, Jul 7, 2016 at 1:49 PM, Amit Sela <amitsel...@gmail.com> wrote:

> Hi everyone,
>
> Lately I've encountered a number of issues stemming from the fact that the
> Spark runner does not package Spark along with it, forcing people to do
> this on their own.
> In addition, this seems to get in the way of having beam-examples executed
> against the Spark runner, again because the examples module would have to
> add Spark dependencies.
>
> When running on a cluster (which I guess was the original goal here), it is
> recommended to have Spark provided by the cluster - this makes sense for
> Spark clusters and even more so for Spark + YARN clusters, where you might
> have your Spark built against a specific Hadoop version or use a vendor
> distribution.
>
> In order to make the runner more accessible to new adopters, I suggest we
> consider releasing a "spark-included" artifact as well.
>
> Thoughts?
>
> Thanks,
> Amit
>
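
To make the quoted pain concrete: with the current layout, a user's pom would
carry something like the following (the Spark artifact IDs are real, the
version is just an example from the 1.6 line), switching to provided scope
when the cluster supplies its own Spark build:

  <!-- What users add themselves today to run pipelines on the Spark runner. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.2</version>
    <!-- On a Spark or Spark-on-YARN cluster this would typically be
         <scope>provided</scope>, so the cluster's Spark (built against its
         own Hadoop version or vendor distribution) is used at runtime. -->
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.2</version>
  </dependency>

A "spark-included" artifact would fold these choices into the release itself,
which is what would make it friendlier for new adopters running locally.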
