I am not sure if I read the proposal correctly, but note that it would be against Apache policy to include compiled binaries in the source release. On the other hand, each runner may include the necessary run-time binaries as test-only dependencies in the runner's Maven pom.xml.
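
For illustration only, a test-scoped Spark dependency in a runner's pom.xml might look roughly like the sketch below; the artifact coordinates and version are my assumptions, not taken from the Beam build:

    <!-- Sketch (assumed coordinates/version): Spark available to the runner's
         tests only, so no Spark binaries end up in the source release. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.2</version>
      <scope>test</scope>
    </dependency>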

On 7/7/16 11:01, Lukasz Cwik wrote:
That makes a lot of sense. I can see other runners following suit, each with a
packaged-up version for different scenarios / backend cluster runtimes.

Should this be a separate Maven module within Apache Beam, a sub-module inside
an existing one, or something else?

On Thu, Jul 7, 2016 at 1:49 PM, Amit Sela <amitsel...@gmail.com> wrote:

Hi everyone,

Lately I've encountered a number of issues stemming from the fact that the
Spark runner does not package Spark along with it, forcing people to do this
on their own.
In addition, this seems to get in the way of having beam-examples executed
against the Spark runner, again because the examples would have to add Spark
dependencies.

When running on a cluster (which I guess was the original goal here), it is
recommended to have Spark provided by the cluster - this makes sense for
Spark clusters, and more so for Spark + YARN clusters, where you might have
Spark built against a specific Hadoop version or be using a vendor
distribution.
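
As a rough sketch of what "provided by the cluster" means on the Maven side
(coordinates and version are assumptions), the user-facing dependency would
typically carry the provided scope, so the bundled job jar stays slim and the
cluster's own Spark is used at run time:

    <!-- Sketch (assumed coordinates/version): Spark supplied by the cluster,
         e.g. a vendor build, rather than bundled into the job jar. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.2</version>
      <scope>provided</scope>
    </dependency>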

In order to make the runner more accessible to new adopters, I suggest we
consider releasing a "spark-included" artifact as well.
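
To make the idea a bit more concrete: one possible shape (purely a sketch,
nothing in the build does this today) would be a shaded jar attached with a
classifier, e.g. via maven-shade-plugin; the classifier name and plugin
version below are hypothetical:

    <!-- Hypothetical sketch: attach a second, "spark-included" jar that
         bundles the Spark dependencies alongside the regular artifact. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.3</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <shadedArtifactAttached>true</shadedArtifactAttached>
            <shadedClassifierName>spark-included</shadedClassifierName>
          </configuration>
        </execution>
      </executions>
    </plugin>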

Thoughts?

Thanks,
Amit

