What I meant by saying that this could be part of Apache Beam is that the
build scripts that generate the binary artifact could be part of Apache
Beam, not the binary artifact itself.
So the question I was asking is whether the build scripts that generate the
artifact should be part of Apache Beam or kept separate - and if they should
be part of it, how?
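
To make that concrete, here is a minimal sketch of what such build scripts
could boil down to: a small module whose pom.xml depends on the Spark runner
plus Spark itself and shades them into a single jar. All artifact ids and
versions below are placeholders, not agreed-upon coordinates:

  <!-- Hypothetical "spark-included" bundle module; ids and versions here
       are placeholders only, not actual Beam coordinates. -->
  <artifactId>beam-runners-spark-bundled</artifactId>
  <packaging>jar</packaging>

  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-spark</artifactId>
      <version>${beam.version}</version>
    </dependency>
    <dependency>
      <!-- Bundled rather than "provided", so users don't need their own Spark. -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <!-- Shade everything into one self-contained jar at package time. -->
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>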

On Thu, Jul 7, 2016 at 2:59 PM, Robert Bradshaw <rober...@google.com.invalid>
wrote:

> I don't think the proposal is to put this into the source release, but
> rather to have a separate binary artifact that's Beam + Spark.
>
> On Thu, Jul 7, 2016 at 11:54 AM, Vlad Rozov <v.ro...@datatorrent.com>
> wrote:
>
> > I am not sure if I read the proposal correctly, but note that it will be
> > against Apache policy to include compiled binaries in the source release.
> > On the other hand, each runner may include the necessary run-time binaries
> > as test-only dependencies in the runner's Maven pom.xml.
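> >
> > For example (just a sketch, with the Spark version as a placeholder), a
> > runner's pom.xml could declare something along these lines:
> >
> >   <dependency>
> >     <groupId>org.apache.spark</groupId>
> >     <artifactId>spark-core_2.10</artifactId>
> >     <version>${spark.version}</version>
> >     <scope>test</scope>
> >   </dependency>
> >
> > so Spark is pulled in for the runner's tests without any binaries being
> > checked into or shipped with the source release.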
> >
> >
> > On 7/7/16 11:01, Lukasz Cwik wrote:
> >
> >> That makes a lot of sense. I can see other runners following suit, where
> >> there is a packaged-up version for different scenarios / backend cluster
> >> runtimes.
> >>
> >> Should this be part of Apache Beam as a separate Maven module, or another
> >> sub-module inside of Apache Beam, or something else?
> >>
> >> On Thu, Jul 7, 2016 at 1:49 PM, Amit Sela <amitsel...@gmail.com> wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> Lately I've encountered a number of issues concerning the fact that the
> >>> Spark runner does not package Spark along with it, forcing people to do
> >>> this on their own.
> >>> In addition, this seems to get in the way of having beam-examples
> >>> executed against the Spark runner, again because the examples would have
> >>> to add Spark dependencies.
> >>>
> >>> When running on a cluster (which I guess was the original goal here), it
> >>> is recommended to have Spark provided by the cluster - this makes sense
> >>> for Spark clusters, and even more so for Spark + YARN clusters, where you
> >>> might have your Spark built against a specific Hadoop version or using a
> >>> vendor distribution.
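> >>>
> >>> To illustrate (a sketch only, with the version property as a
> >>> placeholder), "provided by the cluster" corresponds to declaring Spark
> >>> roughly like this - compiled against, but not bundled:
> >>>
> >>>   <dependency>
> >>>     <groupId>org.apache.spark</groupId>
> >>>     <artifactId>spark-core_2.10</artifactId>
> >>>     <version>${spark.version}</version>
> >>>     <scope>provided</scope>
> >>>   </dependency>
> >>>
> >>> so at runtime the Spark jars come from whatever the cluster provides.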
> >>>
> >>> In order to make the runner more accessible to new adopters, I suggest we
> >>> consider releasing a "spark-included" artifact as well.
> >>>
> >>> Thoughts?
> >>>
> >>> Thanks,
> >>> Amit
> >>>
> >>>
> >
>
