Hi everyone,

Lately I've run into a number of issues stemming from the fact that the Spark runner does not package Spark along with it, forcing people to add the Spark dependencies on their own. This also seems to get in the way of running the beam-examples against the Spark runner, again because the examples module would have to declare the Spark dependencies itself.
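For context, this is roughly the kind of thing a new user has to add to their own pom today just to get a pipeline running on the Spark runner (artifact ids and version here are only illustrative, not taken from the runner's actual pom):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>1.6.2</version>
    </dependency>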
When running on a cluster (which I guess was the original goal here), it is recommended to have Spark provided by the cluster. This makes sense for standalone Spark clusters, and even more so for Spark-on-YARN clusters, where Spark may be built against a specific Hadoop version or come from a vendor distribution. To make the runner more accessible to new adopters, I suggest we also consider releasing a "spark-included" artifact (rough sketch in the P.S. below).

Thoughts?

Thanks,
Amit
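P.S. For illustration only, one possible way to do this in the runner's pom would be to keep the Spark dependencies "provided" by default and add a profile that flips them to "compile" for the bundled artifact. The property and profile names below are made up, not actual Beam build code:

    <properties>
      <spark.scope>provided</spark.scope>
    </properties>

    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.scope}</scope>
      </dependency>
    </dependencies>

    <profiles>
      <profile>
        <id>spark-included</id>
        <properties>
          <spark.scope>compile</spark.scope>
        </properties>
      </profile>
    </profiles>

The release build would then run something like "mvn package -Pspark-included", so that anyone depending on the "spark-included" artifact gets Spark pulled in transitively instead of having to provide it themselves.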