Mayur,

I'm not sure I fully understand the context of what you are asking,
but let me mention the issues I had with deploying.

* As my application is a streaming application, it does not read any files
from disk, so I have no Hadoop/HDFS in place and no need for it, either.
There should be no dependency on Hadoop or HDFS, since you can perfectly
well run Spark applications without them.
* I use Mesos, and so far I have always made the downloaded Spark
distribution accessible to all machines (e.g., via HTTP) and then added my
application code by uploading a jar built with `sbt assembly`. As the Spark
code itself must not be contained in that jar file, I had to mark the Spark
dependencies as "provided" in the sbt build (see the sketch after this
list), which in turn prevented me from running the application locally from
IntelliJ IDEA (it would not find the libraries marked as "provided"); I
always had to use `sbt run` instead.
* When using Mesos, the Spark jar is loaded on the Spark slaves before the
application jar, so the log4j configuration from the Spark jar is used
instead of my custom one (this is different when running locally). I
therefore had to edit that file inside the Spark distribution jar to
customize the logging of my Spark nodes (roughly as sketched below).
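To illustrate the second point, a minimal sketch of the "provided" marker
in build.sbt looks roughly like this (the artifact name follows the usual
Spark naming; the version is just a placeholder for whatever your cluster
runs):

    // Spark is excluded from the assembly jar and expected to be
    // provided by the cluster at runtime
    libraryDependencies +=
      "org.apache.spark" %% "spark-streaming" % "1.0.1" % "provided"

With that in place, `sbt assembly` no longer bundles the Spark classes, but
IntelliJ IDEA also leaves them off the runtime classpath when running the
application from the IDE, which is why I fell back to `sbt run`.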
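For the third point, what I did amounts to something like the following
(treat this as a sketch: the actual assembly jar name contains version and
build information, and I'm assuming the log4j.properties sits at the jar
root, which is where log4j looks for it by default). With my custom
log4j.properties in the current directory:

    jar uf spark-assembly.jar log4j.properties

This replaces the copy inside the Spark distribution jar with my own
logging configuration, so the slaves pick it up.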

I wonder if the latter two problems would vanish if the Spark libraries
were bundled together with the application. (That would be your approach
#1, I guess.)

Tobias
