Thanks for your answer, you are correct, it's just a different approach than the one I am asking for :)

Building an uber or assembly jar goes against the idea of pre-placing the jars on all workers. Uber-jars increase network traffic; using local:/ paths on the classpath reduces network traffic.

Depending on uber-jars can also eventually run into various problems.

Really, the question is narrowly geared toward understanding what arguments can set up the classpath using the --jars argument. Using an uber-jar is a workaround, true, but one with downsides.
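To make the point concrete, here is a sketch of what I mean (the jar names, paths, and class name are hypothetical): with local:/ URLs, each executor resolves the jar on its own disk, so nothing is shipped over the network.

```shell
# Hypothetical dependency jars, assumed to already exist at the
# same path on every worker node.
DEPS="dep1.jar dep2.jar"

# Build the comma-separated local:/ list that --jars expects.
JARS=$(for j in $DEPS; do printf 'local:/opt/libs/%s\n' "$j"; done | paste -sd, -)
echo "$JARS"    # local:/opt/libs/dep1.jar,local:/opt/libs/dep2.jar

# Usage (not executed here): each executor reads the jars from its
# own filesystem rather than receiving them from the driver.
# spark-submit --class com.example.Main --jars "$JARS" /opt/app/app.jar
```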

Thanks!

On 01/12/2016 12:06 AM, UMESH CHAUDHARY wrote:


Could you build a fat jar by including all your dependencies along with your application? See here <http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management> and here <http://spark.apache.org/docs/latest/submitting-applications.html#bundling-your-applications-dependencies>.

Also:
"So this application-jar can point to a directory and will be expanded? Or
needs to be a path to a single specific jar?"

This will be a path to a single specific JAR.

On Tue, Jan 12, 2016 at 12:04 PM, jiml <j...@megalearningllc.com <mailto:j...@megalearningllc.com>> wrote:

    Question is: Looking for all the ways to specify a set of jars
    using --jars
    on spark-submit

    I know this is old but I am about to submit a proposed docs change on
    --jars, and I had an issue with --jars today

    When this user submitted the following command line, is that a
    proper way to
    reference a jar?

    hdfs://master:8000/srcdata/kmeans  (is that a directory, or a jar that
    doesn't end with .jar? I have not gotten into the machine learning libs
    yet, so I don't recognize it)

    I know the docs say, "Path to a bundled jar including your
    application and
    all dependencies. The URL must be globally visible inside of your
    cluster,
    for instance, an hdfs:// path or a file:// path that is present on all
    nodes."

    *So this application-jar can point to a directory and will be
    expanded? Or
    needs to be a path to a single specific jar?*

    I ask because when I was testing --jars today, we had to
    explicitly provide
    a path to each jar:

    /usr/local/spark/bin/spark-submit --class jpsgcs.thold.PipeLinkageData \
    --jars=local:/usr/local/spark/jars/groovy-all-2.3.3.jar,local:/usr/local/spark/jars/guava-14.0.1.jar,local:/usr/local/spark/jars/jopt-simple-4.6.jar,local:/usr/local/spark/jars/jpsgcs-core-1.0.8-2.jar,local:/usr/local/spark/jars/jpsgcs-pipe-1.0.6-7.jar \
    /usr/local/spark/jars/thold-0.0.1-1.jar

    (The only way I figured out to use the commas was a StackOverflow answer
    that led me to look beyond the docs to the command line; spark-submit
    --help prints the following.)

     --jars JARS                 Comma-separated list of local jars to
    include
    on the driver
                                  and executor classpaths.
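    For what it's worth, the help text above suggests --jars takes an
    explicit comma-separated list rather than a directory; a shell glob can
    build that list from a directory (the directory path here is just an
    example, and the spark-submit line is shown but not executed):

```shell
# Join every jar in a directory into the comma-separated
# list that --jars expects (directory path hypothetical).
JAR_DIR=/usr/local/spark/jars
JARS=$(printf '%s\n' "$JAR_DIR"/*.jar | paste -sd, -)
echo "$JARS"

# Usage (not executed here):
# spark-submit --class jpsgcs.thold.PipeLinkageData --jars "$JARS" \
#   /usr/local/spark/jars/thold-0.0.1-1.jar
```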


    And it seems that we do not need to put the main jar in the --jars
    argument. I have not tested yet whether other classes in the
    application-jar (/usr/local/spark/jars/thold-0.0.1-1.jar) are shipped
    to workers, or whether I need to add the application-jar to the --jars
    list for classes other than the one named by --class to be visible.

    Thanks for any ideas




    --
    View this message in context:
    
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-submit-multiple-jar-files-when-using-spark-submit-script-in-shell-tp16662p25942.html
    Sent from the Apache Spark User List mailing list archive at
    Nabble.com.
