[ https://issues.apache.org/jira/browse/SPARK-27455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27455:
----------------------------------
    Affects Version/s:     (was: 2.4.1)
                       3.0.0

> spark-submit and friends should allow main artifact to be specified as a 
> package
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27455
>                 URL: https://issues.apache.org/jira/browse/SPARK-27455
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Brian Lindblom
>            Assignee: Brian Lindblom
>            Priority: Minor
>
> Spark already allows spark.jars.packages to be used to include a set of 
> required dependencies for an application.  It transitively resolves the 
> provided packages via Ivy, caches the artifacts, and serves them from the 
> driver to the launched executors.  It would be useful to take this one step 
> further and allow a spark.jars.main.package property with a corresponding 
> command-line flag, --main-package, eliminating the need to specify a 
> specific jar file (which is NOT transitively resolved).  This could simplify 
> many use cases.  Additionally, --main-package could trigger inspection of 
> the artifact's META-INF/MANIFEST.MF to determine the main class (see the 
> sketch after this paragraph), obviating the need for spark-submit 
> invocations to supply this information directly.  Currently, I've found 
> that I can do
> {{spark-submit --packages com.example:my-package:1.0.0 --class 
> com.example.MyPackage /path/to/mypackage-1.0.0.jar <my_args>}}
> to achieve the same effect.  This additional boilerplate, however, seems 
> unnecessary, especially since one must also fetch/orchestrate the jar into 
> some location (local or remote) on top of specifying the dependencies.  
> Resorting to fat jars to simplify this creates other issues.
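> To illustrate the manifest inspection, here is a minimal sketch (Scala, 
> using only java.util.jar; {{mainClassOf}} is a hypothetical helper, not 
> existing Spark code) of how --main-package could discover the main class:
> {code:scala}
> import java.util.jar.JarFile
> 
> // Read Main-Class from the resolved artifact's META-INF/MANIFEST.MF so the
> // user does not have to pass --class explicitly.
> def mainClassOf(jarPath: String): Option[String] = {
>   val jar = new JarFile(jarPath)
>   try {
>     Option(jar.getManifest)
>       .flatMap(m => Option(m.getMainAttributes.getValue("Main-Class")))
>   } finally {
>     jar.close()
>   }
> }
> {code}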
> Ideally,
> {{spark-submit --repository <url_to_my_repo> --main-package 
> com.example:my-package:1.0.0 <my_args>}}
> would be all that is necessary to bootstrap an application.  Obviously, 
> care must be taken to avoid DoS'ing <url_to_my_repo> when orchestrating 
> many Spark applications.  In that case, it may also be desirable to 
> implement a --repository-cache-uri <uri_to_repository_cache> option: where 
> HDFS is available, for example, we could bootstrap the application once and 
> then cache the resolution as a single larger artifact in HDFS (e.g. a 
> zip/tar of the Ivy cache itself) for later consumption, as sketched below.
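> To sketch that caching idea ({{publishIvyCache}} is a hypothetical helper, 
> and the archive layout is an assumption, not a design), the resolution 
> could be archived once and pushed to HDFS with the standard Hadoop 
> FileSystem API:
> {code:scala}
> import java.net.URI
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> 
> // After a local Ivy resolution, publish an archive of the cache (e.g. a
> // tar of ~/.ivy2/cache) to HDFS so later submissions can fetch one large
> // artifact instead of re-hitting the upstream repository.
> def publishIvyCache(localArchive: String, cacheUri: String): Unit = {
>   val conf = new Configuration()
>   val fs = FileSystem.get(new URI(cacheUri), conf)
>   fs.copyFromLocalFile(new Path(localArchive), new Path(cacheUri))
> }
> {code}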



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
