[ https://issues.apache.org/jira/browse/SPARK-27455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-27455:
----------------------------------
    Affects Version/s:     (was: 2.4.1)
                           3.0.0

> spark-submit and friends should allow main artifact to be specified as a package
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27455
>                 URL: https://issues.apache.org/jira/browse/SPARK-27455
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Brian Lindblom
>            Assignee: Brian Lindblom
>            Priority: Minor
>
> Spark already supports spark.jars.packages for declaring the dependencies an
> application requires. It transitively resolves the listed packages via ivy,
> caches the artifacts, and serves them from the driver to launched executors.
> It would be useful to take this one step further and allow a
> spark.jars.main.package property, with a corresponding command-line flag
> --main-package, to eliminate the need to specify a specific jar file (which
> does NOT transitively resolve). This could simplify many use cases.
> Additionally, --main-package can trigger inspection of the artifact's
> META-INF to determine the main class, so spark-submit invocations no longer
> need to specify it directly.
> Currently, I've found that I can do
> {{spark-submit --packages com.example:my-package:1.0.0 --class com.example.MyPackage /path/to/mypackage-1.0.0.jar <my_args>}}
> to achieve the same effect. This additional boilerplate seems unnecessary,
> however, especially since one must fetch/orchestrate the jar into some
> location (local or remote) in addition to specifying any dependencies.
> Resorting to fat jars to simplify this creates other issues. Ideally,
> {{spark-submit --repository <url_to_my_repo> --main-package com.example:my-package:1.0.0 <my_args>}}
> would be all that is necessary to bootstrap an application.
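The manifest inspection that --main-package could perform is not spelled out in the ticket; a minimal sketch follows, assuming the proposed behavior. A jar is a zip archive, so the Main-Class attribute can be read from META-INF/MANIFEST.MF with the standard zipfile module. The artifact name com.example:my-package:1.0.0 and main class com.example.MyPackage are taken from the example above; the parser is simplified and ignores manifest continuation lines.

```python
import io
import zipfile


def main_class_of(jar):
    """Return the Main-Class attribute from a jar's META-INF/MANIFEST.MF,
    or None if the manifest does not declare one. Simplified: assumes the
    attribute fits on one line (no 72-byte manifest continuation lines)."""
    with zipfile.ZipFile(jar) as zf:
        manifest = zf.read("META-INF/MANIFEST.MF").decode("utf-8")
    for line in manifest.splitlines():
        if line.startswith("Main-Class:"):
            return line.split(":", 1)[1].strip()
    return None


# Build a throwaway in-memory jar with a Main-Class entry to demonstrate
# the lookup spark-submit would do for a resolved main package.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "META-INF/MANIFEST.MF",
        "Manifest-Version: 1.0\nMain-Class: com.example.MyPackage\n",
    )

print(main_class_of(buf))  # → com.example.MyPackage
```

With this lookup in place, resolving the main package via ivy and then reading its manifest would yield the value currently passed by hand through --class.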
> Obviously, care must be taken to avoid DoS'ing <url_to_my_repo> when
> orchestrating many Spark applications. In that case, it may also be
> desirable to implement a --repository-cache-uri <uri_to_repository_cache>
> where, perhaps when HDFS is available, we can bootstrap the application and
> then cache the resolved dependencies as a single larger artifact in HDFS for
> later consumption (e.g., by zipping/tarring up the ivy cache itself).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org