[ https://issues.apache.org/jira/browse/SPARK-29472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953209#comment-16953209 ]
BoYang commented on SPARK-29472:
--------------------------------

This is a pretty good feature, helping to solve production issues when there is a jar file conflict!

> Mechanism for Excluding Jars at Launch for YARN
> -----------------------------------------------
>
>                 Key: SPARK-29472
>                 URL: https://issues.apache.org/jira/browse/SPARK-29472
>             Project: Spark
>          Issue Type: New Feature
>          Components: YARN
>    Affects Versions: 2.4.4
>            Reporter: Abhishek Modi
>            Priority: Minor
>
> *Summary*
> It would be convenient if there were an easy way to exclude jars from Spark’s classpath at launch time. This would complement the way in which jars can be added to the classpath using {{extraClassPath}}.
>
> *Context*
> The Spark build contains its dependency jars in the {{/jars}} directory. These jars become part of the executor’s classpath. By default on YARN, these jars are packaged and distributed to containers at launch ({{spark-submit}}) time.
>
> While developing Spark applications, customers sometimes need to debug using different versions of dependencies. This can become difficult if the dependency (e.g. Parquet 1.11.0) is one that Spark already ships in {{/jars}} (e.g. Parquet 1.10.1 in Spark 2.4), as the dependency included with Spark is preferentially loaded.
>
> Configurations such as {{userClassPathFirst}} are available, but they often come with side effects of their own. For example, if the customer’s build includes Avro they will likely see {{Caused by: java.lang.LinkageError: loader constraint violation: when resolving method "org.apache.spark.SparkConf.registerAvroSchemas(Lscala/collection/Seq;)Lorg/apache/spark/SparkConf;" the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current class, com/uber/marmaray/common/spark/SparkFactory, and the class loader (instance of sun/misc/Launcher$AppClassLoader) for the method's defining class, org/apache/spark/SparkConf, have different Class objects for the type scala/collection/Seq used in the signature}}. Resolving such issues often takes many hours.
>
> To deal with these sorts of issues, customers often download the Spark build, remove the offending jars, and then run {{spark-submit}}. Other times, customers may not be able to run {{spark-submit}} directly because it is gated behind a Spark Job Server; in that case they may download the build, remove the jars, and then use configurations such as {{spark.yarn.dist.jars}} or {{spark.yarn.dist.archives}}. Both options are undesirable: they are operationally heavy, error-prone, and often leave the customer’s Spark builds out of sync with the authoritative build.
>
> *Solution*
> I’d like to propose adding a {{spark.yarn.jars.exclusionRegex}} configuration. Customers could provide a regex such as {{.\*parquet.\*}}, and jar files matching this regex would not be included in the driver and executor classpath.
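For reference, the {{userClassPathFirst}} workaround discussed in the description corresponds to the following settings. These two configs do exist in Spark (note that the driver-side flag only takes effect in cluster mode), but as the description explains they can surface {{LinkageError}}s when the child-first and application classloaders both define the same type:

{code:scala}
import org.apache.spark.SparkConf

// Existing workaround: prefer the user's jars over Spark's bundled ones.
// These are real Spark configs, but they change classloading order globally,
// which is what triggers the LinkageError described above.
val conf = new SparkConf()
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")
{code}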
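To make the proposed alternative concrete, here is a minimal sketch of how such an exclusion filter might behave. This is purely illustrative: {{spark.yarn.jars.exclusionRegex}} does not exist in Spark today, the {{JarExclusion}} helper below is hypothetical, and the real change would presumably live in the YARN {{Client}} code that stages the contents of {{$SPARK_HOME/jars}}:

{code:scala}
import java.io.File

// Hypothetical sketch of the proposed behaviour; neither this helper nor the
// spark.yarn.jars.exclusionRegex config exists in Spark today.
object JarExclusion {

  // Returns the jars that should be shipped to YARN containers, dropping any
  // whose file name matches the exclusion regex.
  def filterJars(jars: Seq[File], exclusionRegex: Option[String]): Seq[File] =
    exclusionRegex match {
      case Some(pattern) =>
        val compiled = pattern.r
        // Match on the file name only, so ".*parquet.*" would exclude
        // parquet-column-1.10.1.jar etc. regardless of the install path.
        jars.filterNot(jar => compiled.pattern.matcher(jar.getName).matches())
      case None =>
        jars
    }
}

// Example: keep everything except Spark's bundled Parquet jars.
// JarExclusion.filterJars(sparkHomeJars, Some(".*parquet.*"))
{code}

Under such a scheme, a user could pass {{--conf spark.yarn.jars.exclusionRegex=.\*parquet.\*}} at submit time and ship a replacement (e.g. Parquet 1.11.0) via {{--jars}}, without rebuilding or mutating the Spark distribution.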