On Thu, Jul 2, 2020 at 2:33 PM Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
> Srinivas,
>
> I finally broke a little bit of time free to look at this issue. I
> reduced the scope of my ambitions and simply cloned the ConsoleSink and
> ConsoleReporter classes. After doing so I can see the original version
> works, but the 'modified' version does not work. The only difference is
> the name & location of the associated JAR.
>
> Looking here:
> http://apache-spark-user-list.1001560.n3.nabble.com/Custom-Metric-Sink-on-Executor-Always-ClassNotFound-td34205.html#a34206
> "For executors, the jar file of the sink needs to be in the system classpath;
> the application jar is not in the system classpath, so that does not
> work. There are different ways for you to get it there, most of them
> manual (YARN is, I think, the only RM supported in Spark where the
> application itself can do it)."
>
> Looking here:
> https://forums.databricks.com/questions/706/how-can-i-attach-a-jar-library-to-the-cluster-that.html
> "Everything in spark.executor.extraClassPath is on the System classpath.
> These are listed in the Classpath Entries section and marked with Source =
> System Classpath. Everything else is on the application classpath."
>
> From the Databricks forum above, it appears that one could pass in a jar
> via '--jars' and then call '--conf spark.executor.extraClassPath ./myJar'.
> However, this does not appear to work; the file is there, but it is not
> added to the classpath.
>
> Regards,
>
> Bryan Jeffrey
>
>
> On Tue, Jun 30, 2020 at 12:55 PM Srinivas V <srini....@gmail.com> wrote:
>
>> Then it should be a permission issue. What kind of cluster is it, and which
>> user is running it? Does that user have HDFS permissions to access the
>> folder where the jar file is?
>>
>> On Mon, Jun 29, 2020 at 1:17 AM Bryan Jeffrey <bryan.jeff...@gmail.com>
>> wrote:
>>
>>> Srinivas,
>>>
>>> Interestingly, I did have the metrics jar packaged as part of my main
>>> jar. It worked well both on driver and locally, but not on executors.
>>>
>>> Regards,
>>>
>>> Bryan Jeffrey
>>>
>>> ------------------------------
>>> *From:* Srinivas V <srini....@gmail.com>
>>> *Sent:* Saturday, June 27, 2020 1:23:24 AM
>>> *To:* Bryan Jeffrey <bryan.jeff...@gmail.com>
>>> *Cc:* user <user@spark.apache.org>
>>> *Subject:* Re: Metrics Problem
>>>
>>> One option is to build your main jar with the metrics jar included,
>>> like a fat jar.
>>>
>>> On Sat, Jun 27, 2020 at 8:04 AM Bryan Jeffrey <bryan.jeff...@gmail.com>
>>> wrote:
>>>
>>> Srinivas,
>>>
>>> Thanks for the insight. I had not considered a dependency issue, as the
>>> metrics jar works well when applied on the driver. Perhaps my main jar
>>> includes the Hadoop dependencies but the metrics jar does not?
>>>
>>> I am confused, as the only Hadoop dependency also exists for the built-in
>>> metrics providers, which appear to work.
>>>
>>> Regards,
>>>
>>> Bryan
>>>
>>> ------------------------------
>>> *From:* Srinivas V <srini....@gmail.com>
>>> *Sent:* Friday, June 26, 2020 9:47:52 PM
>>> *To:* Bryan Jeffrey <bryan.jeff...@gmail.com>
>>> *Cc:* user <user@spark.apache.org>
>>> *Subject:* Re: Metrics Problem
>>>
>>> It should work when you give an HDFS path, as long as your jar exists
>>> at that path.
>>> Your error looks more like a security issue (Kerberos) or missing Hadoop
>>> dependencies, I think. Your error says:
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation
>>>
>>> On Fri, Jun 26, 2020 at 8:44 PM Bryan Jeffrey <bryan.jeff...@gmail.com>
>>> wrote:
>>>
>>> It may be helpful to note that I'm running in YARN cluster mode. My
>>> goal is to avoid having to manually distribute the JAR to all of the
>>> various nodes, as this makes versioning deployments difficult.
>>>
>>> On Thu, Jun 25, 2020 at 5:32 PM Bryan Jeffrey <bryan.jeff...@gmail.com>
>>> wrote:
>>>
>>> Hello.
>>>
>>> I am running Spark 2.4.4.
>>> I have implemented a custom metrics producer.
>>> It works well when I run locally, or specify the metrics producer only for
>>> the driver. When I ask for executor metrics I run into
>>> ClassNotFoundExceptions.
>>>
>>> *Is it possible to pass a metrics JAR via --jars? If so what am I
>>> missing?*
>>>
>>> Deploy driver stats via:
>>> --jars hdfs:///custommetricsprovider.jar
>>> --conf spark.metrics.conf.driver.sink.metrics.class=org.apache.spark.mycustommetricssink
>>>
>>> However, when I pass the JAR with the metrics provider to executors via:
>>> --jars hdfs:///custommetricsprovider.jar
>>> --conf spark.metrics.conf.executor.sink.metrics.class=org.apache.spark.mycustommetricssink
>>>
>>> I get ClassNotFoundException:
>>>
>>> 20/06/25 21:19:35 ERROR MetricsSystem: Sink class org.apache.spark.custommetricssink cannot be instantiated
>>> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748)
>>>   at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
>>>   at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
>>>   at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
>>>   at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.spark.custommetricssink
>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>>   at java.lang.Class.forName0(Native Method)
>>>   at java.lang.Class.forName(Class.java:348)
>>>   at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
>>>   at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:198)
>>>   at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:194)
>>>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
>>>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
>>>   at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
>>>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>>>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
>>>   at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:194)
>>>   at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:102)
>>>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:365)
>>>   at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:201)
>>>   at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:221)
>>>   at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
>>>   at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at javax.security.auth.Subject.doAs(Subject.java:422)
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>>>   ... 4 more
>>>
>>> Is it possible to pass a metrics JAR via --jars? If so what am I
>>> missing?
>>>
>>> Thank you,
>>>
>>> Bryan
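
For reference, the '--jars' plus 'spark.executor.extraClassPath' combination discussed in the July 2 message could be spelled out roughly as below. This is only a sketch of the syntax, not a confirmed fix: the thread itself reports that the jar was present but the class was still not found. It uses the jar name and sink class quoted in the original post, assumes YARN cluster mode as stated in the thread, and assumes (not verified here) that YARN localizes '--jars' files into the container working directory, which is why the classpath entry is a relative path. Note that '--conf' takes a single 'key=value' argument.

```shell
# Hypothetical values taken from the thread; substitute your own jar and class.
SINK_JAR="custommetricsprovider.jar"
SINK_CLASS="org.apache.spark.mycustommetricssink"

# Build the option list. The application jar and its arguments would follow
# these options on a real command line.
set -- \
  --master yarn \
  --deploy-mode cluster \
  --jars "hdfs:///${SINK_JAR}" \
  --conf "spark.executor.extraClassPath=./${SINK_JAR}" \
  --conf "spark.metrics.conf.executor.sink.metrics.class=${SINK_CLASS}"

# Dry run: print the command for review instead of submitting it.
echo spark-submit "$@"
```

The stack trace above shows the sink being looked up through sun.misc.Launcher$AppClassLoader, i.e. the JVM's system class loader, so if the class is still not found it may help to check the executor's launch classpath in the YARN container logs to see whether the relative entry actually resolved to the localized jar.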