Re: Metrics Problem

Bryan Jeffrey Fri, 10 Jul 2020 10:50:46 -0700

On Thu, Jul 2, 2020 at 2:33 PM Bryan Jeffrey <bryan.jeff...@gmail.com>
wrote:


> Srinivas,
>
> I finally broke a little bit of time free to look at this issue.  I
> reduced the scope of my ambitions and simply cloned the ConsoleSink and
> ConsoleReporter classes.  After doing so I can see the original version
> works, but the 'modified' version does not work.  The only difference is
> the name & location of the associated JAR.
>
> Looking here:
> http://apache-spark-user-list.1001560.n3.nabble.com/Custom-Metric-Sink-on-Executor-Always-ClassNotFound-td34205.html#a34206
> "For executors, the jar file of the sink needs to be in the system classpath;
> the application jar is not in the system classpath, so that does not
> work. There are different ways for you to get it there, most of them
> manual (YARN is, I think, the only RM supported in Spark where the
> application itself can do it)."
>
> Looking here:
> https://forums.databricks.com/questions/706/how-can-i-attach-a-jar-library-to-the-cluster-that.html
> "Everything in spark.executor.extraClassPath is on the System classpath.
> These are listed in the Classpath Entries section and marked with Source =
> System Classpath. Everything else is on the application classpath."
>
> From the databricks forum above, it appears that one could pass in a jar
> via '--jars' and then call '--conf spark.executor.extraClassPath ./myJar'.
> However, this does not appear to work; the file is there, but it is not
> added to the classpath.
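One way to check whether a jar actually reached a JVM's system classpath is to print the entries of the `java.class.path` property from inside that JVM. The following is a minimal sketch and not part of Spark; the class name `ClasspathEntries` and its helper methods are hypothetical, and the default jar name is the one from the original post further down this thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not a Spark API: inspects a classpath string.
class ClasspathEntries {

    /** Splits a classpath string into its entries using the platform separator. */
    static List<String> entries(String classpath) {
        List<String> result = new ArrayList<>();
        if (classpath == null) {
            return result;
        }
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    /** Returns true if any entry's file name equals jarName. */
    static boolean containsJar(String classpath, String jarName) {
        for (String entry : entries(classpath)) {
            if (new File(entry).getName().equals(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // java.class.path holds the system classpath of the running JVM.
        String classpath = System.getProperty("java.class.path");
        String jarName = args.length > 0 ? args[0] : "custommetricsprovider.jar";
        for (String entry : entries(classpath)) {
            System.out.println(entry);
        }
        System.out.println(jarName + " on system classpath: " + containsJar(classpath, jarName));
    }
}
```

The result is only meaningful for the JVM the code runs in, since the driver and each executor have their own classpath.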
>
> Regards,
>
> Bryan Jeffrey
>
>
> On Tue, Jun 30, 2020 at 12:55 PM Srinivas V <srini....@gmail.com> wrote:
>
>> Then it should be a permission issue. What kind of cluster is it, and
>> which user is running it? Does that user have HDFS permissions to access
>> the folder where the jar file is?
>>
>> On Mon, Jun 29, 2020 at 1:17 AM Bryan Jeffrey <bryan.jeff...@gmail.com>
>> wrote:
>>
>>> Srinivas,
>>>
>>> Interestingly, I did have the metrics jar packaged as part of my main
>>> jar. It worked well both on driver and locally, but not on executors.
>>>
>>> Regards,
>>>
>>> Bryan Jeffrey
>>>
>>> ------------------------------
>>> *From:* Srinivas V <srini....@gmail.com>
>>> *Sent:* Saturday, June 27, 2020 1:23:24 AM
>>>
>>> *To:* Bryan Jeffrey <bryan.jeff...@gmail.com>
>>> *Cc:* user <user@spark.apache.org>
>>> *Subject:* Re: Metrics Problem
>>>
>>> One option is to build your main jar as a fat jar that includes the
>>> metrics jar.
>>>
>>> On Sat, Jun 27, 2020 at 8:04 AM Bryan Jeffrey <bryan.jeff...@gmail.com>
>>> wrote:
>>>
>>> Srinivas,
>>>
>>> Thanks for the insight. I had not considered a dependency issue as the
>>> metrics jar works well applied on the driver. Perhaps my main jar
>>> includes the Hadoop dependencies but the metrics jar does not?
>>>
>>> I am confused, as the only Hadoop dependency also exists for the built-in
>>> metrics providers, which appear to work.
>>>
>>> Regards,
>>>
>>> Bryan
>>>
>>> ------------------------------
>>> *From:* Srinivas V <srini....@gmail.com>
>>> *Sent:* Friday, June 26, 2020 9:47:52 PM
>>> *To:* Bryan Jeffrey <bryan.jeff...@gmail.com>
>>> *Cc:* user <user@spark.apache.org>
>>> *Subject:* Re: Metrics Problem
>>>
>>> It should work when you give an HDFS path, as long as your jar exists
>>> at that path.
>>> I think your error is more likely a security issue (Kerberos) or missing
>>> Hadoop dependencies. Your error says:
>>> org.apache.hadoop.security.UserGroupInformation.doAs(
>>> UserGroupInformation
>>>
>>> On Fri, Jun 26, 2020 at 8:44 PM Bryan Jeffrey <bryan.jeff...@gmail.com>
>>> wrote:
>>>
>>> It may be helpful to note that I'm running in Yarn cluster mode.  My
>>> goal is to avoid having to manually distribute the JAR to all of the
>>> various nodes as this makes versioning deployments difficult.
>>>
>>> On Thu, Jun 25, 2020 at 5:32 PM Bryan Jeffrey <bryan.jeff...@gmail.com>
>>> wrote:
>>>
>>> Hello.
>>>
>>> I am running Spark 2.4.4. I have implemented a custom metrics producer.
>>> It works well when I run locally, or specify the metrics producer only for
>>> the driver.  When I ask for executor metrics I run into
>>> ClassNotFoundExceptions.
>>>
>>> *Is it possible to pass a metrics JAR via --jars?  If so what am I
>>> missing?*
>>>
>>> Deploy driver stats via:
>>> --jars hdfs:///custommetricsprovider.jar
>>> --conf spark.metrics.conf.driver.sink.metrics.class=org.apache.spark.mycustommetricssink
>>>
>>> However, when I pass the JAR with the metrics provider to executors via:
>>> --jars hdfs:///custommetricsprovider.jar
>>> --conf spark.metrics.conf.executor.sink.metrics.class=org.apache.spark.mycustommetricssink
>>>
>>> I get ClassNotFoundException:
>>>
>>> 20/06/25 21:19:35 ERROR MetricsSystem: Sink class org.apache.spark.custommetricssink cannot be instantiated
>>> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748)
>>> at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
>>> at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
>>> at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
>>> at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.spark.custommetricssink
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:348)
>>> at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
>>> at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:198)
>>> at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:194)
>>> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
>>> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
>>> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
>>> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>>> at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
>>> at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:194)
>>> at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:102)
>>> at org.apache.spark.SparkEnv$.create(SparkEnv.scala:365)
>>> at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:201)
>>> at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:221)
>>> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
>>> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>>> ... 4 more
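The "Caused by" frames in the trace show the lookup going through `Class.forName` and `sun.misc.Launcher$AppClassLoader`, which is the JVM's application (system) class loader. A minimal sketch of the same kind of lookup, useful as a standalone diagnostic, might look like the following. The class `SinkClassCheck` and its method are hypothetical and not part of Spark; the default class name is the one passed in the `--conf` setting above.

```java
// Hypothetical diagnostic, not a Spark API: asks the system class loader
// whether it can find a class, without initializing that class.
class SinkClassCheck {

    /** Returns true if className resolves through the JVM's system class loader. */
    static boolean resolvableBySystemLoader(String className) {
        try {
            Class.forName(className, false, ClassLoader.getSystemClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but not loadable (for example a missing dependency).
            return false;
        }
    }

    public static void main(String[] args) {
        String className = args.length > 0 ? args[0] : "org.apache.spark.mycustommetricssink";
        System.out.println(className + " resolvable by system loader: "
                + resolvableBySystemLoader(className));
    }
}
```

Launched with the same classpath as the executor JVM, a `false` result would be consistent with the jar being present on disk but absent from the system classpath.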
>>>
>>> Is it possible to pass a metrics JAR via --jars?  If so what am I
>>> missing?
>>>
>>> Thank you,
>>>
>>> Bryan
>>>
>>>
