[jira] [Commented] (SPARK-5152) Let metrics.properties file take an hdfs:// path
[ https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532102#comment-16532102 ]

Apache Spark commented on SPARK-5152:
--------------------------------------

User 'jzhuge' has created a pull request for this issue:
https://github.com/apache/spark/pull/21709

> Let metrics.properties file take an hdfs:// path
> ------------------------------------------------
>
>                 Key: SPARK-5152
>                 URL: https://issues.apache.org/jira/browse/SPARK-5152
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Ryan Williams
>            Priority: Major
>
> From my reading of [the code|https://github.com/apache/spark/blob/06dc4b5206a578065ebbb6bb8d54246ca007397f/core/src/main/scala/org/apache/spark/metrics/MetricsConfig.scala#L53], the {{spark.metrics.conf}} property must be a path that is resolvable on the local filesystem of each executor.
> Running a Spark job with {{--conf spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties}} logs many errors (~1 per executor, presumably?) like:
> {code}
> 15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
> java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties (No such file or directory)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>         at java.io.FileInputStream.<init>(FileInputStream.java:101)
>         at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
>         at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:92)
>         at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
>         at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
>         at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> {code}
> which seems consistent with the idea that it's looking on the local filesystem and not parsing the "scheme" portion of the URL.
> Letting all executors get their {{metrics.properties}} files from one location on HDFS would be an improvement, right?
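For context on the failure mode: the stack trace shows the configured file being opened with {{java.io.FileInputStream}}, which has no notion of URI schemes, so the whole {{hdfs://...}} value is treated as a path on the executor's local disk. A rough illustrative sketch of that behavior (not the actual MetricsConfig code):

{code:scala}
import java.io.FileInputStream
import java.util.Properties

// Illustration only (not Spark's MetricsConfig): java.io treats the whole
// string as a local path, so the hdfs:// value is looked up on the
// executor's local filesystem and throws FileNotFoundException, matching
// the stack trace above.
val props = new Properties()
val in = new FileInputStream("hdfs://host1.domain.com/path/metrics.properties")
try props.load(in) finally in.close()
{code}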
[jira] [Commented] (SPARK-5152) Let metrics.properties file take an hdfs:// path
[ https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511724#comment-16511724 ]

John Zhuge commented on SPARK-5152:
------------------------------------

SPARK-7169 alleviated this issue; however, I still find the approach *spark.metrics.conf=s3://bucket/spark-metrics/graphite.properties* a little more convenient and clean. Compared to *spark.metrics.conf.** entries in SparkConf, a metrics config file groups the properties together, separate from the rest of the Spark properties. In my case there are 10 properties. It is easy to swap out the config file for different users or different purposes, especially in a self-service environment. I wish spark-submit could accept multiple '--properties-file' options.

The downside is that this adds one more dependency on hadoop-client in spark-core, besides the one in the history server. Pretty simple change. Let me know whether I can post a PR.

{code:scala}
- case Some(f) => new FileInputStream(f)
+ case Some(f) =>
+   val hadoopPath = new Path(Utils.resolveURI(f))
+   Utils.getHadoopFileSystem(hadoopPath.toUri, new Configuration()).open(hadoopPath)
{code}
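For illustration, here is a minimal, self-contained sketch of what the proposed change amounts to when written against the public Hadoop FileSystem API rather than Spark's internal {{Utils}} helpers; the method name {{loadMetricsProperties}} and its structure are hypothetical, not Spark code:

{code:scala}
import java.io.InputStream
import java.util.Properties

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch: resolve the configured location through the Hadoop
// FileSystem API so that hdfs://, s3://, and file:// URIs all work, instead
// of assuming a local path the way java.io.FileInputStream does.
def loadMetricsProperties(location: String): Properties = {
  val path = new Path(location)  // keeps the URI scheme intact
  val fs = FileSystem.get(path.toUri, new Configuration())
  val props = new Properties()
  val in: InputStream = fs.open(path)
  try {
    props.load(in)
  } finally {
    in.close()
  }
  props
}
{code}

With something along these lines in place, {{spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties}} (or the s3:// variant above) would resolve on every executor without the file having to exist on local disk.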
[jira] [Commented] (SPARK-5152) Let metrics.properties file take an hdfs:// path
[ https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094419#comment-16094419 ]

gal fins commented on SPARK-5152:
----------------------------------

Hi folks, any idea whether this issue has been resolved?
[jira] [Commented] (SPARK-5152) Let metrics.properties file take an hdfs:// path
[ https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902883#comment-14902883 ]

Yongjia Wang commented on SPARK-5152:
--------------------------------------

I voted for this. It enables configuring metrics or log4j properties for all the workers just from the driver; without it, you have to set them up on each worker individually. Alternatively, it would be even better if there were a way, via "conf" Spark properties on the spark-submit command line, to upload custom files to each executor's working directory before the executor process starts. The "spark.files" option uploads files lazily when the first task starts, which is too late for configuration.
[jira] [Commented] (SPARK-5152) Let metrics.properties file take an hdfs:// path
[ https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271625#comment-14271625 ]

Ryan Williams commented on SPARK-5152:
---------------------------------------

So I've been fumbling my way around the "MetricsSystem" code paths a bit lately in search of better ways to monitor my jobs, and I'm interested in hearing a little more context on what is expected of it, whether people use it, whether it's kind of abandoned, or what; I'll email the dev list to discuss in more detail.

To quickly address your comment, the existence of e.g. [{{ExecutorSource}}|https://github.com/apache/spark/blob/b6aa557300275b835cce7baa7bc8a80eb5425cbb/core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala] and [{{WorkerSource}}|https://github.com/apache/spark/blob/b6aa557300275b835cce7baa7bc8a80eb5425cbb/core/src/main/scala/org/apache/spark/deploy/worker/WorkerSource.scala] seems to imply that the system should provide ways to view metrics from non-local nodes. Am I misreading? Is anyone familiar with / in charge of this code? Do people use it in lieu of the web UI?
[jira] [Commented] (SPARK-5152) Let metrics.properties file take an hdfs:// path
[ https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270616#comment-14270616 ]

Patrick Wendell commented on SPARK-5152:
-----------------------------------------

Should we be loading the metrics properties on executors in the first place? Maybe that's the issue. Since executors are ephemeral, you can't query them for any metrics anyway, right?