Re: Monitoring Spark application progress
Hello, the article is in Russian, so use Google Translate: https://mkdev.me/posts/ci-i-monitoring-spark-prilozheniy

On Mon, May 16, 2016 at 6:13 PM, Ashok Kumar wrote:
> Hi,
>
> I would like to know the approach and tools, please, to get the full
> performance picture for a Spark app running through spark-shell and spark-submit:
>
> 1. Through the Spark UI at port 4040?
> 2. Through OS utilities like top and SAR
> 3. Through Java tools like JBuilder, etc.
> 4. Through integrating Spark with monitoring tools.
>
> Thanks
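Beyond the links, the per-application web UI at port 4040 also serves the same progress data as JSON, which is handy for scripted monitoring. A minimal sketch, assuming a driver running locally with the default UI port (the /api/v1 endpoints are from the Spark monitoring docs; host and port are placeholders):

```scala
import scala.io.Source

object PollSparkProgress {
  def main(args: Array[String]): Unit = {
    // The application UI serves JSON under /api/v1 on the same port
    // as the UI itself (4040 by default).
    val apps = Source.fromURL("http://localhost:4040/api/v1/applications").mkString
    println(apps) // one record per application, including its app id

    // With an app id you can drill into progress detail, e.g.:
    //   /api/v1/applications/<app-id>/jobs
    //   /api/v1/applications/<app-id>/stages
  }
}
```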
Re: Monitoring Spark application progress
Spark + Zabbix + JMX. An English version of the same article: https://translate.google.ru/translate?sl=ru&tl=en&u=https%3A%2F%2Fmkdev.me%2Fposts%2Fci-i-monitoring-spark-prilozheniy

On Mon, May 16, 2016 at 6:13 PM, Ashok Kumar wrote:
> [snip -- original question, quoted in full above]
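For the JMX leg of that setup, Spark ships a JMX metrics sink that can be enabled in conf/metrics.properties; Zabbix's Java gateway can then read the exposed MBeans. A minimal sketch, with remote JMX opened insecurely on port 9999 for illustration only (the sink class is documented by Spark; the port and the disabled auth/SSL are placeholder choices):

```properties
# conf/metrics.properties -- register the JMX sink for all Spark instances
# (driver, executors, master, worker)
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

# conf/spark-defaults.conf -- expose the driver's MBean server remotely
# (illustration only: authentication and SSL disabled)
spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9999 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false
```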
Re: Monitoring Spark HDFS Reads and Writes
> On 30 Dec 2015, at 13:19, alvarobrandon wrote:
>
> Hello:
>
> Is there any way of monitoring the number of bytes or blocks read and written
> by a Spark application? I'm running Spark with YARN and I want to measure how
> I/O-intensive a set of applications is. The closest thing I have seen is the
> HDFS DataNode logs in YARN, but they don't seem to have Spark
> application-specific reads and writes.
>
> 2015-12-21 18:29:15,347 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /127.0.0.1:53805, dest: /127.0.0.1:50010, bytes: 72159, op: HDFS_WRITE,
> cliID: DFSClient_NONMAPREDUCE_-1850086307_1, offset: 0, srvID:
> a9edc8ad-fb09-4621-b469-76de587560c0, blockid:
> BP-189543387-138.100.13.81-1450715936956:blk_1073741837_1013, duration:
> 2619119
> hadoop-alvarobrandon-datanode-usuariop81.fi.upm.es.log:2015-12-21
> 18:29:15,429 INFO org.apache.hadoop.hdfs.server.d
>
> Is there any trace of this kind of operation to be found in any log?

1. The HDFS namenode and datanodes all collect metrics on their use, with
org.apache.hadoop.hdfs.server.datanode.metrics.DataNodeMetrics being the most
interesting for IO.
2. FileSystem.Statistics is a static structure collecting data on operations
and bytes for each thread in a client process.
3. The HDFS input streams also support some read statistics (ReadStatistics
via getReadStatistics).
4. Recent versions of HDFS are also adding HTrace support, to trace
end-to-end performance.

I'd start with FileSystem.Statistics; if that's not being collected across
Spark jobs, it should be possible to add.
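To see what FileSystem.Statistics collects, you can dump the counters from driver- or executor-side code after some work has run. A minimal sketch, assuming the Hadoop 2.x client libraries on the classpath (getAllStatistics and the counter getters are standard Hadoop API; the output formatting is mine):

```scala
import scala.collection.JavaConverters._
import org.apache.hadoop.fs.FileSystem

object HdfsClientStats {
  def main(args: Array[String]): Unit = {
    // One Statistics entry per filesystem scheme (hdfs, file, s3a, ...)
    // touched by this JVM; counters aggregate across its threads.
    for (stats <- FileSystem.getAllStatistics.asScala) {
      println(s"scheme=${stats.getScheme} " +
        s"bytesRead=${stats.getBytesRead} bytesWritten=${stats.getBytesWritten} " +
        s"readOps=${stats.getReadOps} writeOps=${stats.getWriteOps}")
    }
  }
}
```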
Re: Monitoring Spark HDFS Reads and Writes
Hello, Spark collects HDFS read/write metrics per application/job; see the details at http://spark.apache.org/docs/latest/monitoring.html. I have connected the Spark metrics to Graphite and am building nice graph displays on top with Grafana. BR, Arek

On Thu, Dec 31, 2015 at 2:00 PM, Steve Loughran wrote:
> [snip -- Steve's reply, quoted in full above]
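For anyone wanting to reproduce the Graphite part, it is a metrics.properties change. A minimal sketch, assuming a Carbon listener reachable at graphite-host:2003 (the sink class and key names are from the Spark monitoring docs; host, port, period, and prefix are placeholder choices):

```properties
# conf/metrics.properties -- push all Spark metrics to Graphite
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite-host
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=spark
```

Grafana then just needs the Graphite instance added as a data source to graph the per-application metric trees.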
Re: Monitoring Spark Jobs
Hi Sam, You might want to have a look at the Spark UI: in standalone mode the master UI runs by default at http://localhost:8080, and each running application serves its own UI on port 4040 of the driver. You can also configure Ganglia to monitor your cluster resources. Thank you. Regards, Himanshu Mehra
Re: Monitoring Spark Jobs
Hi Sam, Have a look at Sematext's SPM for your Spark monitoring needs. If the problem is CPU, IO, network, etc. as Akhil mentioned, you'll see that in SPM too. As for the number of jobs running, you can see a chart with that at http://sematext.com/spm/integrations/spark-monitoring.html Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/

On Sun, Jun 7, 2015 at 6:37 AM, SamyaMaiti samya.maiti2...@gmail.com wrote:
> Hi All,
>
> I have a Spark SQL application that fetches data from Hive, with an Akka
> layer on top to run multiple queries in parallel.
>
> Please suggest a mechanism to figure out the number of Spark jobs running in
> the cluster at a given instant of time.
>
> I need this because I see the average response time increasing with the
> number of requests, in spite of increasing the number of cores in the
> cluster. I suspect the bottleneck is somewhere else.
>
> Regards,
> Sam
Re: Monitoring Spark Jobs
It could be a CPU, IO, or network bottleneck; you need to figure out where exactly it's choking. You can use monitoring utilities (like top) to understand it better. Thanks, Best Regards

On Sun, Jun 7, 2015 at 4:07 PM, SamyaMaiti samya.maiti2...@gmail.com wrote:
> [snip -- Sam's question, quoted in full above]
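If you'd rather count jobs from inside the application than from OS utilities, SparkContext exposes a status tracker. A minimal sketch (SparkStatusTracker and its getters are documented Spark API; the local master is only to keep the sketch self-contained):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RunningJobCount {
  def main(args: Array[String]): Unit = {
    // local[*] keeps the sketch runnable standalone; in a real deployment
    // the master comes from spark-submit.
    val conf = new SparkConf().setAppName("job-counter").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Poll while other threads (e.g. the Akka layer) submit queries.
    val tracker = sc.statusTracker
    println(s"active jobs=${tracker.getActiveJobIds().length}, " +
      s"active stages=${tracker.getActiveStageIds().length}")

    sc.stop()
  }
}
```

Note this only sees jobs of its own SparkContext; on a multi-application cluster you would aggregate across applications, e.g. from the standalone master UI.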
RE: Monitoring Spark with Graphite and Grafana
Cool, great job ☺. Thanks, Jerry

From: Ryan Williams [mailto:ryan.blake.willi...@gmail.com]
Sent: Thursday, February 26, 2015 6:11 PM
To: user; d...@spark.apache.org
Subject: Monitoring Spark with Graphite and Grafana

If anyone is curious to try exporting Spark metrics to Graphite, I just published a post about my experience doing that, building dashboards in Grafana (http://grafana.org/), and using them to monitor Spark jobs: http://www.hammerlab.org/2015/02/27/monitoring-spark-with-graphite-and-grafana/

Code for generating Grafana dashboards tailored to the metrics emitted by Spark is here: https://github.com/hammerlab/grafana-spark-dashboards.

If anyone else is interested in working on expanding MetricsSystem to make this sort of thing more useful, let me know; I've been working on it a fair amount and have a bunch of ideas about where it should go.

Thanks,
-Ryan
Re: Monitoring Spark
If you're only interested in a particular instant, a simpler way is to check the executors page of the Spark UI: http://spark.apache.org/docs/latest/monitoring.html. By default each executor runs one task per core, so you can see how many tasks are being run at a given time, and this translates directly into how many cores are being used for execution.

2014-12-02 21:49 GMT-08:00 Otis Gospodnetic otis.gospodne...@gmail.com:
> Hi Isca,
>
> I think SPM can do that for you: http://blog.sematext.com/2014/10/07/apache-spark-monitoring/
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr Elasticsearch Support * http://sematext.com/
>
> On Tue, Dec 2, 2014 at 11:57 PM, Isca Harmatz pop1...@gmail.com wrote:
>> Hello,
>>
>> I'm running Spark on a cluster and I want to monitor how many nodes/cores
>> are active at different (specific) points of the program. Is there any way
>> to do this?
>>
>> Thanks,
>> Isca
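If you want that count programmatically at specific points of the program, the executors view is also served as JSON by the driver UI. A rough sketch, assuming the default UI port and a single running application (the /api/v1 endpoints and the activeTasks field are from the Spark REST API; the regex scraping is a stand-in for a proper JSON library):

```scala
import scala.io.Source

object ActiveCores {
  def main(args: Array[String]): Unit = {
    val base = "http://localhost:4040/api/v1"
    // Grab the first application id from the applications listing.
    val appsJson = Source.fromURL(s"$base/applications").mkString
    val appId = """"id"\s*:\s*"([^"]+)"""".r
      .findFirstMatchIn(appsJson).map(_.group(1))
      .getOrElse(sys.error("no running application found"))

    // Sum activeTasks across executors: with one task per core,
    // this approximates the number of cores in use right now.
    val execJson = Source.fromURL(s"$base/applications/$appId/executors").mkString
    val active = """"activeTasks"\s*:\s*(\d+)""".r
      .findAllMatchIn(execJson).map(_.group(1).toInt).sum
    println(s"tasks currently running (~ cores in use): $active")
  }
}
```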
Re: Monitoring Spark
Are you running Spark in local or standalone mode? In either mode, you should be able to hit port 4040 (to see the Spark Jobs/Stages/Storage/Executors UI) on the machine where the driver is running. However, in local mode you won't have a Spark master UI on 8080 or a worker UI on 8081. You can manually set the Spark application UI port to something other than 4040 (in case there are conflicts) with the spark.ui.port setting. Also, after setting spark.eventLog.enabled to true, you may also want to specify spark.eventLog.dir as a globally visible filesystem like HDFS (unless you're running in local mode).

On Wed, Dec 3, 2014 at 10:01 AM, Isca Harmatz pop1...@gmail.com wrote:
> Hello,
>
> I'm running Spark on a standalone station and I'm trying to view the event
> log after the run is finished. I turned on event logging as the site said
> (spark.eventLog.enabled set to true) but I can't find the log files or get
> the web UI to work. Any idea how to do this?
>
> Thanks,
> Isca
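For the event log and post-run UI specifically, the usual recipe is to point spark.eventLog.dir and the history server at the same directory. A minimal sketch, assuming an HDFS path you would substitute (the property names and start script are from the Spark monitoring docs):

```properties
# conf/spark-defaults.conf
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://namenode:8020/spark-events

# The history server reads the same directory:
spark.history.fs.logDirectory    hdfs://namenode:8020/spark-events
```

After running sbin/start-history-server.sh, completed applications show up in the history server UI at port 18080.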
Re: Monitoring Spark
Hi Isca, I think SPM can do that for you: http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/

On Tue, Dec 2, 2014 at 11:57 PM, Isca Harmatz pop1...@gmail.com wrote:
> Hello,
>
> I'm running Spark on a cluster and I want to monitor how many nodes/cores
> are active at different (specific) points of the program. Is there any way
> to do this?
>
> Thanks,
> Isca