Re: Monitoring Spark application progress

2016-05-16 Thread Василец Дмитрий
hello
use Google Translate on
https://mkdev.me/posts/ci-i-monitoring-spark-prilozheniy

On Mon, May 16, 2016 at 6:13 PM, Ashok Kumar 
wrote:

> Hi,
>
> I would like to know the approach and tools, please, to get the full
> performance picture for a Spark app running through spark-shell and spark-submit:
>
>
>1. Through the Spark UI at port 4040?
>2. Through OS utilities such as top and sar
>3. Through Java tools like JBuilder, etc.
>4. Through integrating Spark with monitoring tools.
>
>
> Thanks
>


Re: Monitoring Spark application progress

2016-05-16 Thread Василец Дмитрий
spark + zabbix + jmx
https://translate.google.ru/translate?sl=ru&tl=en&u=https%3A%2F%2Fmkdev.me%2Fposts%2Fci-i-monitoring-spark-prilozheniy
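
For the Spark side, a rough metrics.properties sketch (using the JmxSink that
ships with Spark) exposes the metrics over JMX so Zabbix's Java gateway can
poll them; the JvmSource lines are optional extras:

  # conf/metrics.properties
  *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
  driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
  executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource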

On Mon, May 16, 2016 at 6:13 PM, Ashok Kumar 
wrote:

> Hi,
>
> I would like to know the approach and tools, please, to get the full
> performance picture for a Spark app running through spark-shell and spark-submit:
>
>
>1. Through the Spark UI at port 4040?
>2. Through OS utilities such as top and sar
>3. Through Java tools like JBuilder, etc.
>4. Through integrating Spark with monitoring tools.
>
>
> Thanks
>


Re: Monitoring Spark HDFS Reads and Writes

2015-12-31 Thread Steve Loughran

> On 30 Dec 2015, at 13:19, alvarobrandon  wrote:
> 
> Hello:
> 
> Is there any way of monitoring the number of bytes or blocks read and written
> by a Spark application? I'm running Spark with YARN and I want to measure
> how I/O-intensive a set of applications is. The closest thing I have seen is
> the HDFS DataNode logs in YARN, but they don't seem to have Spark
> application-specific reads and writes.
> 
> 2015-12-21 18:29:15,347 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /127.0.0.1:53805, dest: /127.0.0.1:50010, bytes: 72159, op: HDFS_WRITE,
> cliID: DFSClient_NONMAPREDUCE_-1850086307_1, offset: 0, srvID:
> a9edc8ad-fb09-4621-b469-76de587560c0, blockid:
> BP-189543387-138.100.13.81-1450715936956:blk_1073741837_1013, duration:
> 2619119
> hadoop-alvarobrandon-datanode-usuariop81.fi.upm.es.log:2015-12-21
> 18:29:15,429 INFO org.apache.hadoop.hdfs.server.d
> 
> Is there any trace of this kind of operation to be found in any log?


1. The HDFS namenode and datanodes all collect metrics of their use, with
org.apache.hadoop.hdfs.server.datanode.metrics.DataNodeMetrics being the most
interesting for IO.
2. FileSystem.Statistics is a static structure collecting data on operations
and data for each thread in a client process.
3. The HDFS input streams also support some read statistics (ReadStatistics
via getReadStatistics()).
4. Recent versions of HDFS are also adding HTrace support, to trace
end-to-end performance.

I'd start with FileSystem.Statistics; if that's not already being collected across
Spark jobs, it should be possible to add.
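
As a rough sketch of reading those statistics, something along these lines (run
inside the JVM that actually does the reads, e.g. from a mapPartitions closure
on an executor) dumps the per-scheme byte counts. This is a sketch against the
Hadoop 2.x FileSystem API, not Spark-specific code:

  // Snapshot the per-scheme FileSystem statistics in the current JVM.
  // getAllStatistics aggregates bytes read/written across this process's threads.
  import org.apache.hadoop.fs.FileSystem
  import scala.collection.JavaConverters._

  def fsIoSnapshot(): Map[String, (Long, Long)] =
    FileSystem.getAllStatistics.asScala
      .map(s => s.getScheme -> (s.getBytesRead, s.getBytesWritten))
      .toMap

  // Diff two snapshots around a piece of work to attribute IO to it:
  // val before = fsIoSnapshot(); /* do work */ ; val after = fsIoSnapshot()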


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Monitoring Spark HDFS Reads and Writes

2015-12-31 Thread Arkadiusz Bicz
Hello,

Spark collects HDFS read/write metrics per application/job; see the details at
http://spark.apache.org/docs/latest/monitoring.html.

I have connected the Spark metrics to Graphite and am displaying nice graphs
in Grafana.
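
For reference, a minimal metrics.properties sketch for the Graphite sink looks
roughly like the following (the host, port and prefix are placeholders to adapt
to your setup):

  # conf/metrics.properties
  *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
  *.sink.graphite.host=graphite.example.com
  *.sink.graphite.port=2003
  *.sink.graphite.period=10
  *.sink.graphite.unit=seconds
  *.sink.graphite.prefix=spark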

BR,

Arek

On Thu, Dec 31, 2015 at 2:00 PM, Steve Loughran  wrote:
>
>> On 30 Dec 2015, at 13:19, alvarobrandon  wrote:
>>
>> Hello:
>>
>> Is there any way of monitoring the number of bytes or blocks read and written
>> by a Spark application? I'm running Spark with YARN and I want to measure
>> how I/O-intensive a set of applications is. The closest thing I have seen is
>> the HDFS DataNode logs in YARN, but they don't seem to have Spark
>> application-specific reads and writes.
>>
>> 2015-12-21 18:29:15,347 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> /127.0.0.1:53805, dest: /127.0.0.1:50010, bytes: 72159, op: HDFS_WRITE,
>> cliID: DFSClient_NONMAPREDUCE_-1850086307_1, offset: 0, srvID:
>> a9edc8ad-fb09-4621-b469-76de587560c0, blockid:
>> BP-189543387-138.100.13.81-1450715936956:blk_1073741837_1013, duration:
>> 2619119
>> hadoop-alvarobrandon-datanode-usuariop81.fi.upm.es.log:2015-12-21
>> 18:29:15,429 INFO org.apache.hadoop.hdfs.server.d
>>
>> Is there any trace of this kind of operation to be found in any log?
>
>
> 1. The HDFS namenode and datanodes all collect metrics of their use, with
> org.apache.hadoop.hdfs.server.datanode.metrics.DataNodeMetrics being the most
> interesting for IO.
> 2. FileSystem.Statistics is a static structure collecting data on operations
> and data for each thread in a client process.
> 3. The HDFS input streams also support some read statistics (ReadStatistics
> via getReadStatistics()).
> 4. Recent versions of HDFS are also adding HTrace support, to trace
> end-to-end performance.
>
> I'd start with FileSystem.Statistics; if that's not already being collected across
> Spark jobs, it should be possible to add.
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Monitoring Spark Jobs

2015-06-10 Thread Himanshu Mehra
Hi Sam,

You might want to have a look at the Spark UI: the standalone master UI runs by
default at localhost:8080 and the per-application UI at localhost:4040. You can
also configure Apache Ganglia to monitor your cluster resources.
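
If you also want Spark's own metrics (not just host-level stats) in Ganglia, a
rough metrics.properties sketch would be the following; note that GangliaSink
ships in the separate spark-ganglia-lgpl module, so your build must include it,
and the multicast group/port below are placeholders:

  # conf/metrics.properties
  *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
  *.sink.ganglia.host=239.2.11.71
  *.sink.ganglia.port=8649
  *.sink.ganglia.period=10
  *.sink.ganglia.unit=seconds
  *.sink.ganglia.mode=multicast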

Thank you
Regards
Himanshu Mehra




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Monitoring Spark Jobs

2015-06-07 Thread Otis Gospodnetić
Hi Sam,

Have a look at Sematext's SPM for your Spark monitoring needs. If the
problem is CPU, IO, Network, etc. as Akhil mentioned, you'll see that in
SPM, too.
As for the number of jobs running, you can see a chart with that at
http://sematext.com/spm/integrations/spark-monitoring.html

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Sun, Jun 7, 2015 at 6:37 AM, SamyaMaiti samya.maiti2...@gmail.com
wrote:

 Hi All,

 I have a Spark SQL application to fetch data from Hive; on top of it I have an
 Akka
 layer to run multiple queries in parallel.

 *Please suggest a mechanism to figure out the number of Spark jobs
 running in the cluster at a given instant of time. *

 I need to do the above because I see the average response time increasing with
 the number of requests, in spite of increasing the number of cores
 in the cluster. I suspect there is a bottleneck somewhere else.

 Regards,
 Sam




 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Monitoring Spark Jobs

2015-06-07 Thread Akhil Das
It could be a CPU, IO, or network bottleneck; you need to figure out where
exactly it's choking. You can use monitoring utilities (like top)
to understand it better.

Thanks
Best Regards

On Sun, Jun 7, 2015 at 4:07 PM, SamyaMaiti samya.maiti2...@gmail.com
wrote:

 Hi All,

 I have a Spark SQL application to fetch data from Hive; on top of it I have an
 Akka
 layer to run multiple queries in parallel.

 *Please suggest a mechanism to figure out the number of Spark jobs
 running in the cluster at a given instant of time. *

 I need to do the above because I see the average response time increasing with
 the number of requests, in spite of increasing the number of cores
 in the cluster. I suspect there is a bottleneck somewhere else.

 Regards,
 Sam




 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




RE: Monitoring Spark with Graphite and Grafana

2015-02-26 Thread Shao, Saisai
Cool, great job☺.

Thanks
Jerry

From: Ryan Williams [mailto:ryan.blake.willi...@gmail.com]
Sent: Thursday, February 26, 2015 6:11 PM
To: user; d...@spark.apache.org
Subject: Monitoring Spark with Graphite and Grafana

If anyone is curious to try exporting Spark metrics to Graphite, I just 
published a post about my experience doing that, building dashboards in 
Grafana (http://grafana.org/), and using them to monitor Spark jobs:
http://www.hammerlab.org/2015/02/27/monitoring-spark-with-graphite-and-grafana/

Code for generating Grafana dashboards tailored to the metrics emitted by Spark 
is here: https://github.com/hammerlab/grafana-spark-dashboards.
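
If it helps anyone getting started, one common way to point a job at a custom
metrics config (e.g. one with the GraphiteSink configured) is roughly the
following; the paths, class and jar names are placeholders, and the exact
pattern varies with Spark version and cluster manager:

  spark-submit \
    --files /path/to/metrics.properties \
    --conf spark.metrics.conf=metrics.properties \
    --class com.example.MyApp myapp.jar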

If anyone else is interested in working on expanding MetricsSystem to make this
sort of thing more useful, let me know; I've been working on it a fair amount
and have a bunch of ideas about where it should go.

Thanks,

-Ryan




Re: Monitoring Spark

2014-12-05 Thread Andrew Or
If you're only interested in a particular instant, a simpler way is to
check the executors page on the Spark UI:
http://spark.apache.org/docs/latest/monitoring.html. By default each
executor runs one task per core, so you can see how many tasks are being
run at a given time and this translates directly to how many cores are
being used for execution.
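
If you want the same number programmatically at specific points in the program,
a rough sketch using the SparkListener developer API (names per the 1.x API;
adjust for your version) is to count task start/end events from the driver:

  import java.util.concurrent.atomic.AtomicInteger
  import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskStart, SparkListenerTaskEnd}

  // Keeps a live count of tasks currently running across the cluster.
  class RunningTaskCounter extends SparkListener {
    val running = new AtomicInteger(0)
    override def onTaskStart(t: SparkListenerTaskStart): Unit = { running.incrementAndGet() }
    override def onTaskEnd(t: SparkListenerTaskEnd): Unit = { running.decrementAndGet() }
  }

  // val counter = new RunningTaskCounter; sc.addSparkListener(counter)
  // counter.running.get() then approximates the number of cores in use at that instant.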

2014-12-02 21:49 GMT-08:00 Otis Gospodnetic otis.gospodne...@gmail.com:

 Hi Isca,

 I think SPM can do that for you:
 http://blog.sematext.com/2014/10/07/apache-spark-monitoring/

 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr & Elasticsearch Support * http://sematext.com/


 On Tue, Dec 2, 2014 at 11:57 PM, Isca Harmatz pop1...@gmail.com wrote:

 hello,

 I'm running Spark on a cluster and I want to monitor how many nodes/cores
 are active at different (specific) points of the program.

 Is there any way to do this?

 thanks,
   Isca





Re: Monitoring Spark

2014-12-04 Thread Sameer Farooqui
Are you running Spark in local or standalone mode? In either mode, you
should be able to hit port 4040 (to see the Spark
Jobs/Stages/Storage/Executors UI) on the machine where the driver is
running. However, in local mode, you won't have a Spark Master UI on 8080
or a Worker UI on 8081.

You can manually set the Spark Stages UI port to something other than 4040
(in case there are conflicts) with the spark.ui.port setting.

Also, after setting spark.eventLog.enabled to true, you may also want to
point spark.eventLog.dir to a globally visible filesystem like HDFS
(unless you're running in local mode).
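
Concretely, a rough spark-defaults.conf sketch for this (the HDFS path and the
alternate UI port below are placeholders):

  # conf/spark-defaults.conf
  spark.eventLog.enabled   true
  spark.eventLog.dir       hdfs:///user/spark/applicationHistory
  spark.ui.port            4041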

On Wed, Dec 3, 2014 at 10:01 AM, Isca Harmatz pop1...@gmail.com wrote:

 hello,

 I'm running Spark on a standalone station and I'm trying to view the event log
 after the run is finished.
 I turned on the event log as the site said (spark.eventLog.enabled set to
 true)

 but I can't find the log files or get the web UI to work. Any idea on how
 to do this?

 thanks
Isca




Re: Monitoring Spark

2014-12-02 Thread Otis Gospodnetic
Hi Isca,

I think SPM can do that for you:
http://blog.sematext.com/2014/10/07/apache-spark-monitoring/

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr & Elasticsearch Support * http://sematext.com/


On Tue, Dec 2, 2014 at 11:57 PM, Isca Harmatz pop1...@gmail.com wrote:

 hello,

 I'm running Spark on a cluster and I want to monitor how many nodes/cores
 are active at different (specific) points of the program.

 Is there any way to do this?

 thanks,
   Isca