Re: Spark metrics when running with YARN?

2016-08-30 Thread Otis Gospodnetić
Hi Mich and Vijay,

Thanks!  I forgot to include an important bit - I'm looking for a
*programmatic* way to get Spark metrics when running Spark under YARN - so
JMX or API of some kind.

Thanks,
Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Tue, Aug 30, 2016 at 6:59 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> The Spark UI runs on port 4040 by default regardless of deployment mode
> (standalone, YARN, etc.) and can be accessed there directly.
>
> Otherwise one can specify a specific port with --conf "spark.ui.port=<port>".
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 30 August 2016 at 11:48, Vijay Kiran <m...@vijaykiran.com> wrote:
>
>>
>> From the YARN RM UI, find the Spark application ID, and in the application
>> details you can click on the “Tracking URL”, which should give you the
>> Spark UI.
>>
>> ./Vijay
>>
>> > On 30 Aug 2016, at 07:53, Otis Gospodnetić <otis.gospodne...@gmail.com>
>> wrote:
>> >
>> > Hi,
>> >
>> > When Spark is run on top of YARN, where/how can one get Spark metrics?
>> >
>> > Thanks,
>> > Otis
>> > --
>> > Monitoring - Log Management - Alerting - Anomaly Detection
>> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> >
>>
>>
>>
>>
>
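Since the ask above is for programmatic access (JMX or an API) rather than
the web UI, one route is Spark's monitoring REST API, which the driver
serves on the same port as the UI and which is usually reachable through the
YARN "Tracking URL" proxy as well. A minimal Scala sketch, with the host and
port as assumptions:

import scala.io.Source

// Minimal sketch: pull application and executor metrics from Spark's
// monitoring REST API as JSON. The base URL is an assumption; under YARN it
// is usually reached through the RM "Tracking URL" proxy or directly on the
// driver's UI port (4040 by default).
object SparkRestMetricsProbe {
  def main(args: Array[String]): Unit = {
    val base = "http://driver-host:4040/api/v1" // hypothetical host and port

    // List the applications known to this UI (normally just the running app).
    val apps = Source.fromURL(s"$base/applications").mkString
    println(apps)

    // With an application id taken from the JSON above, per-executor metrics
    // are served at: s"$base/applications/<app-id>/executors"
  }
}

For the JMX side, Spark also ships org.apache.spark.metrics.sink.JmxSink,
which can be enabled in metrics.properties so that each JVM's metrics become
visible to standard JMX tooling.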


Spark metrics when running with YARN?

2016-08-29 Thread Otis Gospodnetić
Hi,

When Spark is run on top of YARN, where/how can one get Spark metrics?

Thanks,
Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


Re: Spark Executor Metrics

2016-08-16 Thread Otis Gospodnetić
Hi Muhammad,

You should give people a bit more time to answer/help you (for free). :)

I don't have a direct answer for you, but you can look at SPM for Spark, which has
all the instructions for getting all Spark metrics (Executors, etc.) into
SPM.  It doesn't involve the sink.csv stuff.

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Tue, Aug 16, 2016 at 11:21 AM, Muhammad Haris <
muhammad.haris.makh...@gmail.com> wrote:

> Still waiting for response, any clue/suggestions?
>
>
> On Tue, Aug 16, 2016 at 4:48 PM, Muhammad Haris <
> muhammad.haris.makh...@gmail.com> wrote:
>
>> Hi,
>> I have been trying to collect driver, master, worker and executor
>> metrics using Spark 2.0 in standalone mode; here is what my metrics
>> configuration file looks like:
>>
>> *.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
>> *.sink.csv.period=1
>> *.sink.csv.unit=seconds
>> *.sink.csv.directory=/root/metrics/
>> executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
>> master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
>> worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
>> driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
>>
>> Once the application is complete, I can only see the driver's metrics; I have
>> checked the directories on all the worker nodes as well.
>> Could anybody please help me understand what I am doing wrong here?
>>
>>
>>
>> Regards
>>
>>
>>
>
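One detail worth checking with a configuration like the one above: each JVM
(master, worker, driver, executor) reads its own copy of the metrics
configuration, and CsvSink writes its files locally on whichever machine
that component runs on, so the configured directory has to exist on every
node. The standalone master and worker daemons pick the file up from their
own SPARK_HOME/conf, while the driver and executors follow
spark.metrics.conf. A hedged sketch of the application side, with the path
as an assumption:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: point the driver and executors at a metrics.properties that
// exists at the same (assumed) path on every node. CsvSink then writes its
// CSV files into the configured *.sink.csv.directory on each machine, so
// that directory must exist locally on every node as well.
object CsvMetricsExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("csv-metrics-example")
      .set("spark.metrics.conf", "/etc/spark/metrics.properties") // assumed cluster-wide path

    val spark = SparkSession.builder().config(conf).getOrCreate()
    spark.range(0L, 10000000L).count() // any workload; metrics are emitted while it runs
    spark.stop()
  }
}

With that in place, executor CSVs should appear under the configured
directory on the worker machines that ran the executors, not on the driver
node.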


Re: Apache Flink

2016-04-17 Thread Otis Gospodnetić
While Flink may not be younger than Spark, Spark came to Apache first,
which always helps.  Plus, there was already a lot of buzz around Spark
before it came to Apache.  Coming from Berkeley also helps.

That said, Flink seems decently healthy to me:
- http://search-hadoop.com/?fc_project=Flink_type=mail+_hash_+user=
- http://search-hadoop.com/?fc_project=Flink_type=mail+_hash_+dev=
-
http://search-hadoop.com/?fc_project=Flink_type=issue==144547200=146102400

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Sun, Apr 17, 2016 at 5:55 PM, Mich Talebzadeh 
wrote:

> Assuming that Spark and Flink are contemporaries, what are the reasons
> that Flink has not been adopted as widely? (This may sound obvious and/or
> prejudged.) I mean, Spark has surged in popularity in the past year, if I am
> correct.
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 17 April 2016 at 22:49, Michael Malak  wrote:
>
>> In terms of publication date, a paper on Nephele was published in 2009,
>> prior to the 2010 USENIX paper on Spark. Nephele is the execution engine of
>> Stratosphere, which became Flink.
>>
>>
>> --
>> From: Mark Hamstra
>> To: Mich Talebzadeh
>> Cc: Corey Nolet; "user @spark" <user@spark.apache.org>
>> Sent: Sunday, April 17, 2016 3:30 PM
>> Subject: Re: Apache Flink
>>
>> To be fair, the Stratosphere project from which Flink springs was started
>> as a collaborative university research project in Germany about the same
>> time that Spark was first released as Open Source, so they are near
>> contemporaries rather than Flink having been started only well after Spark
>> was an established and widely-used Apache project.
>>
>> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> Also, it always amazes me why there are so many tangential projects in the Big
>> Data space. Would it not be easier if effort were spent on adding to Spark's
>> functionality rather than creating a new product like Flink?
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> On 17 April 2016 at 21:08, Mich Talebzadeh 
>> wrote:
>>
>> Thanks Corey for the useful info.
>>
>> I have used Sybase Aleri and StreamBase as commercial CEP engines.
>> However, there does not seem to be anything close to these products in the
>> Hadoop ecosystem. So I guess there is nothing there?
>>
>> Regards.
>>
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> On 17 April 2016 at 20:43, Corey Nolet  wrote:
>>
>> I have not been intrigued at all by the microbatching concept in Spark. I
>> am used to CEP in real stream-processing environments like InfoSphere
>> Streams & Storm, where the granularity of processing is at the level of each
>> individual tuple, and processing units (workers) can react immediately to
>> events being received and processed. The closest Spark Streaming comes to
>> this concept is the notion of "state" that can be updated via the
>> "updateStateByKey()" functions, which are only able to be run in a
>> microbatch. Looking at the expected design changes to Spark Streaming in
>> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
>> the radar for Spark, though I have seen articles stating that more effort
>> is going to go into the Spark SQL layer in Spark Streaming, which may make
>> it more reminiscent of Esper.
>>
>> For these reasons, I have not even tried to implement CEP in Spark. I
>> feel it's a waste of time without immediate tuple-at-a-time processing.
>> Without this, they avoid the whole problem of "back pressure" (though keep
>> in mind, it is still very possible to overload the Spark Streaming layer
>> with stages that will continue to pile up and never get worked off), but
>> they lose the granular control that you get in CEP environments, where the
>> rules & processors can react to each tuple the moment it is received.
>>
>> A while back, I did attempt to implement an InfoSphere Streams-like API
>> [1] on top of Apache Storm as an example of what such a design may look
>> like. It looks like Storm is going to be replaced in the 
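For concreteness, the per-key "state" described above looks roughly like the
sketch below: the update function runs once per micro-batch per key rather
than on each arriving tuple, which is exactly the granularity gap being
described. The socket source, batch interval and checkpoint path are
assumptions for illustration.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch of micro-batch state: the update function fires once per batch per
// key, not on each individual tuple. The source (localhost:9999) and the
// checkpoint directory are illustrative assumptions.
object StatefulWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("stateful-word-count").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/streaming-checkpoints") // required by updateStateByKey

    val counts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .updateStateByKey[Int] { (newValues: Seq[Int], running: Option[Int]) =>
        // Revised once per 5-second batch, not per tuple.
        Some(running.getOrElse(0) + newValues.sum)
      }

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}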

Re: Monitoring tools for spark streaming

2015-09-29 Thread Otis Gospodnetić
Hi,

There's also SPM for Spark --
http://sematext.com/spm/integrations/spark-monitoring.html

SPM graphs all Spark metrics and gives you alerting, anomaly detection,
etc. and if you ship your Spark and/or other logs to Logsene -
http://sematext.com/logsene - you can correlate metrics, logs, errors,
etc.  I haven't used SPM with Spark after "AppMap" was introduced (see
http://blog.sematext.com/2015/08/06/introducing-appmap/ ) but I imagine it
would be nice to see a map of Spark nodes talking to each other.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Sep 28, 2015 at 7:52 PM, Siva  wrote:

> Hi,
>
> Could someone recommend monitoring tools for Spark Streaming?
>
> By extending StreamingListener we can dump the delay in processing of
> batches and some alert messages.
>
> But are there any web UI tools where we can monitor failures, see delays
> in processing and error messages, and set up alerts, etc.?
>
> Thanks
>
>
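As a starting point for the StreamingListener approach mentioned above, a
rough sketch is below; the delay threshold and the println-based "alert" are
placeholders for whatever alerting hook is actually used.

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Sketch: report per-batch scheduling/processing delay and flag slow batches.
class DelayAlertListener(maxTotalDelayMs: Long) extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    val scheduling = info.schedulingDelay.getOrElse(-1L)
    val processing = info.processingDelay.getOrElse(-1L)
    val total = info.totalDelay.getOrElse(0L)
    println(s"batch=${info.batchTime} schedulingDelay=${scheduling}ms " +
      s"processingDelay=${processing}ms totalDelay=${total}ms")
    if (total > maxTotalDelayMs) {
      // Swap this for a real alerting hook (email, webhook, etc.).
      println(s"ALERT: batch ${info.batchTime} exceeded ${maxTotalDelayMs}ms total delay")
    }
  }
}

// Usage, given an existing StreamingContext `ssc`:
//   ssc.addStreamingListener(new DelayAlertListener(maxTotalDelayMs = 30000L))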


Replacing Esper with Spark Streaming?

2015-09-13 Thread Otis Gospodnetić
Hi,

I'm wondering if anyone has attempted to replace Esper with Spark Streaming
or if anyone thinks Spark Streaming is/isn't a good tool for the (CEP) job?

We are considering Akka or Spark Streaming as possible Esper replacements
and would appreciate any input from people who tried to do that with either
of them.

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: Registering custom metrics

2015-06-23 Thread Otis Gospodnetić
Hi,

Not sure if this will fit your needs, but if you are trying to
collect+chart some metrics specific to your app, yet want to correlate them
with what's going on in Spark, maybe Spark's performance numbers, you may
want to send your custom metrics to SPM, so they can be
visualized/analyzed/dashboarded along with your Spark metrics. See
http://sematext.com/spm/integrations/spark-monitoring.html for the Spark
piece and https://sematext.atlassian.net/wiki/display/PUBSPM/Custom+Metrics
for Custom Metrics.  If you use Coda Hale's metrics lib, that works too:
there is a pluggable reporter that will send Coda Hale metrics to SPM.

HTH.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Jun 22, 2015 at 9:57 AM, dgoldenberg dgoldenberg...@gmail.com
wrote:

 Hi Gerard,

 Have there been any responses? Any insights as to what you ended up doing
 to enable custom metrics? I'm thinking of implementing a custom metrics sink,
 not sure how doable that is yet...

 Thanks.



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Registering-custom-metrics-tp17765p23426.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

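For reference, a minimal sketch of the Coda Hale (Dropwizard) metrics usage
alluded to above, with a ConsoleReporter standing in for whatever reporter
actually ships the numbers (to SPM or elsewhere); the metric names are
illustrative.

import java.util.concurrent.TimeUnit
import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

// Sketch of application-side Coda Hale metrics: a registry, two metrics and
// a console reporter. A different Reporter implementation could publish the
// same registry to another backend.
object CustomMetricsSketch {
  val registry = new MetricRegistry()
  val parsed = registry.counter("records.parsed")
  val failed = registry.meter("records.failed")

  def main(args: Array[String]): Unit = {
    val reporter = ConsoleReporter.forRegistry(registry)
      .convertRatesTo(TimeUnit.SECONDS)
      .convertDurationsTo(TimeUnit.MILLISECONDS)
      .build()
    reporter.start(10, TimeUnit.SECONDS)

    // Wherever records are processed, bump the metrics:
    (1 to 100).foreach(_ => parsed.inc())
    failed.mark()

    Thread.sleep(15000) // give the reporter a chance to print at least once
  }
}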




Re: Monitoring Spark Jobs

2015-06-07 Thread Otis Gospodnetić
Hi Sam,

Have a look at Sematext's SPM for your Spark monitoring needs. If the
problem is CPU, IO, Network, etc. as Ahkil mentioned, you'll see that in
SPM, too.
As for the number of jobs running, you can see a chart with that at
http://sematext.com/spm/integrations/spark-monitoring.html

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Sun, Jun 7, 2015 at 6:37 AM, SamyaMaiti samya.maiti2...@gmail.com
wrote:

 Hi All,

 I have a Spark SQL application to fetch data from Hive; on top I have an Akka
 layer to run multiple queries in parallel.

 Please suggest a mechanism to figure out the number of Spark jobs
 running in the cluster at a given instant in time.

 I need to do the above because I see the average response time increasing with
 the increase in the number of requests, in spite of increasing the number of cores
 in the cluster. I suspect there is a bottleneck somewhere else.

 Regards,
 Sam



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Monitoring-Spark-Jobs-tp23193.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

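Within a single application, one low-effort way to approximate this is the
driver's own status tracker; a hedged sketch follows (it is per application
only, so a cluster-wide count would mean aggregating across applications,
for example via the REST monitoring API mentioned earlier in this digest).

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: ask the SparkContext how many jobs and stages are active right
// now. In a setup like the one described, this would be called from
// wherever the shared SparkContext lives (e.g. inside the Akka layer)
// rather than from a throwaway main like this one.
object ActiveJobProbe {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("active-job-probe"))
    val tracker = sc.statusTracker
    println(s"active jobs:   ${tracker.getActiveJobIds().length}")
    println(s"active stages: ${tracker.getActiveStageIds().length}")
    sc.stop()
  }
}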