Re: Spark Profiler

2019-03-26 Thread manish ranjan
I have found Ganglia very helpful in understanding network I/O, CPU, and
memory usage for a given Spark cluster.
I have not used it myself, but I have heard good things about Dr. Elephant
(which I believe was contributed by LinkedIn, though I'm not 100% sure).
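
If it helps, Spark's metrics system can also push driver and executor metrics
straight into Ganglia. A minimal sketch of conf/metrics.properties is below,
assuming a build that includes the spark-ganglia-lgpl package and a gmond
reachable at the host/port shown (both are placeholders):

  # conf/metrics.properties -- values here are hypothetical
  *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
  *.sink.ganglia.host=gmond.example.com
  *.sink.ganglia.port=8649
  *.sink.ganglia.period=10
  *.sink.ganglia.unit=seconds

The period/unit settings control how often metrics are flushed, and swapping
the sink class (e.g. GraphiteSink) targets other systems instead.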

On Tue, Mar 26, 2019, 5:59 AM Jack Kolokasis  wrote:

> Hello all,
>
>  I am looking for a Spark profiler to trace my application and find the
> bottlenecks. I need to trace CPU usage, memory usage, and I/O usage.
>
> I am looking forward to your reply.
>
> --Iacovos
>
>
>
>


Re: Spark Tuning Tool

2018-01-23 Thread manish ranjan
This is awesome work, Rohit. I am interested not only as a user; I would also
be super interested in contributing to solving this pain point of my daily work.

Manish

~Manish



On Mon, Jan 22, 2018 at 9:21 PM, lucas.g...@gmail.com 
wrote:

> I'd be very interested in anything I can send to my analysts to assist
> them with their troubleshooting / optimization... Of course our engineers
> would appreciate it as well.
>
> However, we'd be way more interested if it were OSS.
>
> Thanks!
>
> Gary Lucas
>
> On 22 January 2018 at 21:16, Holden Karau  wrote:
>
>> That's very interesting, and it might also get some interest on the dev@
>> list if it were open source.
>>
>> On Tue, Jan 23, 2018 at 4:02 PM, Roger Marin 
>> wrote:
>>
>>> I'd be very interested.
>>>
>>> On 23 Jan. 2018 4:01 pm, "Rohit Karlupia"  wrote:
>>>
 Hi,

 I have been working on making the performance tuning of Spark
 applications a bit easier.  We have just released the beta version of the
 tool on Qubole.

 https://www.qubole.com/blog/introducing-quboles-spark-tuning-tool/

 This is not OSS yet, but we would like to contribute it to OSS.  Fishing
 for some interest in the community to see if people find this work
 interesting and would like to try it out.

 thanks,
 Rohit Karlupia



>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>


Re: Monitoring the User Metrics for a long running Spark Job

2016-12-05 Thread manish ranjan
http://spark.apache.org/docs/latest/monitoring.html

You can also install tools like dstat, iostat, and iotop; *collectd* can
provide fine-grained profiling on individual nodes.

If you are using Mesos as the resource manager, Mesos exposes metrics for
the running job as well.
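
For the per-partition user metrics asked about in the quoted message below,
named accumulators are one option: they show up per stage in the Spark web UI
(and its REST API) while the job is still running. A minimal Scala sketch,
assuming Spark 2.x; the input path and counter names are placeholders:

  import org.apache.spark.sql.SparkSession

  object LiveMetricsSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("live-metrics-sketch").getOrCreate()
      val sc = spark.sparkContext

      // Named accumulators appear on the stage page of the web UI while the stage runs.
      val processed = sc.longAccumulator("recordsProcessed")
      val skipped   = sc.longAccumulator("recordsSkipped")

      sc.textFile("hdfs:///data/input")   // placeholder path
        .foreachPartition { iter =>
          iter.foreach { line =>
            if (line.nonEmpty) processed.add(1) else skipped.add(1)
          }
        }

      println(s"processed=${processed.value}, skipped=${skipped.value}")
      spark.stop()
    }
  }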

Manish

~Manish



On Mon, Dec 5, 2016 at 4:17 PM, Chawla,Sumit  wrote:

> Hi All
>
> I have a long-running job which takes hours and hours to process data.
> How can I monitor the operational efficiency of this job?  I am interested
> in something like Storm/Flink-style user metrics/aggregators, which I can
> monitor while my job is running.  Using these metrics I want to monitor
> per-partition performance in processing items.  As of now, the only way for
> me to get these metrics is when the job finishes.
>
> One possibility is for Spark to flush the metrics to an external system
> every few seconds, and thus use that external system to monitor these
> metrics.  However, I wanted to see if Spark supports any such use case
> out of the box (OOB).
>
>
> Regards
> Sumit Chawla
>
>
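
Regarding the "flush the metrics to an external system every few seconds" idea
above: Spark's metrics system does this out of the box for its internal
metrics (though not for user-defined accumulators). A minimal
conf/metrics.properties sketch using the CSV sink; the directory is a
placeholder:

  *.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
  *.sink.csv.period=10
  *.sink.csv.unit=seconds
  *.sink.csv.directory=/tmp/spark-metrics

Pointing the sink class at GraphiteSink or GangliaSink pushes the same
metrics to those systems instead.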


Re: Spark Website

2016-07-13 Thread manish ranjan
It's working for me. What do you mean by 'as supposed to'?

~Manish



On Wed, Jul 13, 2016 at 11:45 AM, Benjamin Kim  wrote:

> Has anyone noticed that spark.apache.org is not working as supposed
> to?
>
>
>
>


Queue up jobs in spark cluster

2015-09-26 Thread manish ranjan
Dear All,

I have a small Spark cluster for academic purposes and would like it to be
open to accept jobs from a set of friends, where all of us can submit and
queue up jobs.

How is that possible? What is the solution to this problem? Any
blog/software/link will be very helpful.

Thanks
~Manish
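
One possible approach, sketched below rather than a definitive answer: run the
standalone cluster manager and have everyone spark-submit against the shared
master. Applications queue up FIFO when the cluster is busy, and capping
spark.cores.max lets several run side by side. The master host, core count,
class, and jar below are all placeholders:

  spark-submit \
    --master spark://your-master-host:7077 \
    --conf spark.cores.max=4 \
    --class org.example.YourJob \
    your-job.jar

Running on YARN with its capacity/fair queues is another option if a Hadoop
cluster is already available.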


Need clarification on Spark cluster set up instructions

2015-06-29 Thread manish ranjan
Hi All

Here goes my first question. Here is my use case:

I have 1 TB of data that I want to process on EC2 using Spark.
I have uploaded the data to an EBS volume.
The instructions for the Amazon EC2 setup explain:
*If your application needs to access large datasets, the fastest way to do
that is to load them from Amazon S3 or an Amazon EBS device into an
instance of the Hadoop Distributed File System (HDFS) on your nodes.*

Now the new Amazon instance types don't have any physical (instance-store)
volumes: http://aws.amazon.com/ec2/instance-types/

So do I need to set up HDFS separately on EC2 (the instructions also say
that the spark-ec2 script already sets up an HDFS instance for you)? Any
blog/write-up that can help me understand this better?

~Manish
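
For what it's worth, a sketch of the spark-ec2 launch command the quoted
instructions refer to; the key pair, identity file, instance type, sizes, and
cluster name are placeholders, and the flags are worth double-checking against
the ec2/spark-ec2 script shipped with your Spark version. The --ebs-vol-size
option is what attaches EBS volumes for the persistent HDFS when the instance
type has no local (instance-store) disks:

  ./ec2/spark-ec2 \
    -k my-keypair -i ~/my-keypair.pem \
    -s 4 -t m4.xlarge \
    --ebs-vol-size=200 \
    launch my-spark-cluster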