Unsubscribe

2023-07-30 Thread Parag Chaudhari
Unsubscribe


*Thanks,*
*Parag Chaudhari*


How to make sure that a function is executed on each active executor?

2019-06-26 Thread Parag Chaudhari
Hi,

I am working on a use case where I want to perform some action exactly once
on each active executor of the application. How can I run a function on each
active executor associated with the current Spark application?

# `self` here is assumed to be a SparkContext (or a wrapper exposing one);
# subtract 1 to exclude the driver from the executor list.
num_executors = len(self._jsc.sc().statusTracker().getExecutorInfos()) - 1
if num_executors > 0:
    dummyRDD = self.parallelize(range(num_executors), num_executors)
    dummyRDD.foreachPartition(functionfoo)

Will this guarantee that functionfoo is executed on each active executor? Or
will it miss a few executors if there is more than one core per executor?
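
A minimal sketch of one common workaround, in Scala (where a per-JVM guard is
natural; the same over-provisioning idea applies from PySpark). ExecutorInit
is a hypothetical helper, not a Spark API: a singleton's lazy val initializes
at most once per executor JVM, and scheduling several tasks per core makes it
likely, though still not strictly guaranteed, that every executor receives at
least one task.

// Hypothetical helper: the lazy val body runs at most once per executor JVM.
object ExecutorInit {
  lazy val once: Unit = {
    // Put the per-executor action here.
    println(s"init on ${java.net.InetAddress.getLocalHost.getHostName}")
  }
}

// Several tasks per core, so every executor is likely to get at least one.
val numTasks = sc.defaultParallelism * 4
sc.parallelize(1 to numTasks, numTasks).foreach(_ => ExecutorInit.once)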

I deeply appreciate your help and time.


*Thanks,*
*Parag Chaudhari*


Re: Why does the Spark history server not show an RDD even if it is persisted?

2017-03-01 Thread Parag Chaudhari
Thanks!



*Thanks,*
*Parag Chaudhari,*
*USC Alumnus (Fight On!)*
*Mobile: (213)-572-7858*
*Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254*


On Tue, Feb 28, 2017 at 12:53 PM, Shixiong(Ryan) Zhu <
shixi...@databricks.com> wrote:

> The REST APIs are not just for the Spark history server. When an
> application is running, you can use the REST APIs to talk to the Spark UI
> HTTP server as well.
>
> On Tue, Feb 28, 2017 at 10:46 AM, Parag Chaudhari <paragp...@gmail.com>
> wrote:
>
>> ping...
>>
>>
>>
>> *Thanks,*
>> *Parag Chaudhari,*
>> *USC Alumnus (Fight On!)*
>> *Mobile: (213)-572-7858*
>> *Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254*
>>
>>
>> On Wed, Feb 22, 2017 at 7:54 PM, Parag Chaudhari <paragp...@gmail.com>
>> wrote:
>>
>>> Thanks!
>>>
>>> If Spark does not log these events in the event log, then why does the
>>> Spark history server provide an API to get RDD information?
>>>
>>> From the documentation,
>>>
>>> /applications/[app-id]/storage/rdd   A list of stored RDDs for the
>>> given application.
>>>
>>> /applications/[app-id]/storage/rdd/[rdd-id]   Details for the storage
>>> status of a given RDD.
>>>
>>>
>>>
>>>
>>> *Thanks,*
>>> *Parag Chaudhari,*
>>> *USC Alumnus (Fight On!)*
>>> *Mobile: (213)-572-7858*
>>> *Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254*
>>>
>>>
>>> On Wed, Feb 22, 2017 at 7:44 PM, Saisai Shao <sai.sai.s...@gmail.com>
>>> wrote:
>>>
>>>> It is too verbose, and will significantly increase the size of the
>>>> event log.
>>>>
>>>> Here is the comment in the code:
>>>>
>>>> // No-op because logging every update would be overkill
>>>>> override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {}
>>>>>
>>>>>
>>>> On Thu, Feb 23, 2017 at 11:42 AM, Parag Chaudhari <paragp...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks a lot for the information!
>>>>>
>>>>> Is there any reason why EventLoggingListener ignores this event?
>>>>>
>>>>> *Thanks,*
>>>>>
>>>>>
>>>>> *Parag*
>>>>>
>>>>> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> AFAIK, Spark's EventLoggingListener ignores the BlockUpdated event, so
>>>>>> it is not written into the event log; I think that's why you cannot
>>>>>> get such info in the history server.
>>>>>>
>>>>>> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am running the Spark shell on Spark 2.0.2. Here is my program:
>>>>>>>
>>>>>>> var myrdd = sc.parallelize(Array.range(1, 10))
>>>>>>> myrdd.setName("test")
>>>>>>> myrdd.cache
>>>>>>> myrdd.collect
>>>>>>>
>>>>>>> But I am not able to see any RDD info in the "Storage" tab of the
>>>>>>> Spark history server.
>>>>>>>
>>>>>>> I looked at this
>>>>>>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>>>>>>> but it does not help, as my program is exactly like the one mentioned
>>>>>>> there. Can anyone help?
>>>>>>>
>>>>>>>
>>>>>>> *Thanks,*
>>>>>>>
>>>>>>> *Parag*
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
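
Given Saisai's quoted comment above, EventLoggingListener skips block updates
on purpose, so the history server can never show them. A minimal sketch of a
workaround for a running application: register a custom listener on the
driver (BlockUpdateLogger is a hypothetical name, not a Spark API).

import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockUpdated}

// Hypothetical listener: capture the block updates that the event log drops.
class BlockUpdateLogger extends SparkListener {
  override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
    val info = event.blockUpdatedInfo
    println(s"${info.blockId} on ${info.blockManagerId.host}: " +
      s"mem=${info.memSize} disk=${info.diskSize}")
  }
}

sc.addSparkListener(new BlockUpdateLogger)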


Re: Is there any limit on the number of tasks per stage attempt?

2017-02-28 Thread Parag Chaudhari
Thanks, Jacek!


*Thanks,*
*Parag*


On Fri, Feb 24, 2017 at 10:45 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I think it's the size of the type used to count the partitions, which I
> believe is Int. I don't think there's another reason.
>
> Jacek
>
> On 23 Feb 2017 5:01 a.m., "Parag Chaudhari" <paragp...@gmail.com> wrote:
>
>> Hi,
>>
>> Is there any limit on the number of tasks per stage attempt?
>>
>>
>> *Thanks,*
>>
>> *Parag*
>>
>
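
A minimal sketch illustrating Jacek's point, assuming a live spark-shell with
`sc` available: the number of tasks in a stage attempt equals the number of
partitions, and partition counts are Scala Ints, so the practical ceiling is
Int.MaxValue (2^31 - 1) partitions.

println(Int.MaxValue)          // 2147483647
val rdd = sc.parallelize(1 to 100, 8)
println(rdd.getNumPartitions)  // 8 => a stage over this RDD runs 8 tasks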


Re: Why does the Spark history server not show an RDD even if it is persisted?

2017-02-28 Thread Parag Chaudhari
ping...



*Thanks,*
*Parag Chaudhari,*
*USC Alumnus (Fight On!)*
*Mobile: (213)-572-7858*
*Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254*


On Wed, Feb 22, 2017 at 7:54 PM, Parag Chaudhari <paragp...@gmail.com>
wrote:

> Thanks!
>
> If Spark does not log these events in the event log, then why does the
> Spark history server provide an API to get RDD information?
>
> From the documentation,
>
> /applications/[app-id]/storage/rdd   A list of stored RDDs for the given
> application.
>
> /applications/[app-id]/storage/rdd/[rdd-id]   Details for the storage
> status of a given RDD.
>
>
>
>
> *Thanks,*
> *Parag Chaudhari,*
> *USC Alumnus (Fight On!)*
> *Mobile: (213)-572-7858*
> *Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254*
>
>
> On Wed, Feb 22, 2017 at 7:44 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> It is too verbose, and will significantly increase the size of the event
>> log.
>>
>> Here is the comment in the code:
>>
>> // No-op because logging every update would be overkill
>>> override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {}
>>>
>>>
>> On Thu, Feb 23, 2017 at 11:42 AM, Parag Chaudhari <paragp...@gmail.com>
>> wrote:
>>
>>> Thanks a lot for the information!
>>>
>>> Is there any reason why EventLoggingListener ignores this event?
>>>
>>> *Thanks,*
>>>
>>>
>>> *Parag*
>>>
>>> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com>
>>> wrote:
>>>
>>>> AFAIK, Spark's EventLoggingListener ignores the BlockUpdated event, so
>>>> it is not written into the event log; I think that's why you cannot get
>>>> such info in the history server.
>>>>
>>>> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am running the Spark shell on Spark 2.0.2. Here is my program:
>>>>>
>>>>> var myrdd = sc.parallelize(Array.range(1, 10))
>>>>> myrdd.setName("test")
>>>>> myrdd.cache
>>>>> myrdd.collect
>>>>>
>>>>> But I am not able to see any RDD info in the "Storage" tab of the Spark
>>>>> history server.
>>>>>
>>>>> I looked at this
>>>>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>>>>> but it does not help, as my program is exactly like the one mentioned
>>>>> there. Can anyone help?
>>>>>
>>>>>
>>>>> *Thanks,*
>>>>>
>>>>> *Parag*
>>>>>
>>>>
>>>>
>>>
>>
>


Is there any limit on the number of tasks per stage attempt?

2017-02-22 Thread Parag Chaudhari
Hi,

Is there any limit on the number of tasks per stage attempt?


*Thanks,*

*Parag*


Re: Why does the Spark history server not show an RDD even if it is persisted?

2017-02-22 Thread Parag Chaudhari
Thanks!

If Spark does not log these events in the event log, then why does the Spark
history server provide an API to get RDD information?

From the documentation,

/applications/[app-id]/storage/rdd   A list of stored RDDs for the given
application.

/applications/[app-id]/storage/rdd/[rdd-id]   Details for the storage
status of a given RDD.
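
A minimal sketch of exercising these endpoints against a running application,
assuming the driver's web UI is on localhost:4040 (the default): the same
/api/v1 endpoints are served by the live Spark UI, not only by the history
server.

import scala.io.Source

// Query the live UI for the persisted RDDs of the current application.
val url = s"http://localhost:4040/api/v1/applications/${sc.applicationId}/storage/rdd"
println(Source.fromURL(url).mkString)  // JSON list of stored RDDs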




*Thanks,*
*Parag Chaudhari,*
*USC Alumnus (Fight On!)*
*Mobile: (213)-572-7858*
*Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254*


On Wed, Feb 22, 2017 at 7:44 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> It is too verbose, and will significantly increase the size of the event
> log.
>
> Here is the comment in the code:
>
> // No-op because logging every update would be overkill
>> override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {}
>>
>>
> On Thu, Feb 23, 2017 at 11:42 AM, Parag Chaudhari <paragp...@gmail.com>
> wrote:
>
>> Thanks a lot for the information!
>>
>> Is there any reason why EventLoggingListener ignores this event?
>>
>> *Thanks,*
>>
>>
>> *Parag*
>>
>> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com>
>> wrote:
>>
>>> AFAIK, Spark's EventLoggingListener ignores the BlockUpdated event, so it
>>> is not written into the event log; I think that's why you cannot get such
>>> info in the history server.
>>>
>>> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running the Spark shell on Spark 2.0.2. Here is my program:
>>>>
>>>> var myrdd = sc.parallelize(Array.range(1, 10))
>>>> myrdd.setName("test")
>>>> myrdd.cache
>>>> myrdd.collect
>>>>
>>>> But I am not able to see any RDD info in the "Storage" tab of the Spark
>>>> history server.
>>>>
>>>> I looked at this
>>>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>>>> but it does not help, as my program is exactly like the one mentioned
>>>> there. Can anyone help?
>>>>
>>>>
>>>> *Thanks,*
>>>>
>>>> *Parag*
>>>>
>>>
>>>
>>
>


Re: Why does the Spark history server not show an RDD even if it is persisted?

2017-02-22 Thread Parag Chaudhari
Thanks a lot for the information!

Is there any reason why EventLoggingListener ignores this event?

*Thanks,*


*Parag*

On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> AFAIK, Spark's EventLoggingListener ignores the BlockUpdated event, so it
> is not written into the event log; I think that's why you cannot get such
> info in the history server.
>
> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am running the Spark shell on Spark 2.0.2. Here is my program:
>>
>> var myrdd = sc.parallelize(Array.range(1, 10))
>> myrdd.setName("test")
>> myrdd.cache
>> myrdd.collect
>>
>> But I am not able to see any RDD info in the "Storage" tab of the Spark
>> history server.
>>
>> I looked at this
>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>> but it does not help, as my program is exactly like the one mentioned
>> there. Can anyone help?
>>
>>
>> *Thanks,*
>>
>> *Parag*
>>
>
>


Why does the Spark history server not show an RDD even if it is persisted?

2017-02-22 Thread Parag Chaudhari
Hi,

I am running the Spark shell on Spark 2.0.2. Here is my program:

// Create a small RDD, name it, mark it for caching, and run a job so the
// cached blocks actually materialize.
val myrdd = sc.parallelize(Array.range(1, 10))
myrdd.setName("test")
myrdd.cache()
myrdd.collect()

But I am not able to see any RDD info in the "Storage" tab of the Spark
history server.

I looked at this
<https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
but it does not help, as my program is exactly like the one mentioned there.
Can anyone help?
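
A minimal sketch of a check that works while the application is still
running, assuming a live spark-shell in which the program above has just run:
persisted RDDs remain visible on the live context (and in the live UI's
"Storage" tab) even though the history server drops this information.

// Print the name and in-memory size of every RDD with cached blocks.
sc.getRDDStorageInfo.foreach(info =>
  println(s"${info.name}: ${info.memSize} bytes in memory"))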


*Thanks,*

*Parag*