Re: 10hrs of Scheduler Delay

2016-01-25 Thread Ted Yu
Yes, thread dump plus log would be helpful for debugging. 

Thanks
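
For anyone capturing such a dump by hand, jstack <pid> against the driver, executor, or AppMaster container prints the stack traces. A rough in-process alternative (a generic JVM sketch, not something prescribed in this thread) looks like:

```scala
import scala.collection.JavaConverters._

// Collect every live thread's name, state, and stack trace from inside the JVM,
// which is roughly the information `jstack <pid>` prints for the same process.
def threadDump(): String =
  Thread.getAllStackTraces.asScala.map { case (thread, frames) =>
    s"${thread.getName} (${thread.getState})" +
      frames.map(frame => s"\n    at $frame").mkString
  }.mkString("\n\n")
```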

> On Jan 25, 2016, at 5:59 AM, Sanders, Isaac B <sande...@rose-hulman.edu> 
> wrote:
> 
> Is the thread dump the stack trace you are talking about? If so, I will see 
> if I can capture the few different stages I have seen it in.
> 
> Thanks for the help, I was able to do it for 0.1% of my data. I will create 
> the JIRA.
> 
> Thanks,
> Isaac
> 
> On Jan 25, 2016, at 8:51 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> 
>> Opening a JIRA is fine. 
>> 
>> See if you can capture stack trace during the hung stage and attach to JIRA 
>> so that we have more clue. 
>> 
>> Thanks
>> 
>> On Jan 25, 2016, at 4:25 AM, Darren Govoni <dar...@ontrenet.com> wrote:
>> 
>>> Probably we should open a ticket for this.
>>> There's definitely a deadlock situation occurring in spark under certain 
>>> conditions.
>>> 
>>> The only clue I have is it always happens on the last stage. And it does 
>>> seem sensitive to scale. If my job has 300mb of data I'll see the deadlock. 
>>> But if I only run 10mb of it it will succeed. This suggest a serious 
>>> fundamental scaling problem.
>>> 
>>> Workers have plenty of resources.
>>> 
>>> 
>>> 
>>> Sent from my Verizon Wireless 4G LTE smartphone
>>> 
>>> 
>>>  Original message 
>>> From: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
>>> Date: 01/24/2016 2:54 PM (GMT-05:00) 
>>> To: Renu Yadav <yren...@gmail.com> 
>>> Cc: Darren Govoni <dar...@ontrenet.com>, Muthu Jayakumar 
>>> <bablo...@gmail.com>, Ted Yu <yuzhih...@gmail.com>, user@spark.apache.org 
>>> Subject: Re: 10hrs of Scheduler Delay 
>>> 
>>> I am not getting anywhere with any of the suggestions so far. :(
>>> 
>>> Trying some more outlets, I will share any solution I find.
>>> 
>>> - Isaac
>>> 
>>>> On Jan 23, 2016, at 1:48 AM, Renu Yadav <yren...@gmail.com> wrote:
>>>> 
>>>> If you turn on spark.speculation on then that might help. it worked  for me
>>>> 
>>>>> On Sat, Jan 23, 2016 at 3:21 AM, Darren Govoni <dar...@ontrenet.com> 
>>>>> wrote:
>>>>> Thanks for the tip. I will try it. But this is the kind of thing spark is 
>>>>> supposed to figure out and handle. Or at least not get stuck forever.
>>>>> 
>>>>> 
>>>>> 
>>>>> Sent from my Verizon Wireless 4G LTE smartphone
>>>>> 
>>>>> 
>>>>>  Original message 
>>>>> From: Muthu Jayakumar <bablo...@gmail.com> 
>>>>> Date: 01/22/2016 3:50 PM (GMT-05:00) 
>>>>> To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B" 
>>>>> <sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com> 
>>>>> Cc: user@spark.apache.org 
>>>>> Subject: Re: 10hrs of Scheduler Delay 
>>>>> 
>>>>> Does increasing the number of partition helps? You could try out 
>>>>> something 3 times what you currently have. 
>>>>> Another trick i used was to partition the problem into multiple 
>>>>> dataframes and run them sequentially and persistent the result and then 
>>>>> run a union on the results. 
>>>>> 
>>>>> Hope this helps. 
>>>>> 
>>>>>> On Fri, Jan 22, 2016, 3:48 AM Darren Govoni <dar...@ontrenet.com> wrote:
>>>>>> Me too. I had to shrink my dataset to get it to work. For us at least 
>>>>>> Spark seems to have scaling issues.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Sent from my Verizon Wireless 4G LTE smartphone
>>>>>> 
>>>>>> 
>>>>>>  Original message 
>>>>>> From: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
>>>>>> Date: 01/21/2016 11:18 PM (GMT-05:00) 
>>>>>> To: Ted Yu <yuzhih...@gmail.com> 
>>>>>> Cc: user@spark.apache.org 
>>>>>> Subject: Re: 10hrs of Scheduler Delay 
>>>>>> 
>>>>>> I have run the driver on a smaller dataset (k=2, n=5000) and it worked 
>>>>>> quickly and didn’t hang like this. This dataset is closer to k=10, n=4.4m, 
>>>>>> but I am using more resources on this one.

Re: 10hrs of Scheduler Delay

2016-01-25 Thread Darren Govoni


Yeah. I have screenshots and stack traces. I will post them to the ticket. 
Nothing informative.
I should also mention I'm using PySpark, but I think the deadlock is inside the 
Java scheduler code.



Sent from my Verizon Wireless 4G LTE smartphone

 Original message 
From: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
Date: 01/25/2016  8:59 AM  (GMT-05:00) 
To: Ted Yu <yuzhih...@gmail.com> 
Cc: Darren Govoni <dar...@ontrenet.com>, Renu Yadav <yren...@gmail.com>, Muthu 
Jayakumar <bablo...@gmail.com>, user@spark.apache.org 
Subject: Re: 10hrs of Scheduler Delay 



Is the thread dump the stack trace you are talking about? If so, I will see if 
I can capture the few different stages I have seen it in.



Thanks for the help, I was able to do it for 0.1% of my data. I will create the 
JIRA.



Thanks,
Isaac


On Jan 25, 2016, at 8:51 AM, Ted Yu <yuzhih...@gmail.com> wrote:







Opening a JIRA is fine. 



See if you can capture stack trace during the hung stage and attach to JIRA so 
that we have more clue. 



Thanks


On Jan 25, 2016, at 4:25 AM, Darren Govoni <dar...@ontrenet.com> wrote:






Probably we should open a ticket for this.
There's definitely a deadlock situation occurring in spark under certain 
conditions.



The only clue I have is it always happens on the last stage. And it does seem 
sensitive to scale. If my job has 300mb of data I'll see the deadlock. But if I 
only run 10mb of it, it will succeed. This suggests a serious fundamental scaling 
problem.



Workers have plenty of resources.

Sent from my Verizon Wireless 4G LTE smartphone





 Original message 

From: "Sanders, Isaac B" <sande...@rose-hulman.edu>


Date: 01/24/2016 2:54 PM (GMT-05:00) 

To: Renu Yadav <yren...@gmail.com> 

Cc: Darren Govoni <dar...@ontrenet.com>, Muthu Jayakumar <bablo...@gmail.com>, 
Ted Yu <yuzhih...@gmail.com>,
user@spark.apache.org 

Subject: Re: 10hrs of Scheduler Delay 



I am not getting anywhere with any of the suggestions so far. :(



Trying some more outlets, I will share any solution I find.



- Isaac




On Jan 23, 2016, at 1:48 AM, Renu Yadav <yren...@gmail.com> wrote:



If you turn spark.speculation on, then that might help. It worked for me.
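
A minimal sketch of enabling that, either in code as below or with --conf spark.speculation=true on spark-submit (the app name and the multiplier / quantile values are illustrative placeholders, not settings from this thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Speculative execution relaunches suspiciously slow tasks on other executors
// instead of waiting indefinitely on a single straggler.
val conf = new SparkConf()
  .setAppName("dbscan-exploration")
  .set("spark.speculation", "true")
  .set("spark.speculation.multiplier", "1.5") // how much slower than the median counts as slow
  .set("spark.speculation.quantile", "0.75")  // fraction of tasks that must finish before checking
val sc = new SparkContext(conf)
```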




On Sat, Jan 23, 2016 at 3:21 AM, Darren Govoni 
<dar...@ontrenet.com> wrote:



Thanks for the tip. I will try it. But this is the kind of thing spark is 
supposed to figure out and handle. Or at least not get stuck forever.

Sent from my Verizon Wireless 4G LTE smartphone





 Original message 



From: Muthu Jayakumar <bablo...@gmail.com>


Date: 01/22/2016 3:50 PM (GMT-05:00) 

To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B" 
<sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com>


Cc: user@spark.apache.org


Subject: Re: 10hrs of Scheduler Delay 



Does increasing the number of partitions help? You could try out something 3 
times what you currently have. 
Another trick I used was to partition the problem into multiple DataFrames and 
run them sequentially, persist the result, and then run a union on the 
results. 



Hope this helps. 




On Fri, Jan 22, 2016, 3:48 AM Darren Govoni <dar...@ontrenet.com> wrote:




Me too. I had to shrink my dataset to get it to work. For us at least Spark 
seems to have scaling issues.

Sent from my Verizon Wireless 4G LTE smartphone





 Original message 


From: "Sanders, Isaac B" <sande...@rose-hulman.edu>


Date: 01/21/2016 11:18 PM (GMT-05:00) 

To: Ted Yu <yuzhih...@gmail.com>


Cc: user@spark.apache.org


Subject: Re: 10hrs of Scheduler Delay 




I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly 
and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am 
using more resources on this one.



- Isaac






On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:



You may have seen the following on github page:


Latest commit 50fdf0e  on Feb 22, 2015






That was 11 months ago.



Can you search for similar algorithm which runs on Spark and is newer ?



If nothing found, consider running the tests coming from the project to 
determine whether the delay is intrinsic.



Cheers



On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:



That thread seems to be moving, it oscillates between a few different traces… 
Maybe it is working. It seems odd that it would take that long.



This is 3rd party code, and after looking at some of it, I think it might not 
be as Spark-y as it could be.



I linked it below. I don’t know a lot about spark, so it might be fine, but I 
have my suspicions.



https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala

Re: 10hrs of Scheduler Delay

2016-01-25 Thread Ted Yu
Opening a JIRA is fine. 

See if you can capture stack trace during the hung stage and attach to JIRA so 
that we have more clue. 

Thanks

> On Jan 25, 2016, at 4:25 AM, Darren Govoni <dar...@ontrenet.com> wrote:
> 
> Probably we should open a ticket for this.
> There's definitely a deadlock situation occurring in spark under certain 
> conditions.
> 
> The only clue I have is it always happens on the last stage. And it does seem 
> sensitive to scale. If my job has 300mb of data I'll see the deadlock. But if 
> I only run 10mb of it it will succeed. This suggest a serious fundamental 
> scaling problem.
> 
> Workers have plenty of resources.
> 
> 
> 
> Sent from my Verizon Wireless 4G LTE smartphone
> 
> 
>  Original message 
> From: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
> Date: 01/24/2016 2:54 PM (GMT-05:00) 
> To: Renu Yadav <yren...@gmail.com> 
> Cc: Darren Govoni <dar...@ontrenet.com>, Muthu Jayakumar 
> <bablo...@gmail.com>, Ted Yu <yuzhih...@gmail.com>, user@spark.apache.org 
> Subject: Re: 10hrs of Scheduler Delay 
> 
> I am not getting anywhere with any of the suggestions so far. :(
> 
> Trying some more outlets, I will share any solution I find.
> 
> - Isaac
> 
>> On Jan 23, 2016, at 1:48 AM, Renu Yadav <yren...@gmail.com> wrote:
>> 
>> If you turn on spark.speculation on then that might help. it worked  for me
>> 
>>> On Sat, Jan 23, 2016 at 3:21 AM, Darren Govoni <dar...@ontrenet.com> wrote:
>>> Thanks for the tip. I will try it. But this is the kind of thing spark is 
>>> supposed to figure out and handle. Or at least not get stuck forever.
>>> 
>>> 
>>> 
>>> Sent from my Verizon Wireless 4G LTE smartphone
>>> 
>>> 
>>>  Original message --------
>>> From: Muthu Jayakumar <bablo...@gmail.com> 
>>> Date: 01/22/2016 3:50 PM (GMT-05:00) 
>>> To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B" 
>>> <sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com> 
>>> Cc: user@spark.apache.org 
>>> Subject: Re: 10hrs of Scheduler Delay 
>>> 
>>> Does increasing the number of partition helps? You could try out something 
>>> 3 times what you currently have. 
>>> Another trick i used was to partition the problem into multiple dataframes 
>>> and run them sequentially and persistent the result and then run a union on 
>>> the results. 
>>> 
>>> Hope this helps. 
>>> 
>>>> On Fri, Jan 22, 2016, 3:48 AM Darren Govoni <dar...@ontrenet.com> wrote:
>>>> Me too. I had to shrink my dataset to get it to work. For us at least 
>>>> Spark seems to have scaling issues.
>>>> 
>>>> 
>>>> 
>>>> Sent from my Verizon Wireless 4G LTE smartphone
>>>> 
>>>> 
>>>>  Original message 
>>>> From: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
>>>> Date: 01/21/2016 11:18 PM (GMT-05:00) 
>>>> To: Ted Yu <yuzhih...@gmail.com> 
>>>> Cc: user@spark.apache.org 
>>>> Subject: Re: 10hrs of Scheduler Delay 
>>>> 
>>>> I have run the driver on a smaller dataset (k=2, n=5000) and it worked 
>>>> quickly and didn’t hang like this. This dataset is closer to k=10, n=4.4m, 
>>>> but I am using more resources on this one.
>>>> 
>>>> - Isaac
>>>> 
>>>>> On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> 
>>>>> You may have seen the following on github page:
>>>>> 
>>>>> Latest commit 50fdf0e  on Feb 22, 2015
>>>>> 
>>>>> That was 11 months ago.
>>>>> 
>>>>> Can you search for similar algorithm which runs on Spark and is newer ?
>>>>> 
>>>>> If nothing found, consider running the tests coming from the project to 
>>>>> determine whether the delay is intrinsic.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>>> On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B 
>>>>>> <sande...@rose-hulman.edu> wrote:
>>>>>> That thread seems to be moving, it oscillates between a few different 
>>>>>> traces… Maybe it is working. It seems odd that it would take that long.
>>>>>> 
>>>>>> This is 3rd party code, and after looking at some of it, I think it 
>>>>>> might not be as Spark-y as it could be.
>>>>>> 
>>>>>> I linked it below. I don’t know a lot about spark, so it might be fine, 
>>>>>> but I have my suspicions.
>>>>>> 
>>>>>> https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala
>>>>>> 
>>>>>> - Isaac
>>>>>> 
>>>>>>> On Jan 21, 2016, at 10:08 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>> 
>>>>>>> You may have noticed the following - did this indicate prolonged 
>>>>>>> computation in your code ?


Re: 10hrs of Scheduler Delay

2016-01-25 Thread Sanders, Isaac B
Is the thread dump the stack trace you are talking about? If so, I will see if 
I can capture the few different stages I have seen it in.

Thanks for the help, I was able to do it for 0.1% of my data. I will create the 
JIRA.

Thanks,
Isaac

On Jan 25, 2016, at 8:51 AM, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote:

Opening a JIRA is fine.

See if you can capture stack trace during the hung stage and attach to JIRA so 
that we have more clue.

Thanks

On Jan 25, 2016, at 4:25 AM, Darren Govoni 
<dar...@ontrenet.com<mailto:dar...@ontrenet.com>> wrote:

Probably we should open a ticket for this.
There's definitely a deadlock situation occurring in spark under certain 
conditions.

The only clue I have is it always happens on the last stage. And it does seem 
sensitive to scale. If my job has 300mb of data I'll see the deadlock. But if I 
only run 10mb of it, it will succeed. This suggests a serious fundamental scaling 
problem.

Workers have plenty of resources.



Sent from my Verizon Wireless 4G LTE smartphone


 Original message 
From: "Sanders, Isaac B" 
<sande...@rose-hulman.edu<mailto:sande...@rose-hulman.edu>>
Date: 01/24/2016 2:54 PM (GMT-05:00)
To: Renu Yadav <yren...@gmail.com<mailto:yren...@gmail.com>>
Cc: Darren Govoni <dar...@ontrenet.com<mailto:dar...@ontrenet.com>>, Muthu 
Jayakumar <bablo...@gmail.com<mailto:bablo...@gmail.com>>, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>>, 
user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: 10hrs of Scheduler Delay

I am not getting anywhere with any of the suggestions so far. :(

Trying some more outlets, I will share any solution I find.

- Isaac

On Jan 23, 2016, at 1:48 AM, Renu Yadav 
<yren...@gmail.com<mailto:yren...@gmail.com>> wrote:

If you turn spark.speculation on, then that might help. It worked for me.

On Sat, Jan 23, 2016 at 3:21 AM, Darren Govoni 
<dar...@ontrenet.com<mailto:dar...@ontrenet.com>> wrote:
Thanks for the tip. I will try it. But this is the kind of thing spark is 
supposed to figure out and handle. Or at least not get stuck forever.



Sent from my Verizon Wireless 4G LTE smartphone


 Original message 
From: Muthu Jayakumar <bablo...@gmail.com<mailto:bablo...@gmail.com>>
Date: 01/22/2016 3:50 PM (GMT-05:00)
To: Darren Govoni <dar...@ontrenet.com<mailto:dar...@ontrenet.com>>, "Sanders, 
Isaac B" <sande...@rose-hulman.edu<mailto:sande...@rose-hulman.edu>>, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>>
Cc: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: 10hrs of Scheduler Delay

Does increasing the number of partitions help? You could try out something 3 
times what you currently have.
Another trick I used was to partition the problem into multiple DataFrames and 
run them sequentially, persist the result, and then run a union on the 
results.

Hope this helps.

On Fri, Jan 22, 2016, 3:48 AM Darren Govoni 
<dar...@ontrenet.com<mailto:dar...@ontrenet.com>> wrote:
Me too. I had to shrink my dataset to get it to work. For us at least Spark 
seems to have scaling issues.



Sent from my Verizon Wireless 4G LTE smartphone


 Original message 
From: "Sanders, Isaac B" 
<sande...@rose-hulman.edu<mailto:sande...@rose-hulman.edu>>
Date: 01/21/2016 11:18 PM (GMT-05:00)
To: Ted Yu <yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>>
Cc: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: 10hrs of Scheduler Delay

I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly 
and didn't hang like this. This dataset is closer to k=10, n=4.4m, but I am 
using more resources on this one.

- Isaac

On Jan 21, 2016, at 11:06 PM, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote:

You may have seen the following on github page:

Latest commit 50fdf0e  on Feb 22, 2015

That was 11 months ago.

Can you search for similar algorithm which runs on Spark and is newer ?

If nothing found, consider running the tests coming from the project to 
determine whether the delay is intrinsic.

Cheers

On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B 
<sande...@rose-hulman.edu<mailto:sande...@rose-hulman.edu>> wrote:
That thread seems to be moving, it oscillates between a few different traces... 
Maybe it is working. It seems odd that it would take that long.

This is 3rd party code, and after looking at some of it, I think it might not 
be as Spark-y as it could be.

I linked it below. I don't know a lot about spark, so it might be fine, but I 
have my suspicions.

https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala

- Isaac

On Jan 21, 2016, at 10:08 PM, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote:

You may have noticed the following - did this indicate prolonged computation in 
your code ?


Re: 10hrs of Scheduler Delay

2016-01-25 Thread Darren Govoni


Probably we should open a ticket for this. There's definitely a deadlock 
situation occurring in Spark under certain conditions.
The only clue I have is it always happens on the last stage. And it does seem 
sensitive to scale. If my job has 300mb of data I'll see the deadlock. But if I 
only run 10mb of it, it will succeed. This suggests a serious fundamental scaling 
problem.
Workers have plenty of resources.


Sent from my Verizon Wireless 4G LTE smartphone

 Original message 
From: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
Date: 01/24/2016  2:54 PM  (GMT-05:00) 
To: Renu Yadav <yren...@gmail.com> 
Cc: Darren Govoni <dar...@ontrenet.com>, Muthu Jayakumar <bablo...@gmail.com>, 
Ted Yu <yuzhih...@gmail.com>, user@spark.apache.org 
Subject: Re: 10hrs of Scheduler Delay 






I am not getting anywhere with any of the suggestions so far. :(



Trying some more outlets, I will share any solution I find.



- Isaac




On Jan 23, 2016, at 1:48 AM, Renu Yadav <yren...@gmail.com> wrote:



If you turn spark.speculation on, then that might help. It worked for me.




On Sat, Jan 23, 2016 at 3:21 AM, Darren Govoni 
<dar...@ontrenet.com> wrote:



Thanks for the tip. I will try it. But this is the kind of thing spark is 
supposed to figure out and handle. Or at least not get stuck forever.

Sent from my Verizon Wireless 4G LTE smartphone





 Original message 



From: Muthu Jayakumar <bablo...@gmail.com>


Date: 01/22/2016 3:50 PM (GMT-05:00) 

To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B" 
<sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com>


Cc: user@spark.apache.org


Subject: Re: 10hrs of Scheduler Delay 



Does increasing the number of partitions help? You could try out something 3 
times what you currently have. 
Another trick I used was to partition the problem into multiple DataFrames and 
run them sequentially, persist the result, and then run a union on the 
results. 



Hope this helps. 




On Fri, Jan 22, 2016, 3:48 AM Darren Govoni <dar...@ontrenet.com> wrote:




Me too. I had to shrink my dataset to get it to work. For us at least Spark 
seems to have scaling issues.

Sent from my Verizon Wireless 4G LTE smartphone





 Original message 


From: "Sanders, Isaac B" <sande...@rose-hulman.edu>


Date: 01/21/2016 11:18 PM (GMT-05:00) 

To: Ted Yu <yuzhih...@gmail.com>


Cc: user@spark.apache.org


Subject: Re: 10hrs of Scheduler Delay 




I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly 
and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am 
using more resources on this one.



- Isaac






On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:



You may have seen the following on github page:


Latest commit 50fdf0e  on Feb 22, 2015






That was 11 months ago.



Can you search for similar algorithm which runs on Spark and is newer ?



If nothing found, consider running the tests coming from the project to 
determine whether the delay is intrinsic.



Cheers



On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:



That thread seems to be moving, it oscillates between a few different traces… 
Maybe it is working. It seems odd that it would take that long.



This is 3rd party code, and after looking at some of it, I think it might not 
be as Spark-y as it could be.



I linked it below. I don’t know a lot about spark, so it might be fine, but I 
have my suspicions.



https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala



- Isaac




On Jan 21, 2016, at 10:08 PM, Ted Yu <yuzhih...@gmail.com> wrote:



You may have noticed the following - did this indicate prolonged computation in 
your code ?




Re: 10hrs of Scheduler Delay

2016-01-24 Thread Sanders, Isaac B
I am not getting anywhere with any of the suggestions so far. :(

Trying some more outlets, I will share any solution I find.

- Isaac

On Jan 23, 2016, at 1:48 AM, Renu Yadav 
<yren...@gmail.com<mailto:yren...@gmail.com>> wrote:

If you turn spark.speculation on, then that might help. It worked for me.

On Sat, Jan 23, 2016 at 3:21 AM, Darren Govoni 
<dar...@ontrenet.com<mailto:dar...@ontrenet.com>> wrote:
Thanks for the tip. I will try it. But this is the kind of thing spark is 
supposed to figure out and handle. Or at least not get stuck forever.



Sent from my Verizon Wireless 4G LTE smartphone


 Original message 
From: Muthu Jayakumar <bablo...@gmail.com<mailto:bablo...@gmail.com>>
Date: 01/22/2016 3:50 PM (GMT-05:00)
To: Darren Govoni <dar...@ontrenet.com<mailto:dar...@ontrenet.com>>, "Sanders, 
Isaac B" <sande...@rose-hulman.edu<mailto:sande...@rose-hulman.edu>>, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>>
Cc: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: 10hrs of Scheduler Delay

Does increasing the number of partitions help? You could try out something 3 
times what you currently have.
Another trick I used was to partition the problem into multiple DataFrames and 
run them sequentially, persist the result, and then run a union on the 
results.

Hope this helps.

On Fri, Jan 22, 2016, 3:48 AM Darren Govoni 
<dar...@ontrenet.com<mailto:dar...@ontrenet.com>> wrote:
Me too. I had to shrink my dataset to get it to work. For us at least Spark 
seems to have scaling issues.



Sent from my Verizon Wireless 4G LTE smartphone


 Original message 
From: "Sanders, Isaac B" 
<sande...@rose-hulman.edu<mailto:sande...@rose-hulman.edu>>
Date: 01/21/2016 11:18 PM (GMT-05:00)
To: Ted Yu <yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>>
Cc: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: 10hrs of Scheduler Delay

I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly 
and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am 
using more resources on this one.

- Isaac

On Jan 21, 2016, at 11:06 PM, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote:

You may have seen the following on github page:

Latest commit 50fdf0e  on Feb 22, 2015

That was 11 months ago.

Can you search for similar algorithm which runs on Spark and is newer ?

If nothing found, consider running the tests coming from the project to 
determine whether the delay is intrinsic.

Cheers

On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B 
<sande...@rose-hulman.edu<mailto:sande...@rose-hulman.edu>> wrote:
That thread seems to be moving, it oscillates between a few different traces… 
Maybe it is working. It seems odd that it would take that long.

This is 3rd party code, and after looking at some of it, I think it might not 
be as Spark-y as it could be.

I linked it below. I don’t know a lot about spark, so it might be fine, but I 
have my suspicions.

https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala

- Isaac

On Jan 21, 2016, at 10:08 PM, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote:

You may have noticed the following - did this indicate prolonged computation in 
your code ?

org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)

On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B 
<sande...@rose-hulman.edu<mailto:sande...@rose-hulman.edu>> wrote:
Hadoop is: HDP 2.3.2.0-2950

Here is a gist (pastebin) of my versions en masse and a stacktrace: 
https://gist.github.com/isaacsanders/2e59131758469097651b

Thanks

On Jan 21, 2016, at 7:44 PM, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote:

Looks like you were running on YARN.

What hadoop version are you using ?

Can you capture a few stack traces of the AppMaster during the delay and 
pastebin them ?

Thanks

On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B 
<sande...@rose-hulman.edu<mailto:sande...@rose-hulman.edu>> wrote:
The Spark Version is 1.4.1

The logs are full of standard fare, nothing like an exception or even 
interesting [INFO] lines.

Here is the script I am using: 
https://gist.github.com/isaacsanders/660f480810fbc07d4df2

Thanks
Isaac

On Jan 21, 2016, at 11:03 AM, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote:

Re: 10hrs of Scheduler Delay

2016-01-22 Thread Darren Govoni


Thanks for the tip. I will try it. But this is the kind of thing spark is 
supposed to figure out and handle. Or at least not get stuck forever.


Sent from my Verizon Wireless 4G LTE smartphone

 Original message 
From: Muthu Jayakumar <bablo...@gmail.com> 
Date: 01/22/2016  3:50 PM  (GMT-05:00) 
To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B" 
<sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com> 
Cc: user@spark.apache.org 
Subject: Re: 10hrs of Scheduler Delay 

Does increasing the number of partitions help? You could try out something 3 
times what you currently have. Another trick I used was to partition the 
problem into multiple DataFrames and run them sequentially, persist the 
result, and then run a union on the results. 
Hope this helps. 

On Fri, Jan 22, 2016, 3:48 AM Darren Govoni <dar...@ontrenet.com> wrote:


Me too. I had to shrink my dataset to get it to work. For us at least Spark 
seems to have scaling issues.


Sent from my Verizon Wireless 4G LTE smartphone

 Original message 
From: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
Date: 01/21/2016  11:18 PM  (GMT-05:00) 
To: Ted Yu <yuzhih...@gmail.com> 
Cc: user@spark.apache.org 
Subject: Re: 10hrs of Scheduler Delay 


I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly 
and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am 
using more resources on this one.



- Isaac






On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:



You may have seen the following on github page:


Latest commit 50fdf0e  on Feb 22, 2015






That was 11 months ago.



Can you search for similar algorithm which runs on Spark and is newer ?



If nothing found, consider running the tests coming from the project to 
determine whether the delay is intrinsic.



Cheers



On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:



That thread seems to be moving, it oscillates between a few different traces… 
Maybe it is working. It seems odd that it would take that long.



This is 3rd party code, and after looking at some of it, I think it might not 
be as Spark-y as it could be.



I linked it below. I don’t know a lot about spark, so it might be fine, but I 
have my suspicions.



https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala



- Isaac




On Jan 21, 2016, at 10:08 PM, Ted Yu <yuzhih...@gmail.com> wrote:



You may have noticed the following - did this indicate prolonged computation in 
your code ?


org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)




On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:



Hadoop is: HDP 2.3.2.0-2950



Here is a gist (pastebin) of my versions en masse and a stacktrace: 
https://gist.github.com/isaacsanders/2e59131758469097651b



Thanks







On Jan 21, 2016, at 7:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:



Looks like you were running on YARN.



What hadoop version are you using ?



Can you capture a few stack traces of the AppMaster during the delay and 
pastebin them ?



Thanks



On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:



The Spark Version is 1.4.1



The logs are full of standard fare, nothing like an exception or even 
interesting [INFO] lines.



Here is the script I am using: 
https://gist.github.com/isaacsanders/660f480810fbc07d4df2



Thanks
Isaac




On Jan 21, 2016, at 11:03 AM, Ted Yu <yuzhih...@gmail.com> wrote:



Can you provide a bit more information ?



command line for submitting Spark job
version of Spark
anything interesting from driver / executor logs ?



Thanks 







On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:


Hey all,



I am a CS student in the United States working on my senior thesis.



My thesis uses Spark, and I am encountering some trouble.



I am using 
https://github.com/alitouka/spark_dbscan, and to determine parameters, I am 
using the utility class they supply, 
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.



I am on a 10 node cluster with one machine with 8 cores and 32G of memory and 
nine machines with 6 cores and 16G of memory.



I have 442M of data, which seems like it would be a joke, but the job stalls at 
the last stage.



It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a 
number of things for the last couple days, but nothing seems to be helping.

Re: 10hrs of Scheduler Delay

2016-01-22 Thread Muthu Jayakumar
Does increasing the number of partitions help? You could try out something
3 times what you currently have.
Another trick I used was to partition the problem into multiple DataFrames
and run them sequentially, persist the result, and then run a union on
the results.

Hope this helps.
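
A rough sketch of both ideas in Spark 1.x DataFrame terms follows; the helper names and the way the input is split are illustrative, not code from this thread.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Idea 1: roughly triple the current parallelism of a DataFrame.
def triplePartitions(df: DataFrame): DataFrame =
  df.repartition(df.rdd.partitions.length * 3)

// Idea 2: split the work into pieces, compute each piece sequentially,
// persist the result, then union the persisted pieces at the end.
// `process` stands in for whatever per-piece computation the job does.
def runInPieces(pieces: Seq[DataFrame])(process: DataFrame => DataFrame): DataFrame = {
  val results = pieces.map { piece =>
    val result = process(piece)
    result.persist(StorageLevel.MEMORY_AND_DISK)
    result.count() // force materialization before starting the next piece
    result
  }
  results.reduce((a, b) => a.unionAll(b)) // unionAll in Spark 1.x, union in 2.x+
}
```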

On Fri, Jan 22, 2016, 3:48 AM Darren Govoni <dar...@ontrenet.com> wrote:

> Me too. I had to shrink my dataset to get it to work. For us at least
> Spark seems to have scaling issues.
>
>
>
> Sent from my Verizon Wireless 4G LTE smartphone
>
>
>  Original message 
> From: "Sanders, Isaac B" <sande...@rose-hulman.edu>
> Date: 01/21/2016 11:18 PM (GMT-05:00)
> To: Ted Yu <yuzhih...@gmail.com>
> Cc: user@spark.apache.org
> Subject: Re: 10hrs of Scheduler Delay
>
> I have run the driver on a smaller dataset (k=2, n=5000) and it worked
> quickly and didn’t hang like this. This dataset is closer to k=10, n=4.4m,
> but I am using more resources on this one.
>
> - Isaac
>
> On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> You may have seen the following on github page:
>
> Latest commit 50fdf0e  on Feb 22, 2015
>
> That was 11 months ago.
>
> Can you search for similar algorithm which runs on Spark and is newer ?
>
> If nothing found, consider running the tests coming from the project to
> determine whether the delay is intrinsic.
>
> Cheers
>
> On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B <
> sande...@rose-hulman.edu> wrote:
>
>> That thread seems to be moving, it oscillates between a few different
>> traces… Maybe it is working. It seems odd that it would take that long.
>>
>> This is 3rd party code, and after looking at some of it, I think it might
>> not be as Spark-y as it could be.
>>
>> I linked it below. I don’t know a lot about spark, so it might be fine,
>> but I have my suspicions.
>>
>>
>> https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala
>>
>> - Isaac
>>
>> On Jan 21, 2016, at 10:08 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> You may have noticed the following - did this indicate prolonged
>> computation in your code ?
>>
>> org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
>> org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
>> org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
>> org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)
>>
>>
>> On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B <
>> sande...@rose-hulman.edu> wrote:
>>
>>> Hadoop is: HDP 2.3.2.0-2950
>>>
>>> Here is a gist (pastebin) of my versions en masse and a stacktrace:
>>> https://gist.github.com/isaacsanders/2e59131758469097651b
>>>
>>> Thanks
>>>
>>> On Jan 21, 2016, at 7:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> Looks like you were running on YARN.
>>>
>>> What hadoop version are you using ?
>>>
>>> Can you capture a few stack traces of the AppMaster during the delay and
>>> pastebin them ?
>>>
>>> Thanks
>>>
>>> On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B <
>>> sande...@rose-hulman.edu> wrote:
>>>
>>>> The Spark Version is 1.4.1
>>>>
>>>> The logs are full of standard fair, nothing like an exception or even
>>>> interesting [INFO] lines.
>>>>
>>>> Here is the script I am using:
>>>> https://gist.github.com/isaacsanders/660f480810fbc07d4df2
>>>>
>>>> Thanks
>>>> Isaac
>>>>
>>>> On Jan 21, 2016, at 11:03 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>> Can you provide a bit more information ?
>>>>
>>>> command line for submitting Spark job
>>>> version of Spark
>>>> anything interesting from driver / executor logs ?
>>>>
>>>> Thanks
>>>>
>>>> On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B <
>>>> sande...@rose-hulman.edu> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> I am a CS student in the United States working on my senior thesis.
>>>>>
>>>>> My thesis uses Spark, and I am encountering some trouble.

Re: 10hrs of Scheduler Delay

2016-01-22 Thread Darren Govoni


Me too. I had to shrink my dataset to get it to work. For us at least Spark 
seems to have scaling issues.


Sent from my Verizon Wireless 4G LTE smartphone

 Original message 
From: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
Date: 01/21/2016  11:18 PM  (GMT-05:00) 
To: Ted Yu <yuzhih...@gmail.com> 
Cc: user@spark.apache.org 
Subject: Re: 10hrs of Scheduler Delay 


I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly 
and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am 
using more resources on this one.



- Isaac






On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:



You may have seen the following on github page:


Latest commit 50fdf0e  on Feb 22, 2015






That was 11 months ago.



Can you search for similar algorithm which runs on Spark and is newer ?



If nothing found, consider running the tests coming from the project to 
determine whether the delay is intrinsic.



Cheers



On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:



That thread seems to be moving, it oscillates between a few different traces… 
Maybe it is working. It seems odd that it would take that long.



This is 3rd party code, and after looking at some of it, I think it might not 
be as Spark-y as it could be.



I linked it below. I don’t know a lot about spark, so it might be fine, but I 
have my suspicions.



https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala



- Isaac




On Jan 21, 2016, at 10:08 PM, Ted Yu <yuzhih...@gmail.com> wrote:



You may have noticed the following - did this indicate prolonged computation in 
your code ?


org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)




On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:



Hadoop is: HDP 2.3.2.0-2950



Here is a gist (pastebin) of my versions en masse and a stacktrace: 
https://gist.github.com/isaacsanders/2e59131758469097651b



Thanks







On Jan 21, 2016, at 7:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:



Looks like you were running on YARN.



What hadoop version are you using ?



Can you capture a few stack traces of the AppMaster during the delay and 
pastebin them ?



Thanks



On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:



The Spark Version is 1.4.1



The logs are full of standard fare, nothing like an exception or even 
interesting [INFO] lines.



Here is the script I am using: 
https://gist.github.com/isaacsanders/660f480810fbc07d4df2



Thanks
Isaac




On Jan 21, 2016, at 11:03 AM, Ted Yu <yuzhih...@gmail.com> wrote:



Can you provide a bit more information ?



command line for submitting Spark job
version of Spark
anything interesting from driver / executor logs ?



Thanks 







On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:


Hey all,



I am a CS student in the United States working on my senior thesis.



My thesis uses Spark, and I am encountering some trouble.



I am using 
https://github.com/alitouka/spark_dbscan, and to determine parameters, I am 
using the utility class they supply, 
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.



I am on a 10 node cluster with one machine with 8 cores and 32G of memory and 
nine machines with 6 cores and 16G of memory.



I have 442M of data, which seems like it would be a joke, but the job stalls at 
the last stage.



It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a 
number of things for the last couple days, but nothing seems to be helping.



I have tried:

- Increasing heap sizes and numbers of cores

- More/less executors with different amounts of resources.

- Kryo serialization

- FAIR Scheduling



It doesn’t seem like it should require this much. Any ideas?



- Isaac

Re: 10hrs of Scheduler Delay

2016-01-22 Thread Muthu Jayakumar
If you turn on GC logging flags (like "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"),
you would be able to see why some jobs run for a long time.
The tuning guide (http://spark.apache.org/docs/latest/tuning.html) provides
some insight on this. Setting up explicit partitions helped in my case when
I was using RDDs.

Hope this helps.
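
As a sketch, those flags are usually attached through the extraJavaOptions settings; the values here are only the flags quoted above:

```scala
import org.apache.spark.SparkConf

// Ask the executor JVMs to log GC activity so slow tasks can be correlated with
// GC pauses. The equivalent driver-side setting (spark.driver.extraJavaOptions)
// has to be supplied via spark-submit --conf or spark-defaults.conf, because the
// driver JVM is already running by the time application code executes.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
       "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
```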

On Fri, Jan 22, 2016 at 1:51 PM, Darren Govoni <dar...@ontrenet.com> wrote:

> Thanks for the tip. I will try it. But this is the kind of thing spark is
> supposed to figure out and handle. Or at least not get stuck forever.
>
>
>
> Sent from my Verizon Wireless 4G LTE smartphone
>
>
>  Original message 
> From: Muthu Jayakumar <bablo...@gmail.com>
> Date: 01/22/2016 3:50 PM (GMT-05:00)
> To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B" <
> sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com>
> Cc: user@spark.apache.org
> Subject: Re: 10hrs of Scheduler Delay
>
> Does increasing the number of partition helps? You could try out something
> 3 times what you currently have.
> Another trick i used was to partition the problem into multiple dataframes
> and run them sequentially and persistent the result and then run a union on
> the results.
>
> Hope this helps.
>
> On Fri, Jan 22, 2016, 3:48 AM Darren Govoni <dar...@ontrenet.com> wrote:
>
>> Me too. I had to shrink my dataset to get it to work. For us at least
>> Spark seems to have scaling issues.
>>
>>
>>
>> Sent from my Verizon Wireless 4G LTE smartphone
>>
>>
>>  Original message ----
>> From: "Sanders, Isaac B" <sande...@rose-hulman.edu>
>> Date: 01/21/2016 11:18 PM (GMT-05:00)
>> To: Ted Yu <yuzhih...@gmail.com>
>> Cc: user@spark.apache.org
>> Subject: Re: 10hrs of Scheduler Delay
>>
>> I have run the driver on a smaller dataset (k=2, n=5000) and it worked
>> quickly and didn’t hang like this. This dataset is closer to k=10, n=4.4m,
>> but I am using more resources on this one.
>>
>> - Isaac
>>
>> On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> You may have seen the following on github page:
>>
>> Latest commit 50fdf0e  on Feb 22, 2015
>>
>> That was 11 months ago.
>>
>> Can you search for similar algorithm which runs on Spark and is newer ?
>>
>> If nothing found, consider running the tests coming from the project to
>> determine whether the delay is intrinsic.
>>
>> Cheers
>>
>> On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B <
>> sande...@rose-hulman.edu> wrote:
>>
>>> That thread seems to be moving, it oscillates between a few different
>>> traces… Maybe it is working. It seems odd that it would take that long.
>>>
>>> This is 3rd party code, and after looking at some of it, I think it
>>> might not be as Spark-y as it could be.
>>>
>>> I linked it below. I don’t know a lot about spark, so it might be fine,
>>> but I have my suspicions.
>>>
>>>
>>> https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala
>>>
>>> - Isaac
>>>
>>> On Jan 21, 2016, at 10:08 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> You may have noticed the following - did this indicate prolonged
>>> computation in your code ?
>>>
>>> org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
>>> org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
>>> org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
>>> org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)
>>>
>>>
>>> On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B <
>>> sande...@rose-hulman.edu> wrote:
>>>
>>>> Hadoop is: HDP 2.3.2.0-2950
>>>>
>>>> Here is a gist (pastebin) of my versions en masse and a stacktrace:
>>>> https://gist.github.com/isaacsanders/2e59131758469097651b
>>>>
>>>> Thanks
>>>>
>>>> On Jan 21, 2016, at 7:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>> Looks like you were running on YARN.
>>>>
>>>> What hadoop version are you using ?
>>>>
>>>> Can you capture a few stack traces of the AppMaster during the delay and pastebin them ?

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Ted Yu
Can you provide a bit more information ?

command line for submitting Spark job
version of Spark
anything interesting from driver / executor logs ?

Thanks

On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B 
wrote:

> Hey all,
>
> I am a CS student in the United States working on my senior thesis.
>
> My thesis uses Spark, and I am encountering some trouble.
>
> I am using https://github.com/alitouka/spark_dbscan, and to determine
> parameters, I am using the utility class they supply,
> org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.
>
> I am on a 10 node cluster with one machine with 8 cores and 32G of memory
> and nine machines with 6 cores and 16G of memory.
>
> I have 442M of data, which seems like it would be a joke, but the job
> stalls at the last stage.
>
> It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a
> number of things for the last couple days, but nothing seems to be helping.
>
> I have tried:
> - Increasing heap sizes and numbers of cores
> - More/less executors with different amounts of resources.
> - Kyro Serialization
> - FAIR Scheduling
>
> It doesn’t seem like it should require this much. Any ideas?
>
> - Isaac


Re: 10hrs of Scheduler Delay

2016-01-21 Thread Sanders, Isaac B
The Spark Version is 1.4.1

The logs are full of standard fare, nothing like an exception or even 
interesting [INFO] lines.

Here is the script I am using: 
https://gist.github.com/isaacsanders/660f480810fbc07d4df2

Thanks
Isaac

On Jan 21, 2016, at 11:03 AM, Ted Yu 
> wrote:

Can you provide a bit more information ?

command line for submitting Spark job
version of Spark
anything interesting from driver / executor logs ?

Thanks

On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B 
> wrote:
Hey all,

I am a CS student in the United States working on my senior thesis.

My thesis uses Spark, and I am encountering some trouble.

I am using https://github.com/alitouka/spark_dbscan, and to determine 
parameters, I am using the utility class they supply, 
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.

I am on a 10 node cluster with one machine with 8 cores and 32G of memory and 
nine machines with 6 cores and 16G of memory.

I have 442M of data, which seems like it would be a joke, but the job stalls at 
the last stage.

It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a 
number of things for the last couple days, but nothing seems to be helping.

I have tried:
- Increasing heap sizes and numbers of cores
- More/less executors with different amounts of resources.
- Kryo serialization
- FAIR Scheduling

It doesn’t seem like it should require this much. Any ideas?

- Isaac
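
For readers following along, the settings Isaac lists map onto Spark configuration roughly as below; the concrete values are placeholders, not the ones actually used on this cluster:

```scala
import org.apache.spark.SparkConf

// Heap and core sizing, executor count, Kryo serialization, and FAIR scheduling
// expressed as configuration keys; normally passed via spark-submit --conf or
// spark-defaults.conf. All values here are placeholders.
val conf = new SparkConf()
  .set("spark.executor.memory", "8g")
  .set("spark.executor.cores", "4")
  .set("spark.executor.instances", "9") // number of executors when running on YARN
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.scheduler.mode", "FAIR")
```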




Re: 10hrs of Scheduler Delay

2016-01-21 Thread Ted Yu
Looks like you were running on YARN.

What hadoop version are you using ?

Can you capture a few stack traces of the AppMaster during the delay and
pastebin them ?

Thanks

On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B 
wrote:

> The Spark Version is 1.4.1
>
> The logs are full of standard fair, nothing like an exception or even
> interesting [INFO] lines.
>
> Here is the script I am using:
> https://gist.github.com/isaacsanders/660f480810fbc07d4df2
>
> Thanks
> Isaac
>
> On Jan 21, 2016, at 11:03 AM, Ted Yu  wrote:
>
> Can you provide a bit more information ?
>
> command line for submitting Spark job
> version of Spark
> anything interesting from driver / executor logs ?
>
> Thanks
>
> On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B <
> sande...@rose-hulman.edu> wrote:
>
>> Hey all,
>>
>> I am a CS student in the United States working on my senior thesis.
>>
>> My thesis uses Spark, and I am encountering some trouble.
>>
>> I am using https://github.com/alitouka/spark_dbscan, and to determine
>> parameters, I am using the utility class they supply,
>> org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.
>>
>> I am on a 10 node cluster with one machine with 8 cores and 32G of memory
>> and nine machines with 6 cores and 16G of memory.
>>
>> I have 442M of data, which seems like it would be a joke, but the job
>> stalls at the last stage.
>>
>> It was stuck in Scheduler Delay for 10 hours overnight, and I have tried
>> a number of things for the last couple days, but nothing seems to be
>> helping.
>>
>> I have tried:
>> - Increasing heap sizes and numbers of cores
>> - More/less executors with different amounts of resources.
>> - Kyro Serialization
>> - FAIR Scheduling
>>
>> It doesn’t seem like it should require this much. Any ideas?
>>
>> - Isaac
>
>
>
>


Re: 10hrs of Scheduler Delay

2016-01-21 Thread Ted Yu
You may have noticed the following - did this indicate prolonged
computation in your code ?

org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)
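
Those frames are ordinary per-pair Euclidean distance calls from commons-math3. As a hedged back-of-the-envelope illustration (not a measurement from this job): if the final stage ends up comparing points pairwise, the n=4.4m dataset mentioned earlier implies an enormous number of such calls.

```scala
import org.apache.commons.math3.ml.distance.EuclideanDistance

// One distance call is cheap...
val dist = new EuclideanDistance()
println(dist.compute(Array(0.0, 0.0), Array(3.0, 4.0))) // 5.0

// ...but an all-pairs comparison over n points needs about n * (n - 1) / 2 calls.
val n = 4.4e6
println(n * (n - 1) / 2) // on the order of 9.7e12 distance computations
```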


On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B 
wrote:

> Hadoop is: HDP 2.3.2.0-2950
>
> Here is a gist (pastebin) of my versions en masse and a stacktrace:
> https://gist.github.com/isaacsanders/2e59131758469097651b
>
> Thanks
>
> On Jan 21, 2016, at 7:44 PM, Ted Yu  wrote:
>
> Looks like you were running on YARN.
>
> What hadoop version are you using ?
>
> Can you capture a few stack traces of the AppMaster during the delay and
> pastebin them ?
>
> Thanks
>
> On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B <
> sande...@rose-hulman.edu> wrote:
>
>> The Spark Version is 1.4.1
>>
>> The logs are full of standard fair, nothing like an exception or even
>> interesting [INFO] lines.
>>
>> Here is the script I am using:
>> https://gist.github.com/isaacsanders/660f480810fbc07d4df2
>>
>> Thanks
>> Isaac
>>
>> On Jan 21, 2016, at 11:03 AM, Ted Yu  wrote:
>>
>> Can you provide a bit more information ?
>>
>> command line for submitting Spark job
>> version of Spark
>> anything interesting from driver / executor logs ?
>>
>> Thanks
>>
>> On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B <
>> sande...@rose-hulman.edu> wrote:
>>
>>> Hey all,
>>>
>>> I am a CS student in the United States working on my senior thesis.
>>>
>>> My thesis uses Spark, and I am encountering some trouble.
>>>
>>> I am using https://github.com/alitouka/spark_dbscan, and to determine
>>> parameters, I am using the utility class they supply,
>>> org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.
>>>
>>> I am on a 10 node cluster with one machine with 8 cores and 32G of
>>> memory and nine machines with 6 cores and 16G of memory.
>>>
>>> I have 442M of data, which seems like it would be a joke, but the job
>>> stalls at the last stage.
>>>
>>> It was stuck in Scheduler Delay for 10 hours overnight, and I have tried
>>> a number of things for the last couple days, but nothing seems to be
>>> helping.
>>>
>>> I have tried:
>>> - Increasing heap sizes and numbers of cores
>>> - More/less executors with different amounts of resources.
>>> - Kyro Serialization
>>> - FAIR Scheduling
>>>
>>> It doesn’t seem like it should require this much. Any ideas?
>>>
>>> - Isaac
>>
>>
>>
>>
>
>


Re: 10hrs of Scheduler Delay

2016-01-21 Thread Sanders, Isaac B
That thread seems to be moving, it oscillates between a few different traces… 
Maybe it is working. It seems odd that it would take that long.

This is 3rd party code, and after looking at some of it, I think it might not 
be as Spark-y as it could be.

I linked it below. I don’t know a lot about spark, so it might be fine, but I 
have my suspicions.

https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala

- Isaac

On Jan 21, 2016, at 10:08 PM, Ted Yu 
> wrote:

You may have noticed the following - did this indicate prolonged computation in 
your code ?

org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)

On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B 
> wrote:
Hadoop is: HDP 2.3.2.0-2950

Here is a gist (pastebin) of my versions en masse and a stacktrace: 
https://gist.github.com/isaacsanders/2e59131758469097651b

Thanks

On Jan 21, 2016, at 7:44 PM, Ted Yu 
> wrote:

Looks like you were running on YARN.

What hadoop version are you using ?

Can you capture a few stack traces of the AppMaster during the delay and 
pastebin them ?

Thanks

On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B 
> wrote:
The Spark Version is 1.4.1

The logs are full of standard fare, nothing like an exception or even 
interesting [INFO] lines.

Here is the script I am using: 
https://gist.github.com/isaacsanders/660f480810fbc07d4df2

Thanks
Isaac

On Jan 21, 2016, at 11:03 AM, Ted Yu 
> wrote:

Can you provide a bit more information ?

command line for submitting Spark job
version of Spark
anything interesting from driver / executor logs ?

Thanks

On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B 
> wrote:
Hey all,

I am a CS student in the United States working on my senior thesis.

My thesis uses Spark, and I am encountering some trouble.

I am using https://github.com/alitouka/spark_dbscan, and to determine 
parameters, I am using the utility class they supply, 
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.

I am on a 10 node cluster with one machine with 8 cores and 32G of memory and 
nine machines with 6 cores and 16G of memory.

I have 442M of data, which seems like it would be a joke, but the job stalls at 
the last stage.

It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a 
number of things for the last couple days, but nothing seems to be helping.

I have tried:
- Increasing heap sizes and numbers of cores
- More/less executors with different amounts of resources.
- Kryo serialization
- FAIR Scheduling

It doesn’t seem like it should require this much. Any ideas?

- Isaac

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Darren Govoni


I've experienced this same problem. Always the last stage hangs. Indeterminate. 
No errors in logs. I run Spark 1.5.2. Can't find an explanation. But it's 
definitely a showstopper.


Sent from my Verizon Wireless 4G LTE smartphone

 Original message 
From: Ted Yu <yuzhih...@gmail.com> 
Date: 01/21/2016  7:44 PM  (GMT-05:00) 
To: "Sanders, Isaac B" <sande...@rose-hulman.edu> 
Cc: user@spark.apache.org 
Subject: Re: 10hrs of Scheduler Delay 

Looks like you were running on YARN.
What hadoop version are you using ?
Can you capture a few stack traces of the AppMaster during the delay and 
pastebin them ?
Thanks
On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B <sande...@rose-hulman.edu> 
wrote:





The Spark Version is 1.4.1



The logs are full of standard fare, nothing like an exception or even 
interesting [INFO] lines.



Here is the script I am using: 
https://gist.github.com/isaacsanders/660f480810fbc07d4df2



Thanks
Isaac




On Jan 21, 2016, at 11:03 AM, Ted Yu <yuzhih...@gmail.com> wrote:



Can you provide a bit more information ?



command line for submitting Spark job
version of Spark
anything interesting from driver / executor logs ?



Thanks 





On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B 
<sande...@rose-hulman.edu> wrote:


Hey all,



I am a CS student in the United States working on my senior thesis.



My thesis uses Spark, and I am encountering some trouble.



I am using 
https://github.com/alitouka/spark_dbscan, and to determine parameters, I am 
using the utility class they supply, 
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.



I am on a 10 node cluster with one machine with 8 cores and 32G of memory and 
nine machines with 6 cores and 16G of memory.



I have 442M of data, which seems like it would be a joke, but the job stalls at 
the last stage.



It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a 
number of things for the last couple days, but nothing seems to be helping.



I have tried:

- Increasing heap sizes and numbers of cores

- More/less executors with different amounts of resources.

- Kyro Serialization

- FAIR Scheduling



It doesn’t seem like it should require this much. Any ideas?



- Isaac















Re: 10hrs of Scheduler Delay

2016-01-21 Thread Sanders, Isaac B
Hadoop is: HDP 2.3.2.0-2950

Here is a gist (pastebin) of my versions en masse and a stacktrace: 
https://gist.github.com/isaacsanders/2e59131758469097651b

Thanks

On Jan 21, 2016, at 7:44 PM, Ted Yu wrote:

Looks like you were running on YARN.

What hadoop version are you using ?

Can you capture a few stack traces of the AppMaster during the delay and 
pastebin them ?

Thanks

On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B wrote:
The Spark Version is 1.4.1

The logs are full of standard fare, nothing like an exception or even 
interesting [INFO] lines.

Here is the script I am using: 
https://gist.github.com/isaacsanders/660f480810fbc07d4df2

Thanks
Isaac

On Jan 21, 2016, at 11:03 AM, Ted Yu wrote:

Can you provide a bit more information ?

command line for submitting Spark job
version of Spark
anything interesting from driver / executor logs ?

Thanks

On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B wrote:
Hey all,

I am a CS student in the United States working on my senior thesis.

My thesis uses Spark, and I am encountering some trouble.

I am using https://github.com/alitouka/spark_dbscan, and to determine 
parameters, I am using the utility class they supply, 
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.

I am on a 10 node cluster with one machine with 8 cores and 32G of memory and 
nine machines with 6 cores and 16G of memory.

I have 442M of data, which seems like it would be a joke, but the job stalls at 
the last stage.

It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a 
number of things for the last couple days, but nothing seems to be helping.

I have tried:
- Increasing heap sizes and numbers of cores
- More/less executors with different amounts of resources.
- Kryo Serialization
- FAIR Scheduling

It doesn’t seem like it should require this much. Any ideas?

- Isaac






Re: 10hrs of Scheduler Delay

2016-01-21 Thread Ted Yu
You may have seen the following on the GitHub page:

Latest commit 50fdf0e  on Feb 22, 2015

That was 11 months ago.

Can you search for a similar algorithm which runs on Spark and is newer ?

If nothing found, consider running the tests that come with the project to
determine whether the delay is intrinsic.

Cheers

On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B wrote:

> That thread seems to be moving, it oscillates between a few different
> traces… Maybe it is working. It seems odd that it would take that long.
>
> This is 3rd party code, and after looking at some of it, I think it might
> not be as Spark-y as it could be.
>
> I linked it below. I don’t know a lot about spark, so it might be fine,
> but I have my suspicions.
>
>
> https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala
>
> - Isaac
>
> On Jan 21, 2016, at 10:08 PM, Ted Yu  wrote:
>
> You may have noticed the following - did this indicate prolonged
> computation in your code ?
>
> org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
> org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
> org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
> org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)
>
>
> On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B <
> sande...@rose-hulman.edu> wrote:
>
>> Hadoop is: HDP 2.3.2.0-2950
>>
>> Here is a gist (pastebin) of my versions en masse and a stacktrace:
>> https://gist.github.com/isaacsanders/2e59131758469097651b
>>
>> Thanks
>>
>> On Jan 21, 2016, at 7:44 PM, Ted Yu  wrote:
>>
>> Looks like you were running on YARN.
>>
>> What hadoop version are you using ?
>>
>> Can you capture a few stack traces of the AppMaster during the delay and
>> pastebin them ?
>>
>> Thanks
>>
>> On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B <
>> sande...@rose-hulman.edu> wrote:
>>
>>> The Spark Version is 1.4.1
>>>
>>> The logs are full of standard fare, nothing like an exception or even
>>> interesting [INFO] lines.
>>>
>>> Here is the script I am using:
>>> https://gist.github.com/isaacsanders/660f480810fbc07d4df2
>>>
>>> Thanks
>>> Isaac
>>>
>>> On Jan 21, 2016, at 11:03 AM, Ted Yu  wrote:
>>>
>>> Can you provide a bit more information ?
>>>
>>> command line for submitting Spark job
>>> version of Spark
>>> anything interesting from driver / executor logs ?
>>>
>>> Thanks
>>>
>>> On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B <
>>> sande...@rose-hulman.edu> wrote:
>>>
 Hey all,

 I am a CS student in the United States working on my senior thesis.

 My thesis uses Spark, and I am encountering some trouble.

 I am using https://github.com/alitouka/spark_dbscan, and to determine
 parameters, I am using the utility class they supply,
 org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.

 I am on a 10 node cluster with one machine with 8 cores and 32G of
 memory and nine machines with 6 cores and 16G of memory.

 I have 442M of data, which seems like it would be a joke, but the job
 stalls at the last stage.

 It was stuck in Scheduler Delay for 10 hours overnight, and I have
 tried a number of things for the last couple days, but nothing seems to be
 helping.

 I have tried:
 - Increasing heap sizes and numbers of cores
 - More/less executors with different amounts of resources.
 - Kryo Serialization
 - FAIR Scheduling

 It doesn’t seem like it should require this much. Any ideas?

 - Isaac
>>>
>>>
>>>
>>>
>>
>>
>
>


Re: 10hrs of Scheduler Delay

2016-01-21 Thread Sanders, Isaac B
I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly 
and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am 
using more resources on this one.
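
For a sense of scale, the Euclidean-distance stack trace quoted further down
points at per-point distance computations, and a naive distance-to-nearest-
neighbour scan has roughly the shape of the sketch below (the project's actual
implementation may partition the work differently, so this is only
illustrative):

import org.apache.commons.math3.ml.distance.EuclideanDistance

// Illustrative brute-force sketch only: for every point, scan all other points
// for the nearest one. That is O(n^2) calls to EuclideanDistance.compute, the
// same call that shows up in the stack trace below.
val metric = new EuclideanDistance()

def nearestNeighbourDistances(points: Array[Array[Double]]): Array[Double] =
  points.map { p =>
    points.iterator
      .filter(_ ne p)                  // skip the point itself
      .map(q => metric.compute(p, q))
      .min
  }

Anything with that shape performs roughly (4,400,000 / 5,000)^2, about 775,000
times, as many distance computations on the full dataset as on the small run,
even though the raw input is only 442M.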

- Isaac

On Jan 21, 2016, at 11:06 PM, Ted Yu wrote:

You may have seen the following on the GitHub page:

Latest commit 50fdf0e  on Feb 22, 2015

That was 11 months ago.

Can you search for a similar algorithm which runs on Spark and is newer ?

If nothing found, consider running the tests that come with the project to 
determine whether the delay is intrinsic.

Cheers

On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B wrote:
That thread seems to be moving, it oscillates between a few different traces… 
Maybe it is working. It seems odd that it would take that long.

This is 3rd party code, and after looking at some of it, I think it might not 
be as Spark-y as it could be.

I linked it below. I don’t know a lot about spark, so it might be fine, but I 
have my suspicions.

https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala
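
One purely illustrative direction for a cheaper exploratory step, sketched on
the assumption that only an estimate of the nearest-neighbour distance
distribution is needed to pick DBSCAN's epsilon (the helper below is
hypothetical and not part of spark_dbscan):

import org.apache.spark.rdd.RDD

// Hypothetical sketch, not spark_dbscan code: estimate the nearest-neighbour
// distance distribution from a random sample of query points, scanning the
// full dataset once instead of comparing every point with every other point.
def euclidean(a: Array[Double], b: Array[Double]): Double =
  math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

def approxNearestNeighbourDistances(points: RDD[Array[Double]],
                                    querySampleSize: Int): Array[Double] = {
  val queries = points.takeSample(withReplacement = false, num = querySampleSize, seed = 42L)
  val bcQueries = points.context.broadcast(queries)

  points.mapPartitions { iter =>
    val qs = bcQueries.value
    val best = Array.fill(qs.length)(Double.MaxValue)
    for (p <- iter; i <- qs.indices) {
      val d = euclidean(p, qs(i))
      if (d > 0.0 && d < best(i)) best(i) = d   // d == 0 is the query point itself
    }
    Iterator.single(best)                        // one partial result per partition
  }.reduce((a, b) => a.zip(b).map { case (x, y) => math.min(x, y) })
}

The cost is proportional to the sample size times the number of points, spread
across the executors, rather than growing quadratically with the dataset.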

- Isaac

On Jan 21, 2016, at 10:08 PM, Ted Yu wrote:

You may have noticed the following - did this indicate prolonged computation in 
your code ?

org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
org.alitouka.spark.dbscan.spatial.DistanceCalculation$class.calculateDistance(DistanceCalculation.scala:15)
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver$.calculateDistance(DistanceToNearestNeighborDriver.scala:16)

On Thu, Jan 21, 2016 at 5:13 PM, Sanders, Isaac B wrote:
Hadoop is: HDP 2.3.2.0-2950

Here is a gist (pastebin) of my versions en masse and a stacktrace: 
https://gist.github.com/isaacsanders/2e59131758469097651b

Thanks

On Jan 21, 2016, at 7:44 PM, Ted Yu wrote:

Looks like you were running on YARN.

What hadoop version are you using ?

Can you capture a few stack traces of the AppMaster during the delay and 
pastebin them ?

Thanks

On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B wrote:
The Spark Version is 1.4.1

The logs are full of standard fare, nothing like an exception or even 
interesting [INFO] lines.

Here is the script I am using: 
https://gist.github.com/isaacsanders/660f480810fbc07d4df2

Thanks
Isaac

On Jan 21, 2016, at 11:03 AM, Ted Yu wrote:

Can you provide a bit more information ?

command line for submitting Spark job
version of Spark
anything interesting from driver / executor logs ?

Thanks

On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B wrote:
Hey all,

I am a CS student in the United States working on my senior thesis.

My thesis uses Spark, and I am encountering some trouble.

I am using https://github.com/alitouka/spark_dbscan, and to determine 
parameters, I am using the utility class they supply, 
org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.

I am on a 10 node cluster with one machine with 8 cores and 32G of memory and 
nine machines with 6 cores and 16G of memory.

I have 442M of data, which seems like it would be a joke, but the job stalls at 
the last stage.

It was stuck in Scheduler Delay for 10 hours overnight, and I have tried a 
number of things for the last couple days, but nothing seems to be helping.

I have tried:
- Increasing heap sizes and numbers of cores
- More/less executors with different amounts of resources.
- Kryo Serialization
- FAIR Scheduling

It doesn’t seem like it should require this much. Any ideas?

- Isaac