Re: 10hrs of Scheduler Delay

2016-01-25 Thread Ted Yu
Original message From: "Sanders, Isaac B" <sande...@rose-hulman.edu> Date: 01/24/2016 2:54 PM (GMT-05:00) To: Renu Yadav <yren...@gmail.com> Cc: Darren Govoni <dar...@on

Re: 10hrs of Scheduler Delay

2016-01-25 Thread Darren Govoni
: "Sanders, Isaac B" <sande...@rose-hulman.edu> Date: 01/25/2016 8:59 AM (GMT-05:00) To: Ted Yu <yuzhih...@gmail.com> Cc: Darren Govoni <dar...@ontrenet.com>, Renu Yadav <yren...@gmail.com>, Muthu Jayakumar <bablo...@gmail.com>, user@spark.apache.or

Re: 10hrs of Scheduler Delay

2016-01-25 Thread Ted Yu
Date: 01/24/2016 2:54 PM (GMT-05:00) To: Renu Yadav <yren...@gmail.com> Cc: Darren Govoni <dar...@ontrenet.com>, Muthu Jayakumar <bablo...@gmail.com>, Ted Yu <yuzhih...@gmail.com>, user@spark.apache.org Subject: Re: 10hrs of Scheduler Delay I

Re: 10hrs of Scheduler Delay

2016-01-25 Thread Sanders, Isaac B
<yuzhih...@gmail.com>, user@spark.apache.org Subject: Re: 10hrs of Scheduler Delay I am not getting anywhere with any of the suggestions so far. :( Trying some more outlets, I will share any solution I find. - Isaac On Jan

Re: 10hrs of Scheduler Delay

2016-01-25 Thread Darren Govoni
4 PM (GMT-05:00) To: Renu Yadav <yren...@gmail.com> Cc: Darren Govoni <dar...@ontrenet.com>, Muthu Jayakumar <bablo...@gmail.com>, Ted Yu <yuzhih...@gmail.com>, user@spark.apache.org Subject: Re: 10hrs of Scheduler Delay I am not getting anywhere with any of the su

Re: 10hrs of Scheduler Delay

2016-01-24 Thread Sanders, Isaac B
aac B" <sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com> Cc: user@spark.apache.org Subject: Re: 10hrs of Scheduler Delay Does increasing the number of partition h

Re: 10hrs of Scheduler Delay

2016-01-22 Thread Darren Govoni
2/2016 3:50 PM (GMT-05:00) To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B" <sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com> Cc: user@spark.apache.org Subject: Re: 10hrs of Scheduler Delay Does increasing the number of partitions help? You cou

Re: 10hrs of Scheduler Delay

2016-01-22 Thread Muthu Jayakumar
--- From: "Sanders, Isaac B" <sande...@rose-hulman.edu> Date: 01/21/2016 11:18 PM (GMT-05:00) To: Ted Yu <yuzhih...@gmail.com> Cc: user@spark.apache.org Subject: Re: 10hrs of Scheduler Delay I have run the driver on a smaller dataset (

Re: 10hrs of Scheduler Delay

2016-01-22 Thread Darren Govoni
8 PM (GMT-05:00) To: Ted Yu <yuzhih...@gmail.com> Cc: user@spark.apache.org Subject: Re: 10hrs of Scheduler Delay I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am using more resources o

Re: 10hrs of Scheduler Delay

2016-01-22 Thread Muthu Jayakumar
or us at least Spark seems to have scaling issues. Original message ---- From: "Sanders, Isaac B" <sande...@rose-hulman.edu> Date

10hrs of Scheduler Delay

2016-01-21 Thread Sanders, Isaac B
Hey all, I am a CS student in the United States working on my senior thesis. My thesis uses Spark, and I am encountering some trouble. I am using https://github.com/alitouka/spark_dbscan, and to determine parameters, I am using the utility class they supply,

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Ted Yu
Can you provide a bit more information: the command line for submitting the Spark job, the version of Spark, and anything interesting from the driver / executor logs? Thanks On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B wrote: > Hey all, > > I am a CS student in the United States

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Sanders, Isaac B
The Spark version is 1.4.1. The logs are full of standard fare, nothing like an exception or even interesting [INFO] lines. Here is the script I am using: https://gist.github.com/isaacsanders/660f480810fbc07d4df2 Thanks Isaac On Jan 21, 2016, at 11:03 AM, Ted Yu

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Ted Yu
Looks like you were running on YARN. What Hadoop version are you using? Can you capture a few stack traces of the AppMaster during the delay and pastebin them? Thanks On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B wrote: > The Spark Version is 1.4.1 > > The

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Ted Yu
You may have noticed the following. Does this indicate prolonged computation in your code?
org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Sanders, Isaac B
That thread seems to be moving; it oscillates between a few different traces… Maybe it is working. It seems odd that it would take that long. This is 3rd party code, and after looking at some of it, I think it might not be as Spark-y as it could be. I linked it below. I don’t know a lot about

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Darren Govoni
hih...@gmail.com> Date: 01/21/2016 7:44 PM (GMT-05:00) To: "Sanders, Isaac B" <sande...@rose-hulman.edu> Cc: user@spark.apache.org Subject: Re: 10hrs of Scheduler Delay Looks like you were running on YARN. What Hadoop version are you using? Can you capture a few stack t

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Sanders, Isaac B
Hadoop is: HDP 2.3.2.0-2950. Here is a gist (pastebin) of my versions en masse and a stacktrace: https://gist.github.com/isaacsanders/2e59131758469097651b Thanks On Jan 21, 2016, at 7:44 PM, Ted Yu wrote: Looks like you were running on YARN.

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Ted Yu
You may have seen the following on the GitHub page: "Latest commit 50fdf0e on Feb 22, 2015". That was 11 months ago. Can you search for a similar algorithm which runs on Spark and is newer? If nothing is found, consider running the tests coming from the project to determine whether the delay is

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Sanders, Isaac B
I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am using more resources on this one. - Isaac On Jan 21, 2016, at 11:06 PM, Ted Yu wrote:
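
A rough back-of-the-envelope sketch of why the two runs might behave so differently. It assumes (this is not confirmed anywhere in the thread) that the parameter-estimation utility ends up computing a distance for every unordered pair of points; the function name `pairwise_distance_count` is made up for illustration, and only the two n values come from the message above. The effect of k is not modeled.

```python
def pairwise_distance_count(n: int) -> int:
    """Number of unordered point pairs among n points: n * (n - 1) / 2.

    ASSUMPTION: one distance computation per pair. Whether spark_dbscan
    actually does this is not established in the thread.
    """
    return n * (n - 1) // 2


small_n = 5_000        # n from the run that finished quickly
large_n = 4_400_000    # "n=4.4m" from the run that appears to hang

small_pairs = pairwise_distance_count(small_n)
large_pairs = pairwise_distance_count(large_n)

print(f"n={small_n:,}: {small_pairs:,} distance computations")
print(f"n={large_n:,}: {large_pairs:,} distance computations")
print(f"growth factor: {large_pairs / small_pairs:,.0f}x")
```

If that assumption holds, going from n=5000 to n=4.4m multiplies the work by several hundred thousand, so a job that finishes quickly on the small dataset could plausibly run for many hours on the large one even with more resources.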