From: "Sanders, Isaac B" <sande...@rose-hulman.edu>
Date: 01/25/2016 8:59 AM (GMT-05:00)
To: Ted Yu <yuzhih...@gmail.com>
Cc: Darren Govoni <dar...@ontrenet.com>, Renu Yadav <yren...@gmail.com>, Muthu
Jayakumar <bablo...@gmail.com>, user@spark.apache.or

From: "Sanders, Isaac B" <sande...@rose-hulman.edu>
Date: 01/24/2016 2:54 PM (GMT-05:00)
To: Renu Yadav <yren...@gmail.com>
Cc: Darren Govoni <dar...@ontrenet.com>, Muthu Jayakumar
<bablo...@gmail.com>, Ted Yu <yuzhih...@gmail.com>, user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay
I am not getting anywhere with any of the suggestions so far. :(
Trying some more outlets, I will share any solution I find.
- Isaac
2/2016 3:50 PM (GMT-05:00)
To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B"
<sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay
Does increasing the number of partitions help? You cou
From: "Sanders, Isaac B" <sande...@rose-hulman.edu>
Date: 01/21/2016 11:18 PM (GMT-05:00)
To: Ted Yu <yuzhih...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay
I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly
and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am
using more resources o
or us at least
>> Spark seems to have scaling issues.
Hey all,
I am a CS student in the United States working on my senior thesis.
My thesis uses Spark, and I am encountering some trouble.
I am using https://github.com/alitouka/spark_dbscan, and to determine
parameters, I am using the utility class they supply,
Can you provide a bit more information?
- command line for submitting the Spark job
- version of Spark
- anything interesting from the driver / executor logs?
Thanks
The Spark Version is 1.4.1
The logs are full of standard fare, nothing like an exception or even
interesting [INFO] lines.
Here is the script I am using:
https://gist.github.com/isaacsanders/660f480810fbc07d4df2
Thanks
Isaac
On Jan 21, 2016, at 11:03 AM, Ted Yu
Looks like you were running on YARN.
What Hadoop version are you using?
Can you capture a few stack traces of the AppMaster during the delay and
pastebin them?
Thanks
You may have noticed the following - does this indicate prolonged
computation in your code?
org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
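For context, the two frames above come from Apache Commons Math, where the distance call computes the ordinary Euclidean (L2) distance between two coordinate arrays. A minimal Python sketch of that calculation follows. The function name `euclidean_distance` is purely illustrative and is not part of Spark, spark_dbscan, or Commons Math.

```python
import math


def euclidean_distance(p1, p2):
    """Illustrative stand-in for the L2 distance seen in the stack trace."""
    if len(p1) != len(p2):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))


# A single call costs O(k) for k-dimensional points.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

A single call like this is cheap, so a thread that keeps showing up in these frames is more plausibly making a very large number of calls than being stuck inside one.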
That thread seems to be moving, it oscillates between a few different traces…
Maybe it is working. It seems odd that it would take that long.
This is 3rd party code, and after looking at some of it, I think it might not
be as Spark-y as it could be.
I linked it below. I don’t know a lot about
Hadoop is: HDP 2.3.2.0-2950
Here is a gist (pastebin) of my versions en masse and a stacktrace:
https://gist.github.com/isaacsanders/2e59131758469097651b
Thanks
On Jan 21, 2016, at 7:44 PM, Ted Yu wrote:
> Looks like you were running on YARN.
You may have seen the following on the GitHub page:
Latest commit 50fdf0e on Feb 22, 2015
That was 11 months ago.
Can you search for a similar algorithm that runs on Spark and is newer?
If nothing is found, consider running the tests that come with the project to
determine whether the delay is
I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly
and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am
using more resources on this one.
- Isaac
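To put the two dataset sizes in perspective, here is a rough back-of-the-envelope sketch. It reads n as the number of points, which the message does not spell out. It also assumes a worst case in which every pair of points is compared once. The thread does not confirm that spark_dbscan does this, so treat the numbers as an upper bound rather than a measurement.

```python
def pairwise_comparisons(n):
    """Number of unordered point pairs among n points: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Assumption: n is the point count from the message above.
small = pairwise_comparisons(5_000)      # the run that finished quickly
large = pairwise_comparisons(4_400_000)  # the run that appears to hang

print(f"small run pairs: {small:,}")
print(f"large run pairs: {large:,}")
print(f"growth factor:   {large / small:,.0f}x")
```

Under that assumption the work grows by a factor in the hundreds of thousands, while adding executors scales capacity only linearly, so extra resources alone may not offset it.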
On Jan 21, 2016, at 11:06 PM, Ted Yu wrote: