SparkEventListener dropping events

2017-08-03 Thread Miles Crawford
We are seeing lots of stability problems with Spark 2.1.1 as a result of
dropped events.  We disabled the event log, which seemed to help, but many
events are still being dropped, as in the example log below.

Is there any way for me to see which listener is backing up the queue? Is
there any workaround for this issue?


2017-08-03 04:13:29,852 ERROR org.apache.spark.scheduler.LiveListenerBus:
Dropping SparkListenerEvent because no remaining room in event queue. This
likely means one of the SparkListeners is too slow and cannot keep up with
the rate at which tasks are being started by the scheduler.
2017-08-03 04:13:29,853 WARN  org.apache.spark.scheduler.LiveListenerBus:
Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
2017-08-03 04:14:29,854 WARN  org.apache.spark.scheduler.LiveListenerBus:
Dropped 32738 SparkListenerEvents since Thu Aug 03 04:13:29 UTC 2017
2017-08-03 04:15:15,095 INFO
 org.allenai.s2.pipeline.spark.steps.LoadDaqPapers$: Finished in 127.572
seconds.
2017-08-03 04:15:15,095 INFO  org.allenai.s2.common.metrics.Metrics$:
Adding additional tags to all metrics and events: [pipeline, env:prod]
2017-08-03 04:15:15,149 INFO
 org.allenai.s2.pipeline.spark.steps.MergeSourcedPapers$: Computing
2017-08-03 04:15:29,853 WARN  org.apache.spark.scheduler.LiveListenerBus:
Dropped 28816 SparkListenerEvents since Thu Aug 03 04:14:29 UTC 2017
2017-08-03 04:16:29,868 WARN  org.apache.spark.scheduler.LiveListenerBus:
Dropped 18613 SparkListenerEvents since Thu Aug 03 04:15:29 UTC 2017
2017-08-03 04:17:29,868 WARN  org.apache.spark.scheduler.LiveListenerBus:
Dropped 52231 SparkListenerEvents since Thu Aug 03 04:16:29 UTC 2017
2017-08-03 04:18:29,868 WARN  org.apache.spark.scheduler.LiveListenerBus:
Dropped 16646 SparkListenerEvents since Thu Aug 03 04:17:29 UTC 2017
2017-08-03 04:19:29,868 WARN  org.apache.spark.scheduler.LiveListenerBus:
Dropped 19693 SparkListenerEvents since Thu Aug 03 04:18:29 UTC 2017


Re: Bizarre UI Behavior after migration

2017-05-22 Thread Miles Crawford
Well, what's happening here is that jobs become "un-finished": they
complete, and then later pop back into the "Active" section showing a
small number of complete/in-progress tasks.

In my screenshot, Job #1 completed as normal, and then later switched
back to active with only 92 tasks. It never seems to change again; it's
stuck in this frozen, active state.


On Mon, May 22, 2017 at 12:50 PM, Vadim Semenov <vadim.seme...@datadoghq.com
> wrote:

> I believe it shows only the tasks that have actually been executed; if
> there were tasks with no data, they don't get reported.
>
> I might be mistaken; if somebody has a good explanation, I would also like
> to hear it.
>
> On Fri, May 19, 2017 at 5:45 PM, Miles Crawford <mil...@allenai.org>
> wrote:
>
>> Hey y'all,
>>
>> Trying to migrate from Spark 1.6.1 to 2.1.0.
>>
>> I use EMR, and launched a new cluster using EMR 5.5, which runs Spark
>> 2.1.0.
>>
>> I updated my dependencies, and fixed a few API changes related to
>> accumulators, and presto! my application was running on the new cluster.
>>
>> But the application UI shows crazy output:
>> https://www.dropbox.com/s/egtj1056qeudswj/sparkwut.png?dl=0
>>
>> The applications seem to complete successfully, but I was wondering if
>> anyone has an idea of what might be going wrong?
>>
>> Thanks,
>> -Miles
>>
>
>


Re: Spark UI shows Jobs are processing, but the files are already written to S3

2017-05-19 Thread Miles Crawford
Could I be experiencing the same thing?

https://www.dropbox.com/s/egtj1056qeudswj/sparkwut.png?dl=0

On Wed, Nov 16, 2016 at 10:37 AM, Shreya Agarwal 
wrote:

> I think that is a bug. I have seen it a lot, especially with long-running
> jobs where Spark skips a lot of stages because it has pre-computed results.
> Some of these are never marked as completed, even though in reality they
> are. I figured this out because I was using the interactive shell
> (spark-shell): the shell came back to a prompt indicating the job had
> finished even though there were still a lot of active jobs and tasks
> according to the UI, and my output was correct.
>
>
>
> Is there a JIRA item tracking this?
>
>
>
> *From:* Kuchekar [mailto:kuchekar.nil...@gmail.com]
> *Sent:* Wednesday, November 16, 2016 10:00 AM
> *To:* spark users 
> *Subject:* Spark UI shows Jobs are processing, but the files are already
> written to S3
>
>
>
> Hi,
>
>
>
> I am running a Spark job that saves the computed (massive) data to S3.
> On the Spark UI I see that some jobs are still active, but there is no
> activity in the logs. Also, on S3 all the data has been written (I verified
> each bucket --> it has a _SUCCESS file).
>
>
>
> Am I missing something?
>
>
>
> Thanks.
>
> Kuchekar, Nilesh
>
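
A minimal sketch of the kind of _SUCCESS check described in the quoted
message, using the Hadoop FileSystem API that Spark already ships with (the
bucket and prefixes are placeholders):

    import org.apache.hadoop.fs.Path
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("success-check").getOrCreate()

    // Placeholder output locations; substitute the real prefixes.
    val outputs = Seq("s3a://my-bucket/output/run1", "s3a://my-bucket/output/run2")

    outputs.foreach { dir =>
      val marker = new Path(dir, "_SUCCESS")
      val fs = marker.getFileSystem(spark.sparkContext.hadoopConfiguration)
      println(s"$dir -> " + (if (fs.exists(marker)) "committed" else "missing _SUCCESS"))
    }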


Bizarre UI Behavior after migration

2017-05-19 Thread Miles Crawford
Hey y'all,

Trying to migrate from Spark 1.6.1 to 2.1.0.

I use EMR, and launched a new cluster using EMR 5.5, which runs Spark 2.1.0.

I updated my dependencies, and fixed a few API changes related to
accumulators, and presto! my application was running on the new cluster.
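
For anyone making the same move, a minimal sketch of the kind of accumulator
change involved (the names and counts here are placeholders, not the actual
application code): sc.accumulator was deprecated in 2.x in favor of the
AccumulatorV2-based helpers such as sc.longAccumulator.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("accumulator-migration").getOrCreate()
    val sc = spark.sparkContext

    // Spark 1.6 style (deprecated in 2.x):
    //   val errorCount = sc.accumulator(0L, "errors")
    //   rdd.foreach { r => if (isBad(r)) errorCount += 1 }

    // Spark 2.x style, using the built-in LongAccumulator:
    val errorCount = sc.longAccumulator("errors")
    sc.parallelize(1 to 100).foreach { i => if (i % 10 == 0) errorCount.add(1) }
    println(errorCount.value) // 10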

But the application UI shows crazy output:
https://www.dropbox.com/s/egtj1056qeudswj/sparkwut.png?dl=0

The applications seem to complete successfully, but I was wondering if
anyone has an idea of what might be going wrong?

Thanks,
-Miles


Bizarre behavior using Datasets/ML on Spark 2.0

2016-09-21 Thread Miles Crawford
Hello folks. I recently migrated my application to Spark 2.0, and
everything worked well, except for one function that uses "toDS" and the ML
libraries.

This stage used to complete in 15 minutes or so on 1.6.2, and now takes
almost two hours.

The UI shows very strange behavior - completed stages still being worked
on, concurrent work on tons of stages, including ones from downstream jobs:
https://dl.dropboxusercontent.com/u/231152/spark.png

Anyone know what might be going on? The only source change I made was
changing "toDF" to "toDS()" before handing my RDDs to the ML libraries.

Thanks,
-miles


Re: History Server Refresh?

2016-04-12 Thread Miles Crawford
It is completed apps that are not showing up. I'm fine with incomplete apps
not appearing.

On Tue, Apr 12, 2016 at 6:43 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
> On 12 Apr 2016, at 00:21, Miles Crawford <mil...@allenai.org> wrote:
>
> Hey there. I have my Spark applications set up to write their event logs
> into S3 - this is super useful for ephemeral clusters, since I can have
> persistent history even though my hosts go away.
>
> A history server is set up to view this s3 location, and that works fine
> too - at least on startup.
>
> The problem is that the history server doesn't seem to notice new logs
> arriving into the S3 bucket.  Any idea how I can get it to scan the folder
> for new files?
>
> Thanks,
> -miles
>
>
> s3 isn't a real filesystem, and apps writing to it don't have any data
> written until one of the following happens:
>  - the output stream is close()'d (this happens at the end of the app), or
>  - the file is set up to be partitioned and a partition size is crossed.
>
> Until either of those conditions is met, the history server isn't going
> to see anything.
>
> If you are going to use s3 as the destination and you want to see incomplete
> apps, then you'll need to configure the Spark job to use a smaller partition
> size (64? 128? MB).
>
> If it's completed apps that aren't being seen by the HS, then that's a
> bug, though if it's against s3 only, it's likely to be something related to
> directory listings.
>
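
For reference, a minimal sketch of the application-side settings at play
here, with a placeholder bucket name. The history server re-scans the log
directory on the interval given by spark.history.fs.update.interval (10s by
default, if memory serves), but as noted above it can only see what S3 has
actually persisted.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("ephemeral-cluster-job")
      .config("spark.eventLog.enabled", "true")
      .config("spark.eventLog.dir", "s3a://my-log-bucket/spark-history/") // placeholder bucket
      .getOrCreate()

    // ... run the job ...

    spark.stop() // closing the context is what finally closes the S3 event-log stream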


History Server Refresh?

2016-04-11 Thread Miles Crawford
Hey there. I have my Spark applications set up to write their event logs
into S3 - this is super useful for ephemeral clusters, since I can have
persistent history even though my hosts go away.

A history server is set up to view this s3 location, and that works fine
too - at least on startup.

The problem is that the history server doesn't seem to notice new logs
arriving into the S3 bucket.  Any idea how I can get it to scan the folder
for new files?

Thanks,
-miles