Re: History Server Refresh?

2016-04-12 Thread Miles Crawford
It is completed apps that are not showing up. I'm fine with incomplete apps
not appearing.

On Tue, Apr 12, 2016 at 6:43 AM, Steve Loughran wrote:

>
> On 12 Apr 2016, at 00:21, Miles Crawford  wrote:
>
> Hey there. I have my Spark applications set up to write their event logs
> into S3 - this is super useful for ephemeral clusters; I can have
> persistent history even though my hosts go away.
>
> A history server is set up to view this s3 location, and that works fine
> too - at least on startup.
>
> The problem is that the history server doesn't seem to notice new logs
> arriving in the S3 bucket. Any idea how I can get it to scan the folder
> for new files?
>
> Thanks,
> -miles
>
>
> S3 isn't a real filesystem, and apps writing to it don't have any data
> written until one of the following happens:
>  - the output stream is close()'d; this happens at the end of the app
>  - the file is set up to be partitioned and a partition size is crossed
>
> Until either of those conditions is met, the history server isn't going
> to see anything.
>
> If you are going to use S3 as the destination and you want to see incomplete
> apps, then you'll need to configure the Spark job to use a smaller partition
> size (64? 128? MB).
>
> If it's completed apps that aren't being seen by the HS, then that's a
> bug, though if it's against S3 only, it's likely to be something related to
> directory listings.
>


Re: History Server Refresh?

2016-04-12 Thread Steve Loughran

On 12 Apr 2016, at 00:21, Miles Crawford wrote:

Hey there. I have my Spark applications set up to write their event logs into 
S3 - this is super useful for ephemeral clusters; I can have persistent history 
even though my hosts go away.

A history server is set up to view this s3 location, and that works fine too - 
at least on startup.

The problem is that the history server doesn't seem to notice new logs arriving 
in the S3 bucket. Any idea how I can get it to scan the folder for new files?

Thanks,
-miles

S3 isn't a real filesystem, and apps writing to it don't have any data written 
until one of the following happens:
 - the output stream is close()'d; this happens at the end of the app
 - the file is set up to be partitioned and a partition size is crossed

Until either of those conditions is met, the history server isn't going to see 
anything.

If you are going to use S3 as the destination and you want to see incomplete 
apps, then you'll need to configure the Spark job to use a smaller partition 
size (64? 128? MB).

If it's completed apps that aren't being seen by the HS, then that's a bug, 
though if it's against S3 only, it's likely to be something related to directory 
listings.


History Server Refresh?

2016-04-11 Thread Miles Crawford
Hey there. I have my Spark applications set up to write their event logs
into S3 - this is super useful for ephemeral clusters; I can have
persistent history even though my hosts go away.

A history server is set up to view this s3 location, and that works fine
too - at least on startup.

The problem is that the history server doesn't seem to notice new logs
arriving in the S3 bucket. Any idea how I can get it to scan the folder
for new files?

Thanks,
-miles
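
For context, a minimal sketch of the setup described above, with a placeholder 
bucket and path (the spark.eventLog.* keys go in the application's configuration, 
the spark.history.* keys in the properties file the history server is started with):

    # application side: write event logs to S3
    spark.eventLog.enabled            true
    spark.eventLog.dir                s3a://my-bucket/spark-history

    # history server side: read the same location and re-scan it periodically
    spark.history.fs.logDirectory     s3a://my-bucket/spark-history
    # how often the log directory is checked for new or updated applications
    spark.history.fs.update.interval  10s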