Managing Flink on Yarn programmatically

2016-05-16 Thread John Sherwood
I think this is primarily a shortcoming in my ability to grep through Scala
efficiently, but are there any resources on how to programmatically spin up
and administer Flink jobs on YARN? The CLI naturally works, but I'd like to
build out a service that handles the nuances of job management rather than
rely on users working with the YARN or Flink CLIs directly.
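One pragmatic approach, short of reverse-engineering the Scala internals, is to have the service wrap the CLI itself. A minimal Java sketch of building a `flink run -m yarn-cluster` invocation (the `-yn` container flag and paths are assumptions from the 1.x-era CLI — verify against your Flink version's `flink run --help`):

```java
import java.util.ArrayList;
import java.util.List;

public class YarnJobLauncher {

    // Builds the CLI argument list. flinkHome, the jar path, and the -yn flag
    // are illustrative assumptions; check them against your Flink version.
    static List<String> buildCommand(String flinkHome, String jarPath, int taskManagers) {
        List<String> cmd = new ArrayList<>();
        cmd.add(flinkHome + "/bin/flink");
        cmd.add("run");
        cmd.add("-m");
        cmd.add("yarn-cluster");
        cmd.add("-yn");
        cmd.add(Integer.toString(taskManagers));
        cmd.add(jarPath);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("/opt/flink", "/jobs/my-job.jar", 4);
        System.out.println(String.join(" ", cmd));
        // To actually launch from the service:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

The service would then parse the job ID from the CLI's stdout, which is brittle but avoids depending on Flink's internal YARN client classes.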


Configuring taskmanager with HA jobmanager

2016-05-04 Thread John Sherwood
I'm attempting to move to an HA configuration with a trio of JobManagers on
top of a ZK cluster. From the docs, it appears that I should have them in
my 'masters' file (as I do), but when I attempt to start the TaskManagers,
they die complaining that there is no `jobmanager.rpc.address` config, which
seems to defeat the purpose of having a masters file. Is there some other
setting I'm missing to tell the TaskManagers to use the masters file?
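If I recall the 1.x-era HA docs correctly, the masters file is only read by the shell scripts (`start-cluster.sh`) to know where to launch JobManagers; at runtime the TaskManagers discover the current leader through ZooKeeper, which requires the recovery mode to be enabled in `flink-conf.yaml`. A config sketch (key names and paths are from the Flink 1.0-era docs — verify against your version):

```yaml
# flink-conf.yaml — with this set, TaskManagers look up the leading
# JobManager in ZooKeeper instead of requiring jobmanager.rpc.address
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
recovery.zookeeper.path.root: /flink
# durable storage for JobManager metadata (hypothetical path)
recovery.zookeeper.storageDir: hdfs:///flink/recovery
```

Without `recovery.mode: zookeeper`, the TaskManagers fall back to standalone mode and insist on `jobmanager.rpc.address`, which matches the error described above.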


S3 Checkpoint Storage

2016-05-02 Thread John Sherwood
Hello all,

I'm attempting to set up a taskmanager cluster using S3 as the
highly-available store. It looks like the main thing is just setting
`state.backend.fs.checkpointdir` to the appropriate s3:// URI, but as
someone rather new to accessing S3 from Java, how should I provide Flink
with the credentials necessary to access the bucket I've set up?
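Flink goes through Hadoop's filesystem implementations for `s3://` URIs, so one way to supply credentials is via a Hadoop `core-site.xml` that Flink is pointed at. A sketch, assuming the native S3 filesystem from the Hadoop docs of that era (key names and the conf path are assumptions — verify against your Hadoop version, and note that on EC2 an IAM instance role avoids embedding keys entirely):

```xml
<!-- core-site.xml: S3 filesystem and credentials (placeholder values) -->
<configuration>
  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```

Then point Flink at the directory containing that file with `fs.hdfs.hadoopconf: /path/to/hadoop/conf` in `flink-conf.yaml`.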

Thanks for your time & any help!


Re: design question

2016-04-23 Thread John Sherwood
This sounds like you have some per-key state to keep track of, so the
'correct' way to do it would be to keyBy the guid. I believe that if you
run your environment with the RocksDB state backend you will not OOM
regardless of the number of GUIDs that are eventually tracked. Whether
Flink/stream processing is the most effective way to achieve your goal, I
can't say, but I am fairly confident that this particular aspect is not a
problem.
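To make the per-key linking concrete, here is a plain-Java sketch of the logic (no Flink dependency, names are illustrative): each new version for a guid is linked to the previously seen one. In an actual Flink job this map would be a keyed `ValueState` backed by RocksDB rather than an in-heap `HashMap` — that is precisely what keeps millions of GUIDs from causing an OOM.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionLinker {

    // In Flink this would be keyed state (one entry per guid), spilled to
    // RocksDB; a HashMap is used here only to illustrate the linking logic.
    private final Map<String, String> lastVersion = new HashMap<>();

    // Records versionId as the latest for guid and returns the previous
    // version it should be linked to, or null if this is the first one.
    public String link(String guid, String versionId) {
        return lastVersion.put(guid, versionId);
    }

    public static void main(String[] args) {
        VersionLinker linker = new VersionLinker();
        System.out.println(linker.link("g1", "v1")); // first version: no predecessor
        System.out.println(linker.link("g1", "v2")); // links v2 -> v1
    }
}
```

Because Flink processes each key's elements in arrival order within a parallel instance, this linkage step is free of the race conditions described below.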

On Sat, Apr 23, 2016 at 1:13 AM, Chen Bekor  wrote:

> hi all,
>
> I have a stream of incoming object versions (objects change over time) and
> a requirement to fetch from a datastore the last known object version in
> order to link it with the id of the new version,  so that I end up with a
> linked list of object versions.
>
> all object versions contain the same guid, so I was thinking about using
> Flink streaming in order to ensure ordering and avoid concurrency/race
> conditions in the linkage process (object versions might arrive out of
> order or in spikes)
>
> if I use the object guid as a key for a keyed stream I am concerned I will
> end up with millions of windowed streams hence causing OOM.
>
> what do you think should be the right approach? do you think flink is the
> right technology for this task?
>


Re: implementing a continuous time window

2016-04-21 Thread John Sherwood
You are looking for sliding windows:
https://flink.apache.org/news/2015/12/04/Introducing-windows.html

Here you would do

.timeWindow(Time.seconds(5), Time.seconds(1))
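With a sliding window each event is counted in several overlapping windows. A plain-Java simulation (no Flink dependency) of 5-second windows sliding by 1 second over the example events t = 1, 2, 7, 8 shows the resulting counts; note that Flink aligns window start times, so the exact boundaries can differ from the emission times in the example below:

```java
import java.util.Map;
import java.util.TreeMap;

public class SlidingWindowCount {

    // Counts events falling into each window [start, start + size),
    // for window starts 0, slide, 2*slide, ... up to maxStart.
    static Map<Integer, Integer> counts(int[] events, int size, int slide, int maxStart) {
        Map<Integer, Integer> result = new TreeMap<>();
        for (int start = 0; start <= maxStart; start += slide) {
            int count = 0;
            for (int e : events) {
                if (e >= start && e < start + size) count++;
            }
            result.put(start, count);
        }
        return result;
    }

    public static void main(String[] args) {
        // window size 5, slide 1, events at t = 1, 2, 7, 8
        System.out.println(counts(new int[]{1, 2, 7, 8}, 5, 1, 8));
    }
}
```

For instance, the window starting at t = 0 sees events 1 and 2 (count 2), the window starting at t = 2 sees only event 2 (count 1), and the window starting at t = 4 sees events 7 and 8 (count 2).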

On Thu, Apr 21, 2016 at 12:06 PM, Jonathan Yom-Tov 
wrote:

> hi,
>
> Is it possible to implement a continuous time window with Flink? Here's an
> example. Say I want to count events within a window. The window length is 5
> seconds and I get events at t = 1, 2, 7, 8 seconds. I would then expect to
> get events with a count at t = 1 (count = 1), t = 2 (count = 2), t = 6
> (count = 1), t = 7 (count = 2), t = 8 (count = 2), t = 12 (count = 1) and t
> = 13 (count = 0).
>
> How would I go about doing that?
>
> thanks,
> Jon.
>