Managing Flink on Yarn programmatically
I think this is primarily a shortcoming in my ability to grep through Scala efficiently, but are there any resources on how to programmatically spin up and administer Flink jobs on YARN? The CLI naturally works, but I'd like to build a service that handles the nuances of job management rather than rely on users playing with the YARN or Flink CLIs directly.
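For reference, one low-tech starting point (a sketch, not an official API; the binary path, the `-yn` task-manager-count option, and all argument values below are assumptions based on the standard `flink run` CLI) is to have the service assemble and launch the CLI command itself:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build a `flink run -m yarn-cluster` command for a one-off YARN
// job, then hand it to ProcessBuilder. Paths and option values here are
// placeholders, not values from the thread.
public class FlinkYarnLauncher {

    // Assemble the CLI arguments for submitting a job to YARN.
    static List<String> buildSubmitCommand(String flinkBin, String jarPath,
                                           int taskManagers, String... jobArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(flinkBin);            // e.g. /opt/flink/bin/flink
        cmd.add("run");
        cmd.add("-m");
        cmd.add("yarn-cluster");      // bring up a per-job YARN session
        cmd.add("-yn");
        cmd.add(Integer.toString(taskManagers));
        cmd.add(jarPath);
        for (String a : jobArgs) {
            cmd.add(a);
        }
        return cmd;
    }

    // Launch the submission as a child process of the managing service.
    static Process submit(List<String> cmd) throws Exception {
        return new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Wrapping the CLI keeps the service insulated from Flink's internal (and version-dependent) YARN client classes, at the cost of parsing CLI output to track job IDs.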
Configuring taskmanager with HA jobmanager
I'm attempting to move to an HA configuration with a trio of JobManagers on top of a ZooKeeper cluster. From the docs, it appears that I should list them in my 'masters' file (as I do), but when I attempt to start the TaskManagers, they die complaining there is no jobmanager.rpc.address config, which seems counter to the purpose of having a masters file. Is there some other setting I'm missing to tell the TaskManagers to use the masters file?
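For reference, the masters file is only read by the shell start scripts; in HA mode the TaskManagers are expected to discover the current leader through ZooKeeper instead, which requires the recovery mode to be set in flink-conf.yaml on every node. A sketch of the relevant keys (names follow the Flink 1.0-era HA docs; hosts and paths are placeholders):

```yaml
# flink-conf.yaml (HA-related keys; hosts and paths are placeholders)
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
recovery.zookeeper.path.root: /flink
recovery.zookeeper.storageDir: hdfs:///flink/recovery
```

With recovery.mode left at its default, the TaskManagers fall back to looking for a static jobmanager.rpc.address, which would explain the error above.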
S3 Checkpoint Storage
Hello all, I'm attempting to set up a taskmanager cluster using S3 as the highly-available store. It looks like the main thing is just setting `state.backend.fs.checkpointdir` to the appropriate s3:// URI, but as someone rather new to accessing S3 from Java, how should I provide Flink with the credentials necessary to access the bucket I've set up? Thanks for your time & any help!
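For reference, Flink delegates S3 access to the Hadoop filesystem connectors, so credentials typically go into a Hadoop core-site.xml that Flink is pointed at via `fs.hdfs.hadoopconf` in flink-conf.yaml. A sketch (property names follow the Hadoop s3a connector; all values are placeholders):

```xml
<!-- core-site.xml; values below are placeholders, not working credentials -->
<configuration>
  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```

If the cluster runs on EC2, attaching an IAM role to the instances lets the connector pick up credentials automatically, avoiding keys in config files altogether.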
Re: design question
This sounds like you have some per-key state to keep track of, so the 'correct' way to do it would be to keyBy the guid. I believe that if you run your environment using the RocksDB state backend you will not OOM regardless of the number of GUIDs that are eventually tracked. Whether Flink/stream processing is the most effective way to achieve your goal, I can't say, but I am fairly confident that this particular aspect is not a problem.

On Sat, Apr 23, 2016 at 1:13 AM, Chen Bekor wrote:
> hi all,
>
> I have a stream of incoming object versions (objects change over time) and
> a requirement to fetch from a datastore the last known object version in
> order to link it with the id of the new version, so that I end up with a
> linked list of object versions.
>
> all object versions contain the same guid, so I was thinking about using
> flink streaming in order to assure ordering and avoid concurrency / race
> conditions in the linkage process (object versions might arrive unordered
> or may arrive in spikes)
>
> if I use the object guid as a key for a keyed stream I am concerned I will
> end up with millions of windowed streams, hence causing OOM.
>
> what do you think should be the right approach? do you think flink is the
> right technology for this task?
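For what it's worth, the per-key state involved here is tiny: just the id of the last version seen for each guid. A minimal sketch of that linkage logic, independent of Flink (class and method names are mine; in a keyed Flink operator the map below would be a per-key ValueState rather than an explicit HashMap):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-key state a keyed operator would hold: the last
// version id seen per guid. A HashMap stands in for Flink's keyed state
// purely for illustration.
public class VersionLinker {

    private final Map<String, String> lastVersionByGuid = new HashMap<>();

    /**
     * Records versionId as the latest version for guid and returns the
     * previous version's id (the link target), or null if this is the
     * first version seen for that guid.
     */
    public String link(String guid, String versionId) {
        return lastVersionByGuid.put(guid, versionId);
    }
}
```

Because keyBy guarantees all events for one guid are processed serially by one operator instance, this read-then-update needs no locking, which is the concurrency guarantee the original question was after.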
Re: implementing a continuous time window
You are looking for sliding windows: https://flink.apache.org/news/2015/12/04/Introducing-windows.html

Here you would do .timeWindow(Time.seconds(5), Time.seconds(1))

On Thu, Apr 21, 2016 at 12:06 PM, Jonathan Yom-Tov wrote:
> hi,
>
> Is it possible to implement a continuous time window with flink? Here's an
> example. Say I want to count events within a window. The window length is 5
> seconds and I get events at t = 1, 2, 7, 8 seconds. I would then expect to
> get events with a count at t = 1 (count = 1), t = 2 (count = 2), t = 6
> (count = 1), t = 7 (count = 2), t = 8 (count = 2), t = 12 (count = 1) and t
> = 13 (count = 0).
>
> How would I go about doing that?
>
> thanks,
> Jon.
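To make the semantics concrete, here is the counting that a size-5, slide-1 window performs, written as plain Java with no Flink dependency (windows cover [start, start + size); the method and names are mine, for illustration only):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of sliding-window counting: for windows of `size` seconds that
// slide by `slide` seconds, count how many event timestamps fall in each
// half-open window [start, start + size). This mirrors what
// .timeWindow(Time.seconds(5), Time.seconds(1)) computes per window.
public class SlidingCount {

    static List<int[]> countWindows(int[] eventTimes, int size, int slide, int maxStart) {
        List<int[]> result = new ArrayList<>(); // each entry: {windowStart, count}
        for (int start = 0; start <= maxStart; start += slide) {
            int count = 0;
            for (int t : eventTimes) {
                if (t >= start && t < start + size) {
                    count++;
                }
            }
            result.add(new int[] {start, count});
        }
        return result;
    }
}
```

Running this on the events from the question (t = 1, 2, 7, 8) shows the overlapping counts rising and falling each second, which is the "continuous" behavior being asked for; the fine print is only how Flink aligns the window start times.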