Hi, I've only recently come across Flink and I'm trying to use it to solve a problem that I have.
I have a stream of user settings updates coming through a Kafka topic. I need to store the most recent settings, along with a history of settings for each user, in Redshift, which then feeds analytics dashboards. I've been contemplating using Flink for this problem and wanted some guidance from people experienced with Flink to help me decide whether Flink is suited to it and, if so, what approach might work best.

I am considering the following approaches:

1. Create a secondary key-value database with each user's latest settings, and after grouping the stream with keyBy(userId) look these settings up to check whether a setting has changed; if it has, create a history record (rough sketch below). I came across this Stack Overflow thread for this approach: http://stackoverflow.com/questions/38866078/how-to-look-up-and-update-the-state-of-a-record-from-a-database-in-apache-flink

2. Pull the current snapshot of users from Redshift at program start and keep it as state in the Flink program (the snapshot isn't huge, ~1 GB). Subsequently look up from this state and update it when processing events (sketch below as well).

In both cases I plan to create a Redshift sink that batches updates to the history as well as the latest state and persists them to Redshift in batches (through S3 and the COPY command for the history, through an update on join for the snapshot). I've sketched that too at the end of this mail.

Is one of these designs better suited to Flink? Is there an alternative I should consider?

Thanks!
-H
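
P.S. Here are rough sketches of what I have in mind, in case it helps clarify the question.

Approach 1 would be something like the function below, which looks up the previous settings in the secondary key-value store after keyBy(userId). SettingsUpdate, UserSettings, HistoryRecord and KvClient are placeholders for my own types, not Flink classes:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class SettingsDiffFunction extends RichFlatMapFunction<SettingsUpdate, HistoryRecord> {

    private transient KvClient kvClient;   // hypothetical client for the secondary key-value store

    @Override
    public void open(Configuration parameters) {
        kvClient = KvClient.connect();     // open the connection once per parallel task
    }

    @Override
    public void flatMap(SettingsUpdate update, Collector<HistoryRecord> out) throws Exception {
        UserSettings previous = kvClient.get(update.userId);
        if (previous == null || !previous.equals(update.settings)) {
            // a setting changed -> emit a history record and update the store
            out.collect(new HistoryRecord(update.userId, previous, update.settings, update.timestamp));
            kvClient.put(update.userId, update.settings);
        }
    }
}

// usage:
// DataStream<HistoryRecord> history =
//     settingsUpdates.keyBy(u -> u.userId).flatMap(new SettingsDiffFunction());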
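
Approach 2 would keep the latest settings per user in Flink keyed state instead. I haven't shown bootstrapping the state from the Redshift snapshot here; one option I'm considering is lazily loading a user on the first event for that key. Again, SettingsUpdate, UserSettings and HistoryRecord are my own placeholder types:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class StatefulSettingsDiffFunction extends RichFlatMapFunction<SettingsUpdate, HistoryRecord> {

    private transient ValueState<UserSettings> latestSettings;

    @Override
    public void open(Configuration parameters) {
        latestSettings = getRuntimeContext().getState(
            new ValueStateDescriptor<>("latest-settings", UserSettings.class));
    }

    @Override
    public void flatMap(SettingsUpdate update, Collector<HistoryRecord> out) throws Exception {
        UserSettings previous = latestSettings.value();   // null if we've never seen this user
        if (previous == null || !previous.equals(update.settings)) {
            out.collect(new HistoryRecord(update.userId, previous, update.settings, update.timestamp));
            latestSettings.update(update.settings);
        }
    }
}

// usage (has to run on a keyed stream for the keyed state to work):
// DataStream<HistoryRecord> history =
//     settingsUpdates.keyBy(u -> u.userId).flatMap(new StatefulSettingsDiffFunction());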
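
And the batching Redshift sink I mentioned would look roughly like this, ignoring checkpointing/fault tolerance for now. The flush logic is only stubbed out; writeBatchToS3 and issueCopyCommand are hypothetical helpers standing in for the S3 upload and the COPY / update-on-join against Redshift:

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class RedshiftBatchSink extends RichSinkFunction<HistoryRecord> {

    private static final int BATCH_SIZE = 10_000;
    private transient List<HistoryRecord> buffer;

    @Override
    public void open(Configuration parameters) {
        buffer = new ArrayList<>();
    }

    @Override
    public void invoke(HistoryRecord record) throws Exception {
        buffer.add(record);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    @Override
    public void close() throws Exception {
        flush();   // write out whatever is left when the job stops
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // stage the batch on S3, then COPY it into Redshift (stubbed out here)
        String s3Key = writeBatchToS3(buffer);
        issueCopyCommand(s3Key);
        buffer.clear();
    }

    private String writeBatchToS3(List<HistoryRecord> batch) { /* ... */ return null; }
    private void issueCopyCommand(String s3Key) { /* ... */ }
}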