Hello,

As for use cases, in my old job at Ericsson we were building a
streaming system that was processing data from telephone networks, and
it was using key-value stores a LOT. For example, keeping track of
various state info of the users (which cell are they currently
connected to, what bearers do they have, ...); mapping from IDs of
users in one subsystem of the telephone network to the IDs of the same
users in another subsystem; mapping from IDs of phone calls to lists
of IDs of participating users; etc.
So I imagine they would like this a lot. (At least, if they were
considering moving to Flink :))

Best,
Gabor




2015-09-08 13:35 GMT+02:00 Gyula Fóra <gyf...@apache.org>:
> Hey All,
>
> The last couple of days I have been playing around with the idea of
> building a streaming key-value store abstraction using stateful streaming
> operators that can be used within Flink Streaming programs seamlessly.
>
> Operations executed on this KV store would be fault tolerant as it
> integrates with the checkpointing mechanism, and if we add timestamps to
> each put/get/... operation we can use the watermarks to create fully
> deterministic results. This functionality is very useful for many
> applications, and is very hard to implement properly with some dedicated KV
> store.
>
> The KVStore abstraction could look as follows:
>
> KVStore<K,V> store = new KVStore<>();
>
> Operations:
>
> store.put(DataStream<Tuple2<K,V>>)
> store.get(DataStream<K>) -> DataStream<KV<K,V>>
> store.remove(DataStream<K>) -> DataStream<KV<K,V>>
> store.multiGet(DataStream<K[]>) -> DataStream<KV<K,V>[]>
> store.getWithKeySelector(DataStream<X>, KeySelector<X,K>) ->
> DataStream<KV<X,V>[]>
>
> For the resulting streams I used a special KV abstraction which lets us
> return null values.
>
> The implementation uses a simple streaming operator for executing most of
> the operations (for multi get there is an additional merge operator) with
> either local or partitioned states for storing the key-value pairs (my
> current prototype uses local states). And it can either execute operations
> eagerly (which would not provide deterministic results), or buffer up
> operations and execute them in order upon watermarks.
>
> As for use cases, you can probably come up with many, so I will save that
> for now :D
>
> I have a prototype implementation here that can execute the operations
> described above (does not handle watermarks and time yet):
>
> https://github.com/gyfora/flink/tree/KVStore
> And also an example job:
>
> https://github.com/gyfora/flink/blob/KVStore/flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/api/KVStreamExample.java
>
> What do you think?
> If you like it, I will work on writing tests; it still needs a lot of
> tweaking and refactoring. This might be something we want to include with
> the standard streaming libraries at one point.
>
> Cheers,
> Gyula
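
To make the "buffer up operations and execute them in order upon
watermarks" idea concrete, here is a minimal sketch in plain Java. It is
not the prototype's code and it does not use any Flink API: the class
BufferedKVStore, its put/get methods and the onWatermark callback are
hypothetical names chosen for illustration, and it assumes a single local
(non-partitioned) state with ties between equal timestamps broken by
arrival order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch (not Flink API): applies timestamped ops in order on watermarks. */
class BufferedKVStore<K, V> {

    /** One buffered operation: a put (with a value) or a get (value unused). */
    private static final class Op<K, V> {
        final long timestamp;
        final long seq;      // arrival order, used only to break timestamp ties
        final boolean isPut;
        final K key;
        final V value;

        Op(long timestamp, long seq, boolean isPut, K key, V value) {
            this.timestamp = timestamp;
            this.seq = seq;
            this.isPut = isPut;
            this.key = key;
            this.value = value;
        }
    }

    private final Map<K, V> state = new HashMap<>();
    private final PriorityQueue<Op<K, V>> pending = new PriorityQueue<>(
            Comparator.<Op<K, V>>comparingLong(o -> o.timestamp)
                    .thenComparingLong(o -> o.seq));
    private long nextSeq = 0;

    /** Buffers a put; nothing is written to the state until a watermark passes it. */
    public void put(long timestamp, K key, V value) {
        pending.add(new Op<>(timestamp, nextSeq++, true, key, value));
    }

    /** Buffers a get; its result is produced by a later onWatermark call. */
    public void get(long timestamp, K key) {
        pending.add(new Op<>(timestamp, nextSeq++, false, key, null));
    }

    /**
     * Applies every buffered op with timestamp <= watermark in timestamp order
     * and returns the results of the gets (null for a missing key).
     */
    public List<V> onWatermark(long watermark) {
        List<V> results = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek().timestamp <= watermark) {
            Op<K, V> op = pending.poll();
            if (op.isPut) {
                state.put(op.key, op.value);
            } else {
                results.add(state.get(op.key));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        BufferedKVStore<String, String> store = new BufferedKVStore<>();
        store.get(20L, "user1");           // arrives first, but is later in event time
        store.put(10L, "user1", "cellA");
        System.out.println(store.onWatermark(25L));
    }
}
```

In this sketch the get with timestamp 20 arrives before the put with
timestamp 10, yet it still sees the put, because nothing is applied until
the watermark passes. An eager variant would apply each operation on
arrival, so that same get would return null.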
