GitHub user StefanRRichter opened a pull request: https://github.com/apache/flink/pull/5424
FLINK-8571] [DataStream] Introduce utility function that reinterprets a data stream as keyed stream ## What is the purpose of the change This change introduces a utility function (`DataStreamUtils#reinterpretAsKeyedStream(...)`) that re-interprets any data stream as a keyed stream. Currently, the intended use-case are source that are already pre-partitioned w.r.t. the key group partitioning of Flink's keyBy. With this, for materialized shuffles that are picked up through a re-interpreted source, a job that uses keyed state can become embarrassingly parallel. This, in turn, allows for fine-grained recovery options where tasks can still make progress if other tasks fail. ## Brief change log - Moved `DataStreamUtils` to a better package. - Introduced utility functions to re-interpret data streams as keyed in `DataStreamUtils` and the counterpart from the Scala API. ## Verifying this change See `DataStreamUtilsTest#testReinterpretAsKeyedStream`. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (yes) - If yes, how is the feature documented? (Docs + JavaDocs) You can merge this pull request into a Git repository by running: $ git pull https://github.com/StefanRRichter/flink key-partitioned-source Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5424.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5424 ---- commit 45919ffc7ca064ba612c0493b93694111d18d430 Author: Stefan Richter <s.richter@...> Date: 2018-02-07T10:04:14Z Move DataStreamUtils to the datastream API package so that we can actually use it to expose package-private constructors or methods for experimental features. commit 8f9a2d78fbe0d97cf7c4997eba539a963ba7aee4 Author: Stefan Richter <s.richter@...> Date: 2018-02-02T17:39:51Z [FLINK-8571] [DataStream] Introduce utility function that reinterprets a data stream as keyed stream ---- ---