Dear folks,

DISCLAIMER: With this mail, my sole intention is to establish contact with the community and trade ideas on how to realize the goal described below.

I'm a starting PhD researcher in distributed systems and databases, and I am particularly interested in worst-case optimal (multiway) join processing on streams. I have performed preliminary tests with a new join algorithm that shows rather promising results. Its limitation, however, is that it operates in a centralized fashion. My goal is to extend the algorithm so that it can operate in a distributed environment.

To showcase my results, I want to implement a proof of concept in Apache Flink. I know this is a rather ambitious project, which is why I am reaching out to the community. I have read through most of the application development documentation on the website (e.g., [1, 2, 3, 4]), but I am now eager to learn more about Flink's internals. Specifically, I want to gain more insight into the lifecycle of a query in Flink. Is there any additional documentation available on this subject?

Thanks in advance.

[1] https://flink.apache.org/news/2015/04/13/release-0.9.0-milestone1.html
[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/dynamic_tables.html
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/streaming/joins.html
[4] https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals

Kind regards,
Laurens Vijnck