Hello Druid developers,

We are a team of master's students, and we are considering developing a Flink connector for Druid as part of a university project.
After a couple of days of learning the Druid basics, we are now thinking about a suitable architecture and trying to figure out how to access the data. For the implementation of the source we started looking at projection, selection, and scan queries. Since Flink potentially runs many source tasks in parallel on multiple nodes, we would like to access the data directly via the segments, whose metadata can be obtained through the broker, if our understanding is correct. However, searching through Druid's API we did not find any calls that go in the direction of reading segments directly.

We currently see two options: either we push the query down to Druid and intercept it at the point where the data servers are consulted to collect the data (is this even possible?), or we execute the query ourselves by talking to the different Druid processes to obtain the metadata about the relevant segments, hosts, etc.

So our questions to you would be: Is there any mechanism in Druid that is intended for external systems to access the data directly in one way or another? Do you see any other alternatives that would be useful in our case? And how do other systems with a Druid source handle this problem? (We saw that Presto offers a Druid connector.)

Another interesting question for us is whether Druid keeps some kind of change data capture mechanism or changelog that is consistently updated with the changes to the system. That would be an interesting basis for an unbounded data stream source.

We appreciate the help! For reference, rough sketches of the two options follow below.
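Option 1 (query push-down): a minimal sketch of issuing a native Scan query against the broker, assuming a local cluster with the broker on its default port 8082 and the "wikipedia" tutorial datasource; the class name is our own, and the text block needs Java 15+:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DruidScanQuerySketch {

    // Assumption: broker reachable locally on its default port.
    private static final String BROKER_URL = "http://localhost:8082/druid/v2/";

    public static void main(String[] args) throws Exception {
        // A native Scan query; the broker fans it out to the data
        // servers and streams the matching rows back to the caller.
        String scanQuery = """
                {
                  "queryType": "scan",
                  "dataSource": "wikipedia",
                  "intervals": ["2015-09-12/2015-09-13"],
                  "columns": ["__time", "page", "user"],
                  "resultFormat": "compactedList",
                  "limit": 10
                }
                """;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(BROKER_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(scanQuery))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}

The open question for us is whether we could intercept below the broker, i.e. send such a query directly to the data servers that hold the segments a given subtask is responsible for.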
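Option 2 (executing the query ourselves): for discovering the relevant segments we found the coordinator's metadata API; a minimal sketch, again assuming default ports and the tutorial datasource:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DruidSegmentMetadataSketch {

    // Assumption: coordinator reachable locally on its default port.
    private static final String COORDINATOR =
            "http://localhost:8081/druid/coordinator/v1";

    public static void main(String[] args) throws Exception {
        // Lists the published segments of a datasource; with "?full"
        // the response includes each segment's interval, size, and
        // deep-storage load spec, which a parallel source could use
        // to split its work.
        String url = COORDINATOR
                + "/metadata/datasources/wikipedia/segments?full";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}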
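And on the Flink side, a rough sketch of how the parallel source subtasks could divide the segments (or their time intervals) among themselves; the class name, the queryBroker helper, and the round-robin assignment are only our placeholder ideas, not a finished design:

import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

import java.util.List;

public class DruidParallelSourceSketch extends RichParallelSourceFunction<String> {

    // One entry per segment interval, e.g. taken from the coordinator
    // metadata shown in the previous sketch.
    private final List<String> intervals;
    private volatile boolean running = true;

    public DruidParallelSourceSketch(List<String> intervals) {
        this.intervals = intervals;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        int parallelism = getRuntimeContext().getNumberOfParallelSubtasks();

        // Round-robin assignment of intervals to subtasks.
        for (int i = subtask; i < intervals.size() && running; i += parallelism) {
            for (String row : queryBroker(intervals.get(i))) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(row);
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private List<String> queryBroker(String interval) {
        // Hypothetical helper: issue a Scan query restricted to the
        // given interval (as in the first sketch) and parse the rows;
        // omitted here for brevity.
        return List.of();
    }
}

Each subtask would then only touch its own share of the segments, which is why direct segment access, or at least interval-scoped queries, matters so much for our design.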