Hello Druid developers,

We are a team of master's students, and we are considering developing a Flink 
connector for Druid as part of a university project.

After a couple of days of learning the Druid basics, we are thinking about a 
suitable architecture and trying to figure out how to access the data.

For the implementation of the source, we started thinking about projection, 
selection, and scan queries. 
Since Flink potentially runs many source tasks in parallel on multiple nodes, 
we would like to access the data directly via the segments, whose metadata can 
be obtained through the broker, if our understanding is correct. 
However, searching through Druid's API, we did not find any calls that point 
toward reading directly from segments. 
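To make the idea concrete, here is a minimal sketch of the segment-discovery step, assuming Druid's documented sys.segments and sys.server_segments system tables and the broker's SQL endpoint; the broker URL and the datasource name "wikipedia" are illustrative assumptions, not something from Druid we have verified end to end:

```python
import json

# Assumed broker address for illustration only.
BROKER_SQL_URL = "http://localhost:8082/druid/v2/sql"

def build_segment_discovery_query(datasource: str) -> dict:
    """Build the JSON body of a Druid SQL query that lists the available
    segments of a datasource together with the servers hosting them."""
    sql = (
        'SELECT s.segment_id, s."start", s."end", ss.server '
        "FROM sys.segments AS s "
        "JOIN sys.server_segments AS ss ON s.segment_id = ss.segment_id "
        "WHERE s.datasource = ? AND s.is_available = 1"
    )
    # Druid SQL parameterized query: '?' placeholders bound via "parameters".
    return {"query": sql, "parameters": [{"type": "VARCHAR", "value": datasource}]}

body = build_segment_discovery_query("wikipedia")
print(json.dumps(body, indent=2))
# One would POST this body to BROKER_SQL_URL with Content-Type: application/json.
```

Each Flink source task could then be assigned a subset of the (segment, server) pairs returned by such a query, if reading from those servers directly turns out to be possible.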

We currently see two options: either we push the query down to Druid and 
intercept it at the point where the data servers are consulted to collect the 
data (is this possible?), or we execute the query ourselves by talking to the 
different Druid processes to obtain the metadata about the relevant segments, 
hosts, etc.
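For the second option, our current understanding is that a connector task would build a native scan query itself. The sketch below constructs such a query body; posting it to the broker at /druid/v2/ is documented, while sending the same body straight to a data server is exactly the assumption we would like you to confirm or refute:

```python
import json

def build_scan_query(datasource: str, interval: str, columns: list) -> dict:
    """Build a native Druid scan query body for one time interval.

    A connector could split work by assigning each source task a
    different interval (or segment time chunk)."""
    return {
        "queryType": "scan",
        "dataSource": datasource,
        "intervals": [interval],        # ISO-8601 interval, e.g. start/end
        "columns": columns,             # empty list would mean all columns
        "resultFormat": "compactedList",
        "batchSize": 20480,
    }

# Illustrative datasource and interval, not real data.
query = build_scan_query("wikipedia", "2016-06-27/2016-06-28", ["__time", "page"])
print(json.dumps(query, indent=2))
```

If scan queries can be addressed to individual data servers, splitting by interval or segment would map naturally onto Flink's parallel source tasks.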

So our question to you is: does Druid provide any mechanism intended for 
external systems to access the data directly, in one way or another? Or can you 
suggest alternatives that would be useful in our case? How do other systems 
with a Druid source handle this problem? (We saw that Presto offers a Druid 
connector.)

Another interesting question for us is whether Druid has some form of change 
data capture/changelog that is consistently updated with changes to the system. 
This would be an interesting basis for an unbounded data stream source.

We appreciate the help!
