iamruhua opened a new issue #10704:
URL: https://github.com/apache/druid/issues/10704


   ### Description
   We are currently working on an IoT project that will integrate a large 
number of devices, save the raw data (which is semi-structured), and do some 
simple statistical or analytical aggregation in real time. There would be 
millions of devices from hundreds of zones connected to the project (high 
cardinality).
   
   Currently we use Kafka ingestion, where Druid pulls the raw data and 
saves it. A second ingestion task then pulls from the same Kafka topic 
again to aggregate on some dimensions (zoneID, typeOfDevice).
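   For reference, the second (aggregating) consumer in the setup above can be 
sketched as a Kafka supervisor spec with rollup enabled. Column names such as 
`zoneID`, `typeOfDevice`, `data`, and the topic name are assumptions based on 
the description, not an actual schema:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "sum_by_zone",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["zoneID", "typeOfDevice"] },
      "metricsSpec": [
        { "type": "doubleSum", "name": "sum_reading", "fieldName": "data" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "MINUTE",
        "rollup": true
      }
    },
    "ioConfig": {
      "topic": "device-readings",
      "consumerProperties": { "bootstrap.servers": "kafka:9092" },
      "inputFormat": { "type": "json" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```

   A second, near-identical supervisor spec with `"rollup": false` and no 
`metricsSpec` writes the `raw_data` datasource from the same topic, which is 
exactly the duplicated consumption described above.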
   
   ### Motivation
   So the problem (or at least a potential one) is that we have to consume 
the same Kafka topic multiple times for different purposes on an identical 
dataset.
   Is there any way we could add or configure a pipeline for this kind of 
process? For example, when we receive a device reading from deviceA in 
zoneA, save the raw reading into a table named raw_data, and then use the 
same data to calculate sum(reading.data) and save it into an aggregated 
table named sum_by_zone.
   
   This could save a lot of bandwidth and computing resources.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


