Hi, I am evaluating my options for a project that ingests a rich data feed, performs some aggregate calculations, and lets the user query the results.
The (protobuf) data feed is rich in the sense that it contains several data fields from which several different KPI figures can be calculated. The KPIs are unrelated to each other. I would like to explore the possibility of doing this work as the data comes in, using Spark Streaming. The examples I've seen, and my gut, tell me that Spark Streaming apps should be kept simple: one metric is processed in one "pipeline" and persisted at the end. In my case I would need to ingest the rich data, fork it into several pipelines, each calculating a different KPI, and then persist them all at the end as one transaction.

Am I right in thinking that this complexity and aggregation work would be better placed in separate offline Spark jobs? Any feedback would be much appreciated, thanks.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Stream-suitability-tp23852.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
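For illustration, the fork-and-persist pattern the question describes could be sketched in plain Python. This is not Spark code: the field names (`latency_ms`, `is_error`), the KPI functions, and the in-memory `store` are all made up stand-ins for the real protobuf fields, per-KPI pipelines, and database transaction.

```python
# Sketch of the fork pattern described above: one rich record batch feeds
# several independent KPI calculations, and all results are persisted
# together in one step. Field names and the in-memory store are
# illustrative stand-ins for the real protobuf fields and database.

def kpi_mean_latency(batch):
    # First "pipeline": average latency over the batch.
    return sum(r["latency_ms"] for r in batch) / len(batch)

def kpi_error_rate(batch):
    # Second "pipeline": fraction of records flagged as errors.
    return sum(1 for r in batch if r["is_error"]) / len(batch)

def persist_all(store, kpis):
    # Persist every KPI at once; a real job would wrap this in a
    # database transaction so the KPIs stay mutually consistent.
    store.update(kpis)

def process_batch(store, batch):
    # Fork: each KPI reads the same batch independently, then all
    # results are committed together.
    kpis = {
        "mean_latency_ms": kpi_mean_latency(batch),
        "error_rate": kpi_error_rate(batch),
    }
    persist_all(store, kpis)
    return kpis

batch = [
    {"latency_ms": 100, "is_error": False},
    {"latency_ms": 300, "is_error": True},
]
store = {}
print(process_batch(store, batch))  # {'mean_latency_ms': 200.0, 'error_rate': 0.5}
```

In Spark Streaming terms, the fork would correspond to applying several transformations to the same input DStream, with the joint persistence done in a single output operation.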