"Using Spark to query the data in the backend of the web UI?" Don't do that. I would recommend that the Spark Streaming process store data into some NoSQL or SQL database, and that the web UI query data from that database.
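Alonso's point is that the web backend should read from a serving database rather than launch Spark jobs per request. A minimal plain-Java sketch of that read path, assuming a hypothetical Postgres table `standardized_events` with `metric`, `event_time`, and `value` columns (the table, columns, and JDBC URL are all invented for illustration):

```java
// Sketch of the web-UI read path: query a serving database directly instead
// of running Spark per request. Table/column names are assumptions.
public class ReportQuery {

    // Builds a parameterized SQL query for a time-bounded report.
    // Placeholders (?) keep the query safe from SQL injection.
    static String buildReportSql(String aggregate) {
        return "SELECT " + aggregate + "(value) AS result "
             + "FROM standardized_events "
             + "WHERE metric = ? AND event_time BETWEEN ? AND ?";
    }

    public static void main(String[] args) {
        String sql = buildReportSql("avg");
        System.out.println(sql);
        // In the real backend (with a JDBC driver on the classpath) this
        // would be executed roughly as:
        //   try (Connection c = DriverManager.getConnection(jdbcUrl);
        //        PreparedStatement ps = c.prepareStatement(sql)) {
        //       ps.setString(1, "page_views");
        //       ps.setTimestamp(2, Timestamp.from(from));
        //       ps.setTimestamp(3, Timestamp.from(to));
        //       try (ResultSet rs = ps.executeQuery()) { /* render report */ }
        //   }
    }
}
```

The request path then stays a single cheap database round trip, which is what makes real-time display in the UI feasible.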
Alonso Isidoro Roman
https://about.me/alonso.isidoro.roman

2016-09-29 16:15 GMT+02:00 Ali Akhtar <ali.rac...@gmail.com>:

> The web UI is actually the speed layer; it needs to be able to query the
> data online and show the results in real time.
>
> It also needs a custom front-end, so a system like Tableau can't be used;
> it must have a custom backend + front-end.
>
> Thanks for the recommendation of Flume. Do you think this will work:
>
> - Spark Streaming to read data from Kafka
> - Storing the data on HDFS using Flume
> - Using Spark to query the data in the backend of the web UI?
>
> On Thu, Sep 29, 2016 at 7:08 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> You need a batch layer and a speed layer. Data from Kafka can be stored
>> on HDFS using Flume.
>>
>> - Query this data to generate reports / analytics (There will be a web
>> UI which will be the front-end to the data, and will show the reports)
>>
>> This is basically the batch layer, and you need something like Tableau
>> or Zeppelin to query the data.
>>
>> You will also need Spark Streaming to query data online for the speed
>> layer. That data could be stored in some transient fabric like Ignite or
>> even Druid.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
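The speed-layer computation Mich describes (Spark Streaming maintaining fresh aggregates and pushing them into a fast store like Ignite or Druid) can be illustrated without any Spark dependency. A minimal sketch of one such aggregate, a count over a sliding time window; the window size and event shape here are illustrative assumptions, not part of any of the discussed systems:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Dependency-free sketch of a speed-layer aggregate: a count of events over
// a sliding time window. In the real pipeline, Spark Streaming would keep
// this state and write results into a fast serving store (Ignite / Druid).
public class WindowedCounter {
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public WindowedCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Record one event at the given time (millis since epoch).
    // Assumes events arrive roughly in time order.
    public void record(long eventTimeMillis) {
        timestamps.addLast(eventTimeMillis);
    }

    // Count events within the window ending at nowMillis,
    // first evicting anything that has fallen out of the window.
    public int count(long nowMillis) {
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() < nowMillis - windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }
}
```

Eviction on read keeps the structure bounded by the window; a real streaming job does the equivalent with windowed state that expires automatically.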
>> On 29 September 2016 at 15:01, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> It needs to be able to scale to a very large amount of data, yes.
>>>
>>> On Thu, Sep 29, 2016 at 7:00 PM, Deepak Sharma <deepakmc...@gmail.com>
>>> wrote:
>>>
>>>> What is the message inflow?
>>>> If it's really high, Spark will definitely be of great use.
>>>>
>>>> Thanks
>>>> Deepak
>>>>
>>>> On Sep 29, 2016 19:24, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>>
>>>>> I have a somewhat tricky use case, and I'm looking for ideas.
>>>>>
>>>>> I have 5-6 Kafka producers, reading various APIs, and writing their
>>>>> raw data into Kafka.
>>>>>
>>>>> I need to:
>>>>>
>>>>> - Do ETL on the data, and standardize it.
>>>>>
>>>>> - Store the standardized data somewhere (HBase / Cassandra / raw HDFS
>>>>> / Elasticsearch / Postgres)
>>>>>
>>>>> - Query this data to generate reports / analytics (There will be a web
>>>>> UI which will be the front-end to the data, and will show the reports)
>>>>>
>>>>> Java is being used as the backend language for everything (the backend
>>>>> of the web UI, as well as the ETL layer).
>>>>>
>>>>> I'm considering:
>>>>>
>>>>> - Using raw Kafka consumers, or Spark Streaming, as the ETL layer
>>>>> (receive raw data from Kafka, standardize & store it)
>>>>>
>>>>> - Using Cassandra, HBase, or raw HDFS for storing the standardized
>>>>> data and allowing queries
>>>>>
>>>>> - In the backend of the web UI, either using Spark to run queries
>>>>> across the data (mostly filters), or running queries directly against
>>>>> Cassandra / HBase
>>>>>
>>>>> I'd appreciate some thoughts / suggestions on which of these
>>>>> alternatives I should go with (e.g. using raw Kafka consumers vs Spark
>>>>> for ETL, which persistent data store to use, and how to query that data
>>>>> store in the backend of the web UI, for displaying the reports).
>>>>>
>>>>> Thanks.
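Whichever side of the "raw Kafka consumers vs Spark Streaming" choice wins, the per-record standardization logic in the ETL layer is the same and can be kept framework-agnostic. A plain-Java sketch, assuming (purely for illustration) two producers that emit `key=value;key=value` records with different field names for the same concepts:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the ETL standardization step: map heterogeneous producer
// records onto one common schema. The input formats and field names
// (user_id/uid, ts/epoch_ms) are invented for illustration; real producers
// would likely emit JSON parsed with a proper library.
public class Standardizer {

    // The common schema every producer's records are normalized into.
    public static final class StandardEvent {
        public final String source;
        public final String userId;
        public final long timestampMillis;

        StandardEvent(String source, String userId, long timestampMillis) {
            this.source = source;
            this.userId = userId;
            this.timestampMillis = timestampMillis;
        }
    }

    public static StandardEvent standardize(String source, String rawRecord) {
        // Parse "key=value;key=value" into a field map.
        Map<String, String> fields = new HashMap<>();
        for (String pair : rawRecord.split(";")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0].trim(), kv[1].trim());
        }
        // Different producers name the same concepts differently;
        // normalize them here.
        String userId = fields.containsKey("user_id")
                ? fields.get("user_id") : fields.get("uid");
        String ts = fields.containsKey("ts")
                ? fields.get("ts") : fields.get("epoch_ms");
        return new StandardEvent(source, userId, Long.parseLong(ts));
    }
}
```

Keeping this logic in a plain class means it can be called from a `KafkaConsumer` poll loop today and moved into a Spark Streaming job later without rewriting it, which defers the framework decision.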