"Using Spark to query the data in the backend of the web UI?" Don't do that. I would recommend that the Spark Streaming process store data into some NoSQL or SQL database, and that the web UI query data from that database.
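Alonso's point is that the web backend should read from a serving database rather than launch Spark jobs per request. A minimal plain-Java sketch of that read path, assuming a hypothetical Postgres table `standardized_events` with `metric`, `event_time`, and `value` columns (the table, columns, and JDBC URL are all invented for illustration):

```java
// Sketch of the web-UI read path: query a serving database directly instead
// of running Spark per request. Table/column names are assumptions.
public class ReportQuery {

    // Builds a parameterized SQL query for a time-bounded report.
    // Placeholders (?) keep the query safe from SQL injection.
    static String buildReportSql(String aggregate) {
        return "SELECT " + aggregate + "(value) AS result "
             + "FROM standardized_events "
             + "WHERE metric = ? AND event_time BETWEEN ? AND ?";
    }

    public static void main(String[] args) {
        String sql = buildReportSql("avg");
        System.out.println(sql);
        // In the real backend (with a JDBC driver on the classpath) this
        // would be executed roughly as:
        //   try (Connection c = DriverManager.getConnection(jdbcUrl);
        //        PreparedStatement ps = c.prepareStatement(sql)) {
        //       ps.setString(1, "page_views");
        //       ps.setTimestamp(2, Timestamp.from(from));
        //       ps.setTimestamp(3, Timestamp.from(to));
        //       try (ResultSet rs = ps.executeQuery()) { /* render report */ }
        //   }
    }
}
```

The request path then stays a single cheap database round trip, which is what makes real-time display in the UI feasible.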
Alonso Isidoro Roman
https://about.me/alonso.isidoro.roman

2016-09-29 16:15 GMT+02:00 Ali Akhtar <ali.rac...@gmail.com>:

> The web UI is actually the speed layer; it needs to be able to query the
> data online and show the results in real time.
>
> It also needs a custom front-end, so a system like Tableau can't be used;
> it must have a custom backend + front-end.
>
> Thanks for the recommendation of Flume. Do you think this will work:
>
> - Spark Streaming to read data from Kafka
> - Storing the data on HDFS using Flume
> - Using Spark to query the data in the backend of the web UI?
>
> On Thu, Sep 29, 2016 at 7:08 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> You need a batch layer and a speed layer. Data from Kafka can be stored
>> on HDFS using Flume.
>>
>> - Query this data to generate reports / analytics (There will be a web
>> UI which will be the front-end to the data, and will show the reports)
>>
>> This is basically the batch layer, and you need something like Tableau
>> or Zeppelin to query the data.
>>
>> You will also need Spark Streaming to query data online for the speed
>> layer. That data could be stored in some transient fabric like Ignite or
>> even Druid.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
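The speed-layer computation Mich describes (Spark Streaming maintaining fresh aggregates and pushing them into a fast store like Ignite or Druid) can be illustrated without any Spark dependency. A minimal sketch of one such aggregate, a count over a sliding time window; the window size and event shape here are illustrative assumptions, not part of any of the discussed systems:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Dependency-free sketch of a speed-layer aggregate: a count of events over
// a sliding time window. In the real pipeline, Spark Streaming would keep
// this state and write results into a fast serving store (Ignite / Druid).
public class WindowedCounter {
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public WindowedCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Record one event at the given time (millis since epoch).
    // Assumes events arrive roughly in time order.
    public void record(long eventTimeMillis) {
        timestamps.addLast(eventTimeMillis);
    }

    // Count events within the window ending at nowMillis,
    // first evicting anything that has fallen out of the window.
    public int count(long nowMillis) {
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() < nowMillis - windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }
}
```

Eviction on read keeps the structure bounded by the window; a real streaming job does the equivalent with windowed state that expires automatically.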
>> On 29 September 2016 at 15:01, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> It needs to be able to scale to a very large amount of data, yes.
>>>
>>> On Thu, Sep 29, 2016 at 7:00 PM, Deepak Sharma <deepakmc...@gmail.com>
>>> wrote:
>>>
>>>> What is the message inflow?
>>>> If it's really high, Spark will definitely be of great use.
>>>>
>>>> Thanks
>>>> Deepak
>>>>
>>>> On Sep 29, 2016 19:24, "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>>>
>>>>> I have a somewhat tricky use case, and I'm looking for ideas.
>>>>>
>>>>> I have 5-6 Kafka producers, reading various APIs, and writing their
>>>>> raw data into Kafka.
>>>>>
>>>>> I need to:
>>>>>
>>>>> - Do ETL on the data, and standardize it.
>>>>>
>>>>> - Store the standardized data somewhere (HBase / Cassandra / raw HDFS
>>>>> / Elasticsearch / Postgres)
>>>>>
>>>>> - Query this data to generate reports / analytics (There will be a web
>>>>> UI which will be the front-end to the data, and will show the reports)
>>>>>
>>>>> Java is being used as the backend language for everything (the backend
>>>>> of the web UI, as well as the ETL layer).
>>>>>
>>>>> I'm considering:
>>>>>
>>>>> - Using raw Kafka consumers, or Spark Streaming, as the ETL layer
>>>>> (receive raw data from Kafka, standardize & store it)
>>>>>
>>>>> - Using Cassandra, HBase, or raw HDFS for storing the standardized
>>>>> data and allowing queries
>>>>>
>>>>> - In the backend of the web UI, either using Spark to run queries
>>>>> across the data (mostly filters), or running queries directly against
>>>>> Cassandra / HBase
>>>>>
>>>>> I'd appreciate some thoughts / suggestions on which of these
>>>>> alternatives I should go with (e.g. using raw Kafka consumers vs Spark
>>>>> for ETL, which persistent data store to use, and how to query that data
>>>>> store in the backend of the web UI, for displaying the reports).
>>>>>
>>>>> Thanks.
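Whichever side of the "raw Kafka consumers vs Spark Streaming" choice wins, the per-record standardization logic in the ETL layer is the same and can be kept framework-agnostic. A plain-Java sketch, assuming (purely for illustration) two producers that emit `key=value;key=value` records with different field names for the same concepts:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the ETL standardization step: map heterogeneous producer
// records onto one common schema. The input formats and field names
// (user_id/uid, ts/epoch_ms) are invented for illustration; real producers
// would likely emit JSON parsed with a proper library.
public class Standardizer {

    // The common schema every producer's records are normalized into.
    public static final class StandardEvent {
        public final String source;
        public final String userId;
        public final long timestampMillis;

        StandardEvent(String source, String userId, long timestampMillis) {
            this.source = source;
            this.userId = userId;
            this.timestampMillis = timestampMillis;
        }
    }

    public static StandardEvent standardize(String source, String rawRecord) {
        // Parse "key=value;key=value" into a field map.
        Map<String, String> fields = new HashMap<>();
        for (String pair : rawRecord.split(";")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0].trim(), kv[1].trim());
        }
        // Different producers name the same concepts differently;
        // normalize them here.
        String userId = fields.containsKey("user_id")
                ? fields.get("user_id") : fields.get("uid");
        String ts = fields.containsKey("ts")
                ? fields.get("ts") : fields.get("epoch_ms");
        return new StandardEvent(source, userId, Long.parseLong(ts));
    }
}
```

Keeping this logic in a plain class means it can be called from a `KafkaConsumer` poll loop today and moved into a Spark Streaming job later without rewriting it, which defers the framework decision.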