Hi All,

I have just started getting hands-on with Spark, Spark Streaming, and MLlib.
We are in the design phase and need suggestions on how to store our training
data.

Batch Layer: Our core systems generate the data we will use as batch data;
they currently store it in SQL Server. Our requirement is to pull data from
the core databases, transform it with a Spark job, and store it in Cassandra.
We would then train the model on data pulled from Cassandra and store the
prediction results back in Cassandra.
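
To make the batch flow concrete, here is a rough PySpark sketch of what we have
in mind. It is only an illustration under assumptions: the Microsoft JDBC
driver and the DataStax spark-cassandra-connector would have to be on the
classpath, every host, database, keyspace, and table name is a placeholder, the
helper names are made up for this example, and dropna() merely stands in for
our real transformation.

```python
# Sketch only: all names and connection details below are placeholders.

def sqlserver_read_options(host, database, table, user, password):
    """Build options for Spark's built-in JDBC data source (SQL Server)."""
    return {
        "url": "jdbc:sqlserver://{0}:1433;databaseName={1}".format(host, database),
        "dbtable": table,
        "user": user,
        "password": password,
        # Assumes the Microsoft JDBC driver jar is available to Spark.
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def cassandra_write_options(keyspace, table):
    """Build options for the spark-cassandra-connector data source."""
    return {"keyspace": keyspace, "table": table}


def run_batch(session, read_opts, write_opts):
    """Pull from SQL Server, transform, and append the result to Cassandra.

    `session` is an existing SparkSession (or SQLContext on older Spark).
    """
    raw = session.read.format("jdbc").options(**read_opts).load()

    # Placeholder transformation: drop rows that contain nulls.
    cleaned = raw.dropna()

    (cleaned.write
        .format("org.apache.spark.sql.cassandra")
        .options(**write_opts)
        .mode("append")
        .save())
    return cleaned
```

The training job would read the same Cassandra table back through the same
connector format and write its predictions to a second table in the same way.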

Real-time Layer: We are also planning to have a real-time layer that stores
live data from devices in Cassandra for further analysis using MLlib.

We have heard that Cassandra is not needed in this design because Spark itself
provides storage. Please advise whether Cassandra is required, and also
suggest the best way to handle the following:

[Attached image: image002.png]

Aruna Veluru | Senior Lead Analyst | Bally Technologies <http://www.ballytech.com>
(O) +1 702 532 2832 | (M) +91 99 7222 6213
