On Mon, Oct 1, 2018 at 12:18 PM Girish Vasmatkar <
girish.vasmat...@hotwaxsystems.com> wrote:

> Hi All
>
> We are very early into our Spark days so the following may sound like a
> novice question :) I will try to keep this as short as possible.
>
> We are trying to use Spark to introduce a recommendation engine that can
> be used to provide product recommendations and need help on some design
> decisions before moving forward. Ours is a web application running on
> Tomcat. So far, I have created a simple POC (a standalone Java program) that
> reads a CSV file, fits an FPGrowth model on the data, and runs
> transformations. I would like to be able to do the following:
>
>
>    - A scheduler runs nightly in Tomcat (which it does currently) and reads
>    everything from the DB to train/fit the model. The data can grow quite
>    large, and every day we will have new data. Should I just use
>    SparkContext here, within my scheduler, to fit the model? Is this the
>    correct way to go about it? I am also planning to save the model on S3,
>    which should be okay. We also thought about using HDFS. The scheduler's
>    only job will be to create the model, save it, and be done.
>    - On the product page, we can then use the saved model to display the
>    product recommendations for a particular product.
>    - My understanding is that I should be able to use SparkContext here
>    in my web application to just load the saved model and use it to derive the
>    recommendations. Is this a good design? The problem I see with this
>    approach is that the SparkContext takes time to initialize, which may be
>    costly on every request. Or should we keep a single SparkContext instance
>    per web application? We could initialize the SparkContext during the
>    application context initialization phase.
>
>
> Since I am fairly new to using Spark properly, please help me decide
> whether the way I plan to use Spark is the recommended one. I have also
> seen use cases that use Kafka to communicate with Spark, but can we not
> do it directly using SparkContext? I am sure a lot of my understanding is
> wrong, so please feel free to correct me.
>
> Thanks and Regards,
> Girish Vasmatkar
> HotWax Systems
>
>
>
>
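
The "single instance per web application" option raised in the last bullet
can be sketched in plain Java. This is only a minimal illustration under
stated assumptions, not a definitive implementation: `SharedContext` and
`SharedContextDemo` are hypothetical names, and the supplier below returns a
plain `Object` as a stand-in for the costly SparkContext start-up. In a real
application the supplier would presumably build the Spark context instead,
and the holder could be created once during application context
initialization.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical demo: several "request" threads share one lazily created instance.
class SharedContextDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger initCount = new AtomicInteger();

        // Stand-in for the expensive start-up (assumption: in the web
        // application this is where the Spark context would be built).
        SharedContext<Object> holder = new SharedContext<>(() -> {
            initCount.incrementAndGet();
            return new Object();
        });

        // Simulate concurrent page requests that all need the context.
        Thread[] requests = new Thread[8];
        for (int i = 0; i < requests.length; i++) {
            requests[i] = new Thread(() -> holder.get());
            requests[i].start();
        }
        for (Thread t : requests) {
            t.join();
        }

        System.out.println("initializations: " + initCount.get()); // prints "initializations: 1"
    }
}

// Hypothetical holder: creates the wrapped resource at most once and
// hands the same instance to every caller afterwards.
final class SharedContext<T> {
    private final Supplier<T> factory;
    private volatile T instance; // volatile makes the double-checked locking below safe

    SharedContext(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    // Only the first caller pays the start-up cost.
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

With this shape, only the first request (or an explicit call at start-up)
pays the initialization cost; later requests reuse the same instance.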
