Re: DB, I strongly encourage you to look at Cassandra: it’s almost as powerful 
as HBase, and a lot easier to set up and manage. It’s well suited for this type of 
use case, with its combination of K/V store and time-series data.

For the second question, I’ve used this pattern many times for “flash 
messages”, passing info downstream as a one-time message:

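  *   In your updateStateByKey function, emit a tuple of (actualNewState, 
changedData), with changedData left empty when the key got nothing new in this batch
  *   Then filter the resulting stream on !changedData.isEmpty (or something similar)
  *   And only call foreachRDD on the filtered stream.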
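A minimal sketch of the three steps above, written in plain Python with no Spark dependency so it runs on its own. It is a simulation, not Spark code: `update` plays the role of the updateStateByKey callback and returns an (actualNewState, changedData) pair, `run_batch` mimics one micro-batch, and `changed_only` stands in for the filter applied before foreachRDD. The function names, the running-sum state and the ticker keys are all hypothetical; in a real job the callback would be passed to updateStateByKey and the write to storage would happen inside foreachRDD on the filtered stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical per-key state: (actual_new_state, changed_data).
# changed_data is the one-time "flash message": non-empty only in the
# batch where the key actually received new values.
State = Tuple[int, List[int]]


def update(new_values: List[int], prev: Optional[State]) -> State:
    """Stand-in for the updateStateByKey callback (here: a running sum).

    Spark invokes the callback for every key that already has state,
    even when the current batch carries no new values for that key.
    """
    total = prev[0] if prev is not None else 0
    if not new_values:
        # Nothing new: keep the state but clear the flash message.
        return (total, [])
    return (total + sum(new_values), list(new_values))


def run_batch(state: Dict[str, State],
              batch: Iterable[Tuple[str, int]]) -> Dict[str, State]:
    """Simulate one micro-batch of updateStateByKey on an in-memory dict."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)
    # Like Spark, visit every key with existing state, not just new arrivals.
    return {key: update(grouped.get(key, []), state.get(key))
            for key in sorted(set(state) | set(grouped))}


def changed_only(state: Dict[str, State]) -> Dict[str, int]:
    """Equivalent of filtering on !changedData.isEmpty before foreachRDD."""
    return {key: s[0] for key, s in state.items() if s[1]}


if __name__ == "__main__":
    state = run_batch({}, [("AAPL", 10), ("MSFT", 5)])
    print(changed_only(state))  # {'AAPL': 10, 'MSFT': 5}
    state = run_batch(state, [("AAPL", 3)])
    print(changed_only(state))  # {'AAPL': 13}; MSFT unchanged, so skipped
```

Only keys whose changedData is non-empty reach the write step, so unchanged entries are not rewritten to external storage on every batch, even though the full state is still carried forward.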

Makes sense?

-adrian

From: Thúy Hằng Lê
Date: Friday, September 25, 2015 at 10:31 AM
To: ALEX K
Cc: user@spark.apache.org
Subject: Re: Using Spark for portfolio manager app


Thanks, all, for the feedback so far.
I haven't decided which external storage to use yet.
HBase is cool, but it requires Hadoop in production, and I only have 3-4 servers for 
the whole thing. I am thinking of a relational database for this (MariaDB, 
MemSQL or MySQL), but those are hard to scale.
I will try various approaches before making any decision.

In addition, with Spark Streaming, is there any way to write only the new data to 
external storage after using updateStateByKey?
The foreachRDD function seems to loop over all the data (including entries that haven't 
changed). I believe Spark Streaming must have a way to do this, but I still 
couldn't find an example doing a similar job.
