So when deciding whether to take on installing and configuring Spark, the size of the data alone does not make that decision for you.
Thanks,
Gary

On Thu, Feb 26, 2015 at 8:55 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi
>
> On Fri, Feb 27, 2015 at 10:50 AM, Gary Malouf <malouf.g...@gmail.com> wrote:
>> The honest answer is that it is unclear to me at this point. I guess
>> what I am really wondering is if there are cases where one would find it
>> beneficial to use Spark against one or more RDBs?
>
> Well, RDBs are all about *storage*, while Spark is about *computation*. If
> you have a very expensive computation (that can be parallelized in some
> way), then you might want to use Spark, even though your data lives in an
> ordinary RDB. Think raytracing, where you do something "for every pixel in
> the output image" and you could get your scene description from a database,
> write the result to a database, but use Spark to do two minutes of
> calculation for every pixel in parallel (or so).
>
> Tobias