There is no one-size-fits-all solution available in the market today. If
somebody tells you there is, they are simply lying :)

The two solutions cater to different sets of problems. My recommendation is
to focus on really understanding the problems you are trying to solve with
Spark and Redshift, and to pick the tool based on how effectively it handles
those problems. Like Matei said, both might be relevant in some cases.
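
To make Jimmy's caching point (quoted below) a bit more concrete, here is a
toy sketch in plain Python. It is not Spark code: DiskDataset, fit_slope and
the tiny dataset are made-up names for illustration, and the "disk" is just a
counter. It only shows why an iterative fit benefits from loading the data
once and reusing it from memory.

```python
class DiskDataset:
    """Stand-in for data stored on disk; counts how often it is scanned."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.scans = 0

    def read(self):
        # Each call models one full (slow) scan of the data on disk.
        self.scans += 1
        return list(self._rows)


def fit_slope(get_rows, iterations=20, lr=0.01):
    """Fit y ~ w * x by gradient descent, asking for the rows on every pass."""
    w = 0.0
    for _ in range(iterations):
        rows = get_rows()
        grad = sum(2 * (w * x - y) * x for x, y in rows) / len(rows)
        w -= lr * grad
    return w


data = [(x, 3.0 * x) for x in range(1, 6)]

# Without caching: every iteration scans the "disk" again.
uncached = DiskDataset(data)
w_uncached = fit_slope(uncached.read)

# With caching: scan once, keep the rows in memory, reuse them on every pass.
cached = DiskDataset(data)
in_memory = cached.read()
w_cached = fit_slope(lambda: in_memory)

print(uncached.scans, cached.scans)  # prints "20 1"
```

Both runs produce the same fitted slope; only the number of scans differs.
In Spark itself the rough equivalent is calling cache() on the RDD before
the iterative step.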

Thanks
Akshar


On Tue, Nov 4, 2014 at 4:00 PM, Jimmy McErlain <ji...@sellpoints.com> wrote:

> This is pretty spot on, though I would also add that the speed features
> Spark touts all depend on caching the data in memory. Reading off the disk
> still takes time, i.e. pulling the data into an RDD. This is the reason
> that Spark is great for ML: the data is used over and over again to fit
> models, so it's pulled into memory once and then basically analyzed through
> the algos. Other systems read and write to disk repeatedly and are thus
> slower, such as Mahout (though it's getting ported over to Spark as well to
> compete with MLlib)...
>
> J
>
> On Tue, Nov 4, 2014 at 3:51 PM, Matei Zaharia <matei.zaha...@gmail.com>
> wrote:
>
>> Is this about Spark SQL vs Redshift, or Spark in general? Spark in
>> general provides a broader set of capabilities than Redshift because it has
>> APIs in general-purpose languages (Java, Scala, Python) and libraries for
>> things like machine learning and graph processing. For example, you might
>> use Spark to do the ETL that will put data into a database such as
>> Redshift, or you might pull data out of Redshift into Spark for machine
>> learning. On the other hand, if *all* you want to do is SQL and you are
>> okay with the set of data formats and features in Redshift (i.e. you can
>> express everything using its UDFs and you have a way to get data in), then
>> Redshift is a complete service which will do more management out of the box.
>>
>> Matei
>>
>> > On Nov 4, 2014, at 3:11 PM, agfung <agf...@gmail.com> wrote:
>> >
>> > I'm in the midst of a heated debate about the use of Redshift v Spark
>> > with a colleague.  We keep trading anecdotes and links back and forth
>> > (e.g. the Airbnb post from 2013 or the AMPLab benchmarks), and we don't
>> > seem to be getting anywhere.
>> >
>> > So before we start down the prototype/benchmark road, and in
>> > desperation of finding *some* kind of objective third-party perspective,
>> > I was wondering if anyone who has used both in 2014 would care to provide
>> > commentary about the sweet-spot use cases / gotchas for non-trivial use
>> > (e.g. a simple filter scan isn't really interesting).  Soft issues like
>> > operational maintenance and time spent developing v out of the box are
>> > interesting too...
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-v-Redshift-tp18112.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: user-h...@spark.apache.org
>> >
>>
>


-- 
Akshar Dave
Principal – Big Data
SoftNet Solutions
Office: 408.542.0888 | Mobile: 408.896.1486
940 Hamlin Court, Sunnyvale, CA 94089
www.softnets.com/bigdata
