Thanks for this very thorough write-up, and for continuing to update it as you progress! As I said in the other thread, it would be great to do a little profiling to see if we can get to the heart of the slowness with nested case classes (very little optimization has been done in this code path). If you can come up with a simple micro-benchmark showing that the case class API is much slower than applySchema, I'd go ahead and open a JIRA.
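A micro-benchmark along these lines might look like the sketch below (Spark 1.0/1.1-era API). The nested case classes, field names, table names, and row count are all made up for illustration; the two paths follow the patterns in the Spark SQL programming guide, but treat this as a starting point rather than a finished benchmark (no warm-up iterations, single run, etc.):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._

// Hypothetical nested case classes standing in for a GDELT-like record.
case class Actor(code: String, name: String)
case class Event(id: Int, actor: Actor)

object SchemaBench {
  // Crude wall-clock timer; a real benchmark would warm up the JVM first.
  def time[T](label: String)(f: => T): T = {
    val start = System.nanoTime()
    val result = f
    println(f"$label: ${(System.nanoTime() - start) / 1e6}%.1f ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("schema-bench"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._  // brings in createSchemaRDD and sql()

    val n = 1000000
    val raw = sc.parallelize(1 to n)

    // Path 1: case class reflection via the createSchemaRDD implicit.
    time("case class API") {
      raw.map(i => Event(i, Actor("c" + i, "name" + i)))
         .registerTempTable("events_cc")
      sql("SELECT COUNT(*) FROM events_cc").collect()
    }

    // Path 2: explicit schema + Row RDD via applySchema.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("actor", StructType(Seq(
        StructField("code", StringType, nullable = true),
        StructField("name", StringType, nullable = true))), nullable = true)))
    val rows = raw.map(i => Row(i, Row("c" + i, "name" + i)))
    time("applySchema") {
      sqlContext.applySchema(rows, schema).registerTempTable("events_explicit")
      sql("SELECT COUNT(*) FROM events_explicit").collect()
    }
  }
}
```

Running both paths over the same data and only varying the schema-conversion step should isolate the reflection/conversion overhead rather than I/O.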
On Thu, Aug 21, 2014 at 12:04 PM, Evan Chan <velvia.git...@gmail.com> wrote:
> I just put up a repo with a write-up on how to import the GDELT public
> dataset into Spark SQL and play around. Has a lot of notes on
> different import methods and observations about Spark SQL. Feel free
> to have a look and comment.
>
> http://www.github.com/velvia/spark-sql-gdelt