Thanks for this very thorough write-up, and for continuing to update it as you progress! As I said in the other thread, it would be great to do a little profiling to see if we can get to the heart of the slowness with nested case classes (very little optimization has been done in this code path). If you can come up with a simple micro-benchmark showing that the case class API is much slower than applySchema, I'd go ahead and open a JIRA.
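A micro-benchmark along these lines might look like the sketch below (Spark 1.0/1.1-era API). The nested case classes, field names, table names, and row count are all made up for illustration; the two paths follow the patterns in the Spark SQL programming guide, but treat this as a starting point rather than a finished benchmark (no warm-up iterations, single run, etc.):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._

// Hypothetical nested case classes standing in for a GDELT-like record.
case class Actor(code: String, name: String)
case class Event(id: Int, actor: Actor)

object SchemaBench {
  // Crude wall-clock timer; a real benchmark would warm up the JVM first.
  def time[T](label: String)(f: => T): T = {
    val start = System.nanoTime()
    val result = f
    println(f"$label: ${(System.nanoTime() - start) / 1e6}%.1f ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("schema-bench"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._  // brings in createSchemaRDD and sql()

    val n = 1000000
    val raw = sc.parallelize(1 to n)

    // Path 1: case class reflection via the createSchemaRDD implicit.
    time("case class API") {
      raw.map(i => Event(i, Actor("c" + i, "name" + i)))
         .registerTempTable("events_cc")
      sql("SELECT COUNT(*) FROM events_cc").collect()
    }

    // Path 2: explicit schema + Row RDD via applySchema.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("actor", StructType(Seq(
        StructField("code", StringType, nullable = true),
        StructField("name", StringType, nullable = true))), nullable = true)))
    val rows = raw.map(i => Row(i, Row("c" + i, "name" + i)))
    time("applySchema") {
      sqlContext.applySchema(rows, schema).registerTempTable("events_explicit")
      sql("SELECT COUNT(*) FROM events_explicit").collect()
    }
  }
}
```

Running both paths over the same data and only varying the schema-conversion step should isolate the reflection/conversion overhead rather than I/O.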
On Thu, Aug 21, 2014 at 12:04 PM, Evan Chan <velvia.git...@gmail.com> wrote:
> I just put up a repo with a write-up on how to import the GDELT public
> dataset into Spark SQL and play around. Has a lot of notes on
> different import methods and observations about Spark SQL. Feel free
> to have a look and comment.
>
> http://www.github.com/velvia/spark-sql-gdelt