Is there a processing speed difference between DataFrames and Datasets?

2016-11-22 Thread jggg777
I've seen a number of visuals showing the processing time benefits of using Datasets+DataFrames over RDDs, but I'd assume that there are performance benefits to using a defined case class instead of a generic Dataset[Row]. The "A Tale of Three Apache Spark APIs" post mentions "If you want higher degree of

Pasting into spark-shell doesn't work for Databricks example

2016-11-21 Thread jggg777
I'm simply pasting in the UDAF example from this page and getting errors (basic EMR setup with Spark 2.0): https://docs.cloud.databricks.com/docs/latest/databricks_guide/index.html#04%20SQL,%20DataFrames%20%26%20Datasets/03%20UDF%20and%20UDAF%20-%20scala.html The imports appear to work, but then

Pasting oddity with Spark 2.0 (scala)

2016-11-14 Thread jggg777
This one has stumped the group here; I'm hoping to get some insight into why this error is happening. I'm going through the Databricks DataFrames Scala docs

Re: Newbie question - Best way to bootstrap with Spark

2016-11-10 Thread jggg777
A couple of options: (1) You can start locally by downloading Spark to your laptop: http://spark.apache.org/downloads.html , then jump into the Quickstart docs: http://spark.apache.org/docs/latest/quick-start.html (2) There is a free Databricks Community Edition that runs on AWS: