Re: Unit test with sqlContext

2016-02-05 Thread Steve Annessa
ge by Holden? It’s really useful for unit testing Spark apps as it handles all the bootstrapping for you. https://github.com/holdenk/spark-testing-base DataFrame examples are here: https://github.com/holdenk/spark

Unit test with sqlContext

2016-02-04 Thread Steve Annessa
I'm trying to unit test a function that reads in a JSON file, manipulates the DataFrame, and then returns a Scala Map. The function has the signature: def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext). I've created a bootstrap spec for Spark jobs that instantiates the Spark Context
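
For readers following this thread, here is a minimal sketch of what such a spec could look like. It assumes ScalaTest's FunSuite (from the 2.x/3.0 line), a local master, and the Spark 1.x SQLContext API. The body of ingest, the "kind" column, and the sample records are hypothetical stand-ins, since the original function is not shown in the message.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class IngestSpec extends FunSuite with BeforeAndAfterAll {
  private var sc: SparkContext = _
  private var sqlContext: SQLContext = _

  // Start one local SparkContext for the whole suite.
  override def beforeAll(): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("ingest-spec")
    sc = new SparkContext(conf)
    sqlContext = new SQLContext(sc)
  }

  // Stop the context so later suites can create their own.
  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
  }

  // Hypothetical stand-in for the function under test (same signature as in
  // the message): counts rows per value of an assumed "kind" column.
  def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext): Map[String, Long] = {
    val df = sqlContext.read.json(dataLocation)
    df.groupBy("kind").count().collect()
      .map(row => row.getString(0) -> row.getLong(1))
      .toMap
  }

  test("ingest turns a JSON file into a Scala Map") {
    // Spark's JSON reader expects one JSON object per line.
    val file = Files.createTempFile("ingest-spec", ".json")
    val data = """{"kind":"a"}
                 |{"kind":"a"}
                 |{"kind":"b"}""".stripMargin
    Files.write(file, data.getBytes(StandardCharsets.UTF_8))

    val result = ingest(file.toString, sc, sqlContext)
    assert(result == Map("a" -> 2L, "b" -> 1L))
  }
}
```

Creating the contexts in beforeAll and stopping them in afterAll keeps a single SparkContext alive per suite, which avoids conflicts when several specs run in the same JVM.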

DataFrame repartition not repartitioning

2015-09-16 Thread Steve Annessa
Hello, I'm trying to load an Avro file and write it out as Parquet. I would like enough partitions to parallelize the work properly. With a simple load and save, I get one partition out. I thought I would be able to use repartition like the following: val avroFile =