ge by Holden? It’s really
>>> useful for unit testing Spark apps as it handles all the bootstrapping for
>>> you.
>>>
>>> https://github.com/holdenk/spark-testing-base
>>>
>>> DataFrame examples are here:
>>> https://github.com/holdenk/spark
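For context, here is a minimal sketch of the kind of suite spark-testing-base enables. It assumes ScalaTest's `FunSuite` (the 3.0-era class; newer ScalaTest releases use `org.scalatest.funsuite.AnyFunSuite`) and the library's `SharedSparkContext` trait, which starts one local `SparkContext` for the whole suite and exposes it as `sc`. The suite name and test data are hypothetical.

```scala
import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

// Hypothetical example suite. SharedSparkContext creates a local SparkContext
// before the tests run, exposes it as `sc`, and stops it afterwards.
class WordLengthSuite extends FunSuite with SharedSparkContext {

  test("counts elements of a parallelized collection") {
    val words = List("spark", "testing", "base")
    val rdd = sc.parallelize(words)
    assert(rdd.count() === words.length)
  }

  test("maps words to their lengths") {
    // collect() on a parallelized list preserves the original element order
    val lengths = sc.parallelize(List("a", "bb", "ccc")).map(_.length).collect().toList
    assert(lengths === List(1, 2, 3))
  }
}
```

The shared context avoids paying Spark's startup cost once per test, which is most of the "bootstrapping" the library takes off your hands.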
I'm trying to unit test a function that reads in a JSON file, manipulates
the DataFrame, and then returns a Scala Map.
The function has this signature:
def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)
I've created a bootstrap spec for Spark jobs that instantiates the
SparkContext.
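A function with that signature can be tested without a hand-written bootstrap spec. The sketch below assumes spark-testing-base's `DataFrameSuiteBase` trait, which supplies both `sc` and `sqlContext`, plus ScalaTest's `FunSuite`. The real body of `ingest` is not shown, so the `Ingest.ingest` here is a hypothetical stand-in that reads JSON and turns `(name, value)` rows into a `Map[String, Long]`; the fixture data and column names are also made up for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.scalatest.FunSuite

// Hypothetical stand-in for the function under test: same signature as in the
// question, but the body is invented for illustration.
object Ingest {
  def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext): Map[String, Long] = {
    val df = sqlContext.read.json(dataLocation)
    df.select("name", "value")
      .collect()
      .map(row => row.getString(0) -> row.getLong(1)) // JSON integers are inferred as LongType
      .toMap
  }
}

class IngestSuite extends FunSuite with DataFrameSuiteBase {

  test("ingest builds a Map from a JSON file") {
    // Write a small JSON-lines fixture (one object per line) to a temp file.
    val path = Files.createTempFile("ingest-test", ".json")
    path.toFile.deleteOnExit()
    val json =
      """{"name": "a", "value": 1}
        |{"name": "b", "value": 2}""".stripMargin
    Files.write(path, json.getBytes(StandardCharsets.UTF_8))

    // DataFrameSuiteBase provides `sc` and `sqlContext` for the suite.
    val result = Ingest.ingest(path.toString, sc, sqlContext)
    assert(result === Map("a" -> 1L, "b" -> 2L))
  }
}
```

Writing a tiny fixture file per test keeps the test independent of any checked-in data, at the cost of a little disk I/O.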
Hello,
I'm trying to load an Avro file and write it out as Parquet. I would
like enough partitions to parallelize the work properly. When I do a
simple load and save, I get only one partition out. I thought I would be
able to use repartition like the following:
val avroFile =