Unit testing: Mocking out Spark classes

2014-10-16 Thread Saket Kumar
Hello all,

I am trying to unit test the classes involved in my Spark job. I am trying to
mock out the Spark classes (like SparkContext and Broadcast) so that I can
unit test my classes in isolation. However, I have realised that these are
classes instead of traits. My first question is: why?

It is quite hard to mock out classes using ScalaTest+ScalaMock, because the
classes to be mocked must be annotated with
org.scalamock.annotation.mock as per
http://www.scalatest.org/user_guide/testing_with_mock_objects#generatedMocks.
I cannot do that in my case, as the classes I am trying to mock belong to Spark.

Am I missing something? Is there a better way to do this?

val sparkContext = mock[SparkInteraction]
val trainingDatasetLoader = mock[DatasetLoader]
val broadcastTrainingDatasetLoader = mock[Broadcast[DatasetLoader]]
def transformerFunction(source: Iterator[(HubClassificationData, String)]): Iterator[String] = {
  source.map(_._2)
}
val classificationResultsRDD = mock[RDD[String]]
val classificationResults = Array(,,)
val inputRDD = mock[RDD[(HubClassificationData, String)]]

inSequence{
  inAnyOrder{
    (sparkContext.broadcast[DatasetLoader] _).expects(trainingDatasetLoader).returns(broadcastTrainingDatasetLoader)
  }
}

val sparkInvoker = new SparkJobInvoker(sparkContext, trainingDatasetLoader)

when(inputRDD.mapPartitions(transformerFunction)).thenReturn(classificationResultsRDD)
sparkInvoker.invoke(inputRDD)

Thanks,
Saket


Re: Unit testing: Mocking out Spark classes

2014-10-16 Thread Daniel Siegmann
Mocking these things is difficult; executing your unit tests in a local
Spark context is preferred, as recommended in the programming guide
http://spark.apache.org/docs/latest/programming-guide.html#unit-testing.
I know this may not technically be a unit test, but it is hopefully close
enough.

You can load your test data using SparkContext.parallelize and retrieve the
data (for verification) using RDD.collect.
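
A minimal sketch of that approach, assuming Spark is on the test classpath.
The object name, the sample data and the "label-" prefix are made up for
illustration and are not taken from the original job. It runs against a real
local SparkContext and a real Broadcast, so neither has to be mocked:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example: names and data are illustrative only.
object LocalSparkTestSketch {

  def run(): Seq[String] = {
    // "local[2]" runs Spark inside this JVM with two worker threads.
    val conf = new SparkConf().setMaster("local[2]").setAppName("local-spark-test")
    val sc = new SparkContext(conf)
    try {
      // A real broadcast variable instead of a mocked Broadcast.
      val prefix = sc.broadcast("label-")

      // Load the test data with SparkContext.parallelize.
      val input = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))

      // The transformation under test, applied per partition.
      val result = input.mapPartitions(_.map { case (_, s) => prefix.value + s })

      // Bring the data back to the driver for verification with RDD.collect.
      result.collect().toSeq
    } finally {
      // Always stop the context so later tests can create their own.
      sc.stop()
    }
  }
}
```

In a ScalaTest suite, the value returned by run() could be compared against
the expected strings; creating the context once per suite and stopping it
afterwards would keep the tests faster.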

-- 
Daniel Siegmann, Software Developer
Velos