Re: Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-22 Thread Bedrytski Aliaksandr
Hi Everett, HiveContext is initialized only once, as a lazy val, so if you mean starting separate JVMs for each test (or group of tests), then the context will obviously not be shared in that case. But specs2 (by default) launches specs (inside of test classes) in parallel threads and in
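A minimal sketch of the "initialize once, lazily, then share" idea. It is written in Python rather than Scala so it runs without a Spark installation. LazyShared is a hypothetical helper (not part of Spark or specs2). It runs a factory at most once per process, even when several test threads ask for the value at the same time, which is roughly what a Scala lazy val inside an object provides. As noted above, the sharing stops at the process (JVM) boundary: tests forked into separate JVMs each build their own context.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyShared(Generic[T]):
    """Hypothetical helper: build a value on first use, then reuse it.

    The factory runs at most once per process, even if many test threads
    call get() concurrently (roughly a Scala `lazy val` in an `object`).
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._built = False

    def get(self) -> T:
        # Serialize access so only one thread ever runs the factory;
        # later callers take the lock briefly and return the cached value.
        with self._lock:
            if not self._built:
                self._value = self._factory()
                self._built = True
            return self._value  # type: ignore[return-value]


# Assumed usage in a PySpark suite (requires pyspark, not run here):
# from pyspark.sql import SparkSession
# SPARK = LazyShared(
#     lambda: SparkSession.builder.master("local[*]").appName("tests").getOrCreate()
# )
# def test_something():
#     df = SPARK.get().range(10)
#     assert df.count() == 10
```

PySpark's builder getOrCreate() already returns an existing session when one is active, so in that setting the wrapper mainly makes the "create on first use, then reuse" intent explicit and keeps session startup out of module import time.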

Re: Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-21 Thread Everett Anderson
On Sun, Aug 21, 2016 at 3:08 AM, Bedrytski Aliaksandr wrote:
> Hi,
>
> we share the same spark/hive context between tests (executed in
> parallel), so the main problem is that the temporary tables are
> overwritten each time they are created, this may create race conditions
>

Re: Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-21 Thread Bedrytski Aliaksandr
Hi, we share the same Spark/Hive context between tests (executed in parallel), so the main problem is that temporary tables are overwritten each time they are created. This may create race conditions, as these tempTables can be seen as global mutable shared state. So each time we create a

Re: Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-19 Thread Everett Anderson
Hi! Just following up on this: when people talk about a shared session/context for testing like this, I assume it's still within one test class. So it's still the case that if you have a lot of test classes that test Spark-related things, you must configure your build system to not run them in

Re: Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-01 Thread Holden Karau
That's a good point. There is an open issue for spark-testing-base to support this shared SparkSession approach, but I haven't had the time (https://github.com/holdenk/spark-testing-base/issues/123). I'll try to include this in the next release :)
On Mon, Aug 1, 2016 at 9:22 AM, Koert Kuipers

Re: Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-01 Thread Koert Kuipers
We share a single SparkSession across tests, and they can run in parallel. It's pretty fast.
On Mon, Aug 1, 2016 at 12:02 PM, Everett Anderson wrote:
> Hi,
>
> Right now, if any code uses DataFrame/Dataset, I need a test setup that
> brings up a local master as in

Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-01 Thread Everett Anderson
Hi, right now, if any code uses DataFrame/Dataset, I need a test setup that brings up a local master as in this article. That's a lot of overhead for unit testing, and the tests can't run in