Parallelize independent tasks

2014-12-02 Thread Anselme Vignon
Hi folks, We have written a Spark job that scans multiple HDFS directories and performs transformations on them. For now, this is done with a simple for loop that starts one job at each iteration. This looks like: dirs.foreach { case (src, dest) => sc.textFile(src).process.saveAsTextFile(dest) }
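A common way to parallelize such independent jobs (a sketch, not from the original thread; `process` stands in for the unspecified transformation) is to submit them from separate threads against the shared SparkContext, for example with Scala Futures:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    // Each (src, dest) pair is an independent job. SparkContext is
    // thread-safe for job submission, so submitting the jobs from
    // separate threads lets the scheduler run them concurrently.
    val jobs = dirs.map { case (src, dest) =>
      Future { sc.textFile(src).process.saveAsTextFile(dest) }
    }

    // Block until every job has finished.
    Await.result(Future.sequence(jobs), Duration.Inf)

Note that with the default FIFO scheduler, concurrently submitted jobs still queue for resources stage by stage; setting spark.scheduler.mode to FAIR lets them share the cluster more evenly.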

Re: Unit test failure: Address already in use

2014-06-18 Thread Anselme Vignon
Hi, Could your problem come from the fact that you run your tests in parallel? If you are running Spark in local mode, you cannot have concurrent Spark instances running; this means that tests which instantiate a SparkContext cannot run in parallel. The easiest fix is to tell sbt not to run the tests in parallel.
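For reference, the usual sbt setting for this (shown in the sbt 0.13 syntax current at the time of this thread) is:

    // build.sbt
    // Run tests sequentially, so only one local SparkContext
    // exists in the JVM at any given moment.
    parallelExecution in Test := false

Newer sbt versions write the same setting as `Test / parallelExecution := false`.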