Parallelize independent tasks

2014-12-02 Thread Anselme Vignon
Hi folks,


We have written a spark job that scans multiple hdfs directories and
perform transformations on them.

For now, this is done with a simple for loop that starts one task at
each iteration. This looks like:

dirs.foreach { case (src, dest) => sc.textFile(src).process.saveAsTextFile(dest) }


However, each iteration is independent, and we would like to optimize this
by running the iterations with Spark simultaneously (or in a chained
fashion), so that we don't have idle executors at the end of each iteration
(some directories sometimes only require one partition).
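One way to sketch this (an assumption on my part, not something we have tried: the `runJob` helper below stands in for the real per-directory job, whose body would be the `sc.textFile(src).process.saveAsTextFile(dest)` call above) is to submit each job from its own thread via Scala Futures, so the scheduler can interleave the jobs' tasks across otherwise idle executors:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical stand-in for one (src, dest) job; in the real code the
// body would be sc.textFile(src).process.saveAsTextFile(dest), which is
// thread-safe to call from several driver threads at once.
def runJob(src: String, dest: String): String = s"$src -> $dest"

val dirs = Seq(("a", "outA"), ("b", "outB"), ("c", "outC"))

// Submit every job at once instead of looping sequentially; the driver
// then has all jobs in flight and the scheduler can fill idle executors.
val jobs = dirs.map { case (src, dest) => Future(runJob(src, dest)) }
val results = Await.result(Future.sequence(jobs), Duration.Inf)
```

The same effect could be had with a parallel collection over `dirs`; the key point is only that each `saveAsTextFile` call is submitted from a separate driver thread.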


Has anyone already done such a thing? How would you suggest we could do that?

Cheers,

Anselme




Re: Unit test failure: Address already in use

2014-06-18 Thread Anselme Vignon
Hi,

Could your problem come from the fact that you run your tests in parallel ?

If you are running Spark in local mode, you cannot have concurrent Spark
instances running. This means that tests which each instantiate a
SparkContext cannot run in parallel. The easiest fix is to tell sbt not
to run the tests in parallel.
This can be done by adding the following line in your build.sbt:

parallelExecution in Test := false
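For what it's worth, the "Address already in use" in your log is just two sockets contending for one port (each local SparkContext starts its own Jetty web UI). A minimal Spark-free reproduction, with the port number picked by the OS rather than taken from your report:

```scala
import java.net.{BindException, ServerSocket}

// Bind a first server on an OS-assigned free port, then try to bind a
// second server to that same port. The second bind fails the same way
// two concurrent local SparkContexts do when both start a Jetty UI.
val first = new ServerSocket(0)
val port = first.getLocalPort
val secondBindFailed =
  try { new ServerSocket(port).close(); false }
  catch { case _: BindException => true }
first.close()
```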

Cheers,

Anselme




2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.com:

 Hi,

 I have 3 unit tests (independent of each other) in the /src/test/scala
 folder. When I run each of them individually using sbt test-only, all
 3 pass. But when I run them all using sbt test, they fail with the
 warning below. I am wondering if the binding exception causes the job
 to fail to run, thereby causing the test failure. If so, what can I do
 to address this binding exception? I am running these tests locally on
 a standalone machine (i.e. SparkContext("local", "test")).


 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED
 org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address already in use
 java.net.BindException: Address already in use
 at sun.nio.ch.Net.bind0(Native Method)
 at sun.nio.ch.Net.bind(Net.java:174)
 at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
 at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)


 thanks



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.