We have a small Mesos cluster, and the slaves need a VFS set up on them so
that they can pull down the data they need from S3 when Spark runs. I can't
find any obvious guidance online on how to do this, or how to do it easily.
Does anyone have best practices or ideas for accomplishing this?
An example stack trace from a job run on the Mesos cluster is included below.
Any idea how to get this going? Perhaps by somehow bootstrapping the VFS when
Spark runs?
Thanks,
Steve
java.io.IOException: Unsupported scheme s3n for URI s3n://removed
    at com.coldlight.ccc.vfs.NeuronPath.toPath(NeuronPath.java:43)
    at com.coldlight.neuron.data.ClquetPartitionedData.makeInputStream(ClquetPartitionedData.java:465)
    at com.coldlight.neuron.data.ClquetPartitionedData.access$200(ClquetPartitionedData.java:42)
    at com.coldlight.neuron.data.ClquetPartitionedData$Iter.<init>(ClquetPartitionedData.java:330)
    at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:304)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
15/05/12 13:57:51 ERROR Executor: Exception in task 0.1 in stage 0.0 (TID 1)
java.lang.RuntimeException: java.io.IOException: Unsupported scheme s3n for URI s3n://removed
    at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:307)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unsupported scheme s3n for URI s3n://removed
    at com.coldlight.ccc.vfs.NeuronPath.toPath(NeuronPath.java:43)
    at com.coldlight.neuron.data.ClquetPartitionedData.makeInputStream(ClquetPartitionedData.java:465)
    at com.coldlight.neuron.data.ClquetPartitionedData.access$200(ClquetPartitionedData.java:42)
    at com.coldlight.neuron.data.ClquetPartitionedData$Iter.<init>(ClquetPartitionedData.java:330)
    at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:304)
    ... 8 more