On Fri, Feb 5, 2016 at 12:58 PM, Gerard Maas <gerard.m...@gmail.com> wrote:

> Hi,
>
> We're facing a situation where simple queries to parquet files stored in
> Swift through a Hive Metastore sometimes fail with this exception:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 6
> in stage 58.0 failed 4 times, most recent failure: Lost task 6.3 in stage
> 58.0 (TID 412, agent-1.mesos.private):
> org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException: Missing
> mandatory configuration option: fs.swift.service.######.auth.url
> at org.apache.hadoop.fs.swift.http.RestClientBindings.copy(RestClientBindings.java:219)
> (...)
>
> Queries requiring a full table scan, like "select count(*)", would fail
> with the mentioned exception, while smaller chunks of work like "select *
> from ... LIMIT 5" would succeed.
>

...
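
For context, the option named in the exception is one of the Swift settings
from the hadoop-openstack module, and they have to be present in the Hadoop
configuration of every JVM that opens the filesystem. A minimal sketch of
setting them on the driver (the service name "myservice" and all values are
placeholders, not our actual configuration):

  // hadoop-openstack Swift credentials; "myservice" and all values below
  // are placeholders for the real service name and credentials.
  sc.hadoopConfiguration.set("fs.swift.service.myservice.auth.url",
    "https://keystone.example.com/v2.0/tokens")
  sc.hadoopConfiguration.set("fs.swift.service.myservice.username", "user")
  sc.hadoopConfiguration.set("fs.swift.service.myservice.password", "secret")
  sc.hadoopConfiguration.set("fs.swift.service.myservice.tenant", "tenant")

The service name in the path (swift://container.myservice/...) selects which
fs.swift.service.<name>.* settings are used.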

An update:

When using the Zeppelin Notebook on a Mesos cluster, as a _workaround_ I can
get the Notebook running reliably with this setting and by starting with
this paragraph:

* spark.mesos.coarse = true

  import util.Random.nextInt
  import sqlContext.implicits._  // for .toDF on an RDD (Spark 1.5)
  sc.parallelize((0 to 1000).toList, 20)
    .toDF
    .write.parquet(s"swift://###/test/$nextInt")
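
(spark.mesos.coarse = true makes Spark keep one long-lived executor per
node, instead of launching a short-lived Mesos task per Spark task, so the
Swift configuration picked up during the write presumably stays around for
later paragraphs.)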

This parquet write will touch all the executors (4 worker nodes in this
experiment).

So it looks like _writing_ once, at the start of the Notebook, distributes
the Swift authentication data to the executors, and after that all queries
just work (including the count(*) queries that failed before).
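
If that is indeed the mechanism, an alternative (untested here) would be to
ship the settings with the SparkConf itself: Spark copies every
spark.hadoop.* entry into the Hadoop Configuration it builds on the
executors, so the auth data should be available without a warm-up job. A
sketch for a setup where you create the context yourself, with the same
placeholder names as above (in Zeppelin the equivalent would be interpreter
properties):

  import org.apache.spark.{SparkConf, SparkContext}

  // Untested sketch: spark.hadoop.* entries are copied into the Hadoop
  // Configuration on both driver and executors. "myservice" and all
  // values are placeholders.
  val conf = new SparkConf()
    .setAppName("swift-auth-sketch")
    .set("spark.hadoop.fs.swift.service.myservice.auth.url",
         "https://keystone.example.com/v2.0/tokens")
    .set("spark.hadoop.fs.swift.service.myservice.username", "user")
    .set("spark.hadoop.fs.swift.service.myservice.password", "secret")
    .set("spark.hadoop.fs.swift.service.myservice.tenant", "tenant")
  val sc = new SparkContext(conf)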

This is using a Zeppelin notebook with Spark 1.5.1 and Hadoop 2.4.

HTH,

Peter
