Thanks Steve, but this error occurs only with Parquet files; CSVs work.
Best,

*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
www.onematch.com.br
<http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>

On Sun, Sep 11, 2016 at 3:28 PM, Steve Loughran <ste...@hortonworks.com> wrote:

> On 9 Sep 2016, at 17:56, Daniel Lopes <dan...@onematch.com.br> wrote:
>
> Hi, can someone help?
>
> I'm trying to use Parquet in IBM Block Storage with Spark, but when I try
> to load I get this error, using this config:
>
>     credentials = {
>         "name": "keystone",
>         "auth_url": "https://identity.open.softlayer.com",
>         "project": "object_storage_23f274c1_d11XXXXXXXXXXXXXXXe634",
>         "projectId": "XXXXXXd9c4aa39b7c7eCCCCCCCCb",
>         "region": "dallas",
>         "userId": "XXXXX64087180b40XXXXX2b909",
>         "username": "admin_XXXX9dd810f8901d48778XXXXXX",
>         "password": "chXXXXXXXXXXXXX6_",
>         "domainId": "c1ddad17cfcXXXXXXXXX41",
>         "domainName": "10XXXXXX",
>         "role": "admin"
>     }
>
>     def set_hadoop_config(credentials):
>         """This function sets the Hadoop configuration with given credentials,
>         so it is possible to access data using SparkContext"""
>
>         prefix = "fs.swift.service." + credentials['name']
>         hconf = sc._jsc.hadoopConfiguration()
>         hconf.set(prefix + ".auth.url", credentials['auth_url'] + '/v3/auth/tokens')
>         hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
>         hconf.set(prefix + ".tenant", credentials['projectId'])
>         hconf.set(prefix + ".username", credentials['userId'])
>         hconf.set(prefix + ".password", credentials['password'])
>         hconf.setInt(prefix + ".http.port", 8080)
>         hconf.set(prefix + ".region", credentials['region'])
>         hconf.setBoolean(prefix + ".public", True)
>
>     set_hadoop_config(credentials)
>
> -------------------------------------------------
>
>     Py4JJavaErrorTraceback (most recent call last)
>     <ipython-input-55-5a14928215eb> in <module>()
>     ----> 1 train.groupby('Acordo').count().show()
>
>     Py4JJavaError: An error occurred while calling o406.showString.
>     : org.apache.spark.SparkException: Job aborted due to stage failure: Task
>     60 in stage 30.0 failed 10 times, most recent failure: Lost task 60.9 in
>     stage 30.0 (TID 2556, yp-spark-dal09-env5-0039):
>     org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException:
>     Missing mandatory configuration option: fs.swift.service.keystone.auth.url
>         at org.apache.hadoop.fs.swift.http.RestClientBindings.copy(RestClientBindings.java:223)
>         at org.apache.hadoop.fs.swift.http.RestClientBindings.bind(RestClientBindings.java:147)

In my own code, I'd assume that the value of credentials['name'] didn't
match that of the URL, assuming you have something like
swift://bucket.keystone . Failing that: the options were set too late.

Instead of asking for the hadoop config and editing that, set the option
in your spark context, before it is launched, with the prefix "hadoop".
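A minimal sketch of Steve's suggestion: instead of mutating the Hadoop configuration of an already-running context, express the same Swift settings as Spark properties with the "spark.hadoop." prefix and pass them to the SparkConf before the SparkContext is launched. The credential values below are placeholders, and the helper name `swift_spark_options` is hypothetical; the `fs.swift.service.*` key names are the ones from the original message.

```python
# Placeholder credentials; real values are redacted in the original mail.
credentials = {
    "name": "keystone",
    "auth_url": "https://identity.open.softlayer.com",
    "projectId": "<project-id>",
    "userId": "<user-id>",
    "password": "<password>",
    "region": "dallas",
}

def swift_spark_options(credentials):
    """Return the Swift settings as Spark properties.

    Spark copies any "spark.hadoop.*" property into the Hadoop
    configuration when the context starts, so these take effect
    before any job runs.
    """
    prefix = "spark.hadoop.fs.swift.service." + credentials["name"]
    return {
        prefix + ".auth.url": credentials["auth_url"] + "/v3/auth/tokens",
        prefix + ".auth.endpoint.prefix": "endpoints",
        prefix + ".tenant": credentials["projectId"],
        prefix + ".username": credentials["userId"],
        prefix + ".password": credentials["password"],
        prefix + ".http.port": "8080",
        prefix + ".region": credentials["region"],
        prefix + ".public": "true",
    }

# Apply them at launch time, e.g.:
#   from pyspark import SparkConf, SparkContext
#   conf = SparkConf()
#   for key, value in swift_spark_options(credentials).items():
#       conf.set(key, value)
#   sc = SparkContext(conf=conf)
```

Note Steve's other point: the service name embedded in the keys ("keystone" here) must match the suffix of the host in the URL you read from, e.g. `swift://bucket.keystone/path`, or the filesystem will look up a different `fs.swift.service.<name>.*` family and report the option as missing.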