Thanks Steve, but this error occurs only with Parquet files; CSVs work.
Best,

*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
www.onematch.com.br
<http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>

On Sun, Sep 11, 2016 at 3:28 PM, Steve Loughran <ste...@hortonworks.com> wrote:

> On 9 Sep 2016, at 17:56, Daniel Lopes <dan...@onematch.com.br> wrote:
>
> Hi, can someone help?
>
> I'm trying to use Parquet in IBM Block Storage with Spark, but when I try
> to load I get this error, using this config:
>
>     credentials = {
>         "name": "keystone",
>         "auth_url": "https://identity.open.softlayer.com",
>         "project": "object_storage_23f274c1_d11XXXXXXXXXXXXXXXe634",
>         "projectId": "XXXXXXd9c4aa39b7c7eCCCCCCCCb",
>         "region": "dallas",
>         "userId": "XXXXX64087180b40XXXXX2b909",
>         "username": "admin_XXXX9dd810f8901d48778XXXXXX",
>         "password": "chXXXXXXXXXXXXX6_",
>         "domainId": "c1ddad17cfcXXXXXXXXX41",
>         "domainName": "10XXXXXX",
>         "role": "admin"
>     }
>
>     def set_hadoop_config(credentials):
>         """This function sets the Hadoop configuration with given credentials,
>         so it is possible to access data using SparkContext"""
>
>         prefix = "fs.swift.service." + credentials['name']
>         hconf = sc._jsc.hadoopConfiguration()
>         hconf.set(prefix + ".auth.url", credentials['auth_url'] + '/v3/auth/tokens')
>         hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
>         hconf.set(prefix + ".tenant", credentials['projectId'])
>         hconf.set(prefix + ".username", credentials['userId'])
>         hconf.set(prefix + ".password", credentials['password'])
>         hconf.setInt(prefix + ".http.port", 8080)
>         hconf.set(prefix + ".region", credentials['region'])
>         hconf.setBoolean(prefix + ".public", True)
>
>     set_hadoop_config(credentials)
>
> -------------------------------------------------
>
>     Py4JJavaErrorTraceback (most recent call last)
>     <ipython-input-55-5a14928215eb> in <module>()
>     ----> 1 train.groupby('Acordo').count().show()
>
>     Py4JJavaError: An error occurred while calling o406.showString.
>     : org.apache.spark.SparkException: Job aborted due to stage failure: Task
>     60 in stage 30.0 failed 10 times, most recent failure: Lost task 60.9 in
>     stage 30.0 (TID 2556, yp-spark-dal09-env5-0039):
>     org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException:
>     Missing mandatory configuration option: fs.swift.service.keystone.auth.url
>         at org.apache.hadoop.fs.swift.http.RestClientBindings.copy(RestClientBindings.java:223)
>         at org.apache.hadoop.fs.swift.http.RestClientBindings.bind(RestClientBindings.java:147)

In my own code, I'd assume that the value of credentials['name'] didn't
match that of the URL, assuming you have something like
swift://bucket.keystone . Failing that: the options were set too late.

Instead of asking for the hadoop config and editing that, set the option
in your spark context, before it is launched, with the prefix "hadoop".
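A minimal sketch of Steve's suggestion: instead of mutating the Hadoop configuration of an already-running context, express the same Swift settings as Spark properties with the "spark.hadoop." prefix and pass them to the SparkConf before the SparkContext is launched. The credential values below are placeholders, and the helper name `swift_spark_options` is hypothetical; the `fs.swift.service.*` key names are the ones from the original message.

```python
# Placeholder credentials; real values are redacted in the original mail.
credentials = {
    "name": "keystone",
    "auth_url": "https://identity.open.softlayer.com",
    "projectId": "<project-id>",
    "userId": "<user-id>",
    "password": "<password>",
    "region": "dallas",
}

def swift_spark_options(credentials):
    """Return the Swift settings as Spark properties.

    Spark copies any "spark.hadoop.*" property into the Hadoop
    configuration when the context starts, so these take effect
    before any job runs.
    """
    prefix = "spark.hadoop.fs.swift.service." + credentials["name"]
    return {
        prefix + ".auth.url": credentials["auth_url"] + "/v3/auth/tokens",
        prefix + ".auth.endpoint.prefix": "endpoints",
        prefix + ".tenant": credentials["projectId"],
        prefix + ".username": credentials["userId"],
        prefix + ".password": credentials["password"],
        prefix + ".http.port": "8080",
        prefix + ".region": credentials["region"],
        prefix + ".public": "true",
    }

# Apply them at launch time, e.g.:
#   from pyspark import SparkConf, SparkContext
#   conf = SparkConf()
#   for key, value in swift_spark_options(credentials).items():
#       conf.set(key, value)
#   sc = SparkContext(conf=conf)
```

Note Steve's other point: the service name embedded in the keys ("keystone" here) must match the suffix of the host in the URL you read from, e.g. `swift://bucket.keystone/path`, or the filesystem will look up a different `fs.swift.service.<name>.*` family and report the option as missing.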