Hi Mario,

Thanks for your help, so I will keep using CSVs.
Best,

*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
www.onematch.com.br

On Mon, Sep 12, 2016 at 3:39 PM, Mario Ds Briggs <mario.bri...@in.ibm.com> wrote:

> Daniel,
>
> I believe it is related to
> https://issues.apache.org/jira/browse/SPARK-13979 and happens only when a
> task fails in an executor (probably for some other reason you hit the
> latter with Parquet and not CSV).
>
> The PR in there should shortly be available in IBM's Analytics for Spark.
>
> thanks
> Mario
>
> From: Adam Roberts/UK/IBM
> To: Mario Ds Briggs/India/IBM@IBMIN
> Date: 12/09/2016 09:37 pm
> Subject: Fw: Spark + Parquet + IBM Block Storage at Bluemix
>
> Mario, in case you've not seen this...
>
> *Adam Roberts*
> IBM Spark Team Lead
> Runtime Technologies - Hursley
>
> ----- Forwarded by Adam Roberts/UK/IBM on 12/09/2016 17:06 -----
>
> From: Daniel Lopes <dan...@onematch.com.br>
> To: Steve Loughran <ste...@hortonworks.com>
> Cc: user <user@spark.apache.org>
> Date: 12/09/2016 13:05
> Subject: Re: Spark + Parquet + IBM Block Storage at Bluemix
>
> Thanks Steve,
>
> But this error occurs only with Parquet files; CSVs work.
>
> Best,
>
> *Daniel Lopes*
> Chief Data and Analytics Officer | OneMatch
>
> On Sun, Sep 11, 2016 at 3:28 PM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> On 9 Sep 2016, at 17:56, Daniel Lopes <dan...@onematch.com.br> wrote:
>
> Hi, can someone help?
>
> I'm trying to use Parquet on IBM Block Storage with Spark, but when I try
> to load I get this error, using this config:
>
>     credentials = {
>         "name": "keystone",
>         "auth_url": "https://identity.open.softlayer.com",
>         "project": "object_storage_23f274c1_d11XXXXXXXXXXXXXXXe634",
>         "projectId": "XXXXXXd9c4aa39b7c7eCCCCCCCCb",
>         "region": "dallas",
>         "userId": "XXXXX64087180b40XXXXX2b909",
>         "username": "admin_XXXX9dd810f8901d48778XXXXXX",
>         "password": "chXXXXXXXXXXXXX6_",
>         "domainId": "c1ddad17cfcXXXXXXXXX41",
>         "domainName": "10XXXXXX",
>         "role": "admin"
>     }
>
>     def set_hadoop_config(credentials):
>         """This function sets the Hadoop configuration with the given
>         credentials, so it is possible to access data using SparkContext."""
>         prefix = "fs.swift.service." + credentials['name']
>         hconf = sc._jsc.hadoopConfiguration()
>         hconf.set(prefix + ".auth.url",
>                   credentials['auth_url'] + '/v3/auth/tokens')
>         hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
>         hconf.set(prefix + ".tenant", credentials['projectId'])
>         hconf.set(prefix + ".username", credentials['userId'])
>         hconf.set(prefix + ".password", credentials['password'])
>         hconf.setInt(prefix + ".http.port", 8080)
>         hconf.set(prefix + ".region", credentials['region'])
>         hconf.setBoolean(prefix + ".public", True)
>
>     set_hadoop_config(credentials)
>
> -------------------------------------------------
>
>     Py4JJavaErrorTraceback (most recent call last)
>     <ipython-input-55-5a14928215eb> in <module>()
>     ----> 1 train.groupby('Acordo').count().show()
>
>     Py4JJavaError: An error occurred while calling o406.showString.
>     : org.apache.spark.SparkException: Job aborted due to stage failure:
>     Task 60 in stage 30.0 failed 10 times, most recent failure: Lost task
>     60.9 in stage 30.0 (TID 2556, yp-spark-dal09-env5-0039):
>     org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException:
>     Missing mandatory configuration option: fs.swift.service.keystone.auth.url
>
> In my own code, I'd assume that the value of credentials['name'] didn't
> match that of the URL, assuming you have something like
> swift://bucket.keystone. Failing that: the options were set too late.
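Steve's mismatch theory can be illustrated with a simplified sketch of how the Swift filesystem client derives its configuration key from the URL. This is an illustration only, not code from the thread; the real extraction logic lives in `RestClientBindings` in hadoop-openstack:

```python
def swift_auth_key(url_host):
    """Derive the auth.url config key for a swift:// URL host.

    The service name is the part of the host after the first dot:
    swift://bucket.keystone/... -> service "keystone".
    (A simplified sketch of what RestClientBindings does.)
    """
    service = url_host.split('.', 1)[1]
    return "fs.swift.service." + service + ".auth.url"

# Options set for "keystone" only satisfy URLs whose host ends in ".keystone":
key = swift_auth_key("bucket.softlayer")
# key == "fs.swift.service.softlayer.auth.url" -- not set anywhere, so the
# filesystem raises SwiftConfigurationException: Missing mandatory
# configuration option
```

If the key is never set for the service name the URL implies, the "missing mandatory configuration option" error above is exactly what surfaces.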
>
> Instead of asking for the Hadoop config and editing that, set the options
> on your Spark context, before it is launched, with the prefix
> "spark.hadoop."
>
>     at org.apache.hadoop.fs.swift.http.RestClientBindings.copy(RestClientBindings.java:223)
>     at org.apache.hadoop.fs.swift.http.RestClientBindings.bind(RestClientBindings.java:147)
>
> *Daniel Lopes*
> Chief Data and Analytics Officer | OneMatch
> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
> www.onematch.com.br
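The pre-launch approach Steve describes — passing the same options through the Spark configuration with the "spark.hadoop." prefix so they exist before the context starts — could be sketched as below. The helper name `swift_spark_options` and the placeholder credentials are hypothetical, and the commented SparkContext lines are indicative only:

```python
def swift_spark_options(credentials):
    """Map the credentials dict from the thread onto "spark.hadoop." launch
    options, mirroring set_hadoop_config() but applied before launch."""
    prefix = "spark.hadoop.fs.swift.service." + credentials['name']
    return {
        prefix + ".auth.url": credentials['auth_url'] + '/v3/auth/tokens',
        prefix + ".auth.endpoint.prefix": "endpoints",
        prefix + ".tenant": credentials['projectId'],
        prefix + ".username": credentials['userId'],
        prefix + ".password": credentials['password'],
        prefix + ".http.port": "8080",
        prefix + ".region": credentials['region'],
        prefix + ".public": "true",
    }

credentials = {"name": "keystone",
               "auth_url": "https://identity.open.softlayer.com",
               "projectId": "XXXX", "userId": "XXXX",
               "password": "XXXX", "region": "dallas"}

opts = swift_spark_options(credentials)
# Applied before the context is launched, e.g.:
#   conf = SparkConf()
#   for k, v in opts.items():
#       conf.set(k, v)
#   sc = SparkContext(conf=conf)
# then read with a URL whose service name matches credentials['name']:
#   df = sqlContext.read.parquet("swift://mybucket.keystone/data.parquet")
```

Because the options travel with the launch configuration, every executor sees them from the start, rather than only the driver-side Hadoop configuration being mutated after the fact.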