Hi Mario,

Thanks for your help, so I will keep using CSVs.
Best,

*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
www.onematch.com.br

On Mon, Sep 12, 2016 at 3:39 PM, Mario Ds Briggs <mario.bri...@in.ibm.com> wrote:

> Daniel,
>
> I believe it is related to
> https://issues.apache.org/jira/browse/SPARK-13979 and happens only when a
> task fails in an executor (probably for some other reason you hit the
> latter with Parquet and not CSV).
>
> The PR in there should shortly be available in IBM's Analytics for Spark.
>
> thanks
> Mario
>
> From: Adam Roberts/UK/IBM
> To: Mario Ds Briggs/India/IBM@IBMIN
> Date: 12/09/2016 09:37 pm
> Subject: Fw: Spark + Parquet + IBM Block Storage at Bluemix
>
> Mario, in case you've not seen this...
>
> *Adam Roberts*
> IBM Spark Team Lead
> Runtime Technologies - Hursley
>
> ----- Forwarded by Adam Roberts/UK/IBM on 12/09/2016 17:06 -----
>
> From: Daniel Lopes <dan...@onematch.com.br>
> To: Steve Loughran <ste...@hortonworks.com>
> Cc: user <user@spark.apache.org>
> Date: 12/09/2016 13:05
> Subject: Re: Spark + Parquet + IBM Block Storage at Bluemix
>
> Thanks Steve,
>
> But this error occurs only with Parquet files; CSVs work.
>
> Best,
>
> *Daniel Lopes*
> Chief Data and Analytics Officer | OneMatch
>
> On Sun, Sep 11, 2016 at 3:28 PM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> On 9 Sep 2016, at 17:56, Daniel Lopes <dan...@onematch.com.br> wrote:
>
> Hi, can someone help?
>
> I'm trying to use Parquet on IBM Block Storage with Spark, but when I try
> to load I get this error, using this config:
>
>     credentials = {
>         "name": "keystone",
>         "auth_url": "https://identity.open.softlayer.com",
>         "project": "object_storage_23f274c1_d11XXXXXXXXXXXXXXXe634",
>         "projectId": "XXXXXXd9c4aa39b7c7eCCCCCCCCb",
>         "region": "dallas",
>         "userId": "XXXXX64087180b40XXXXX2b909",
>         "username": "admin_XXXX9dd810f8901d48778XXXXXX",
>         "password": "chXXXXXXXXXXXXX6_",
>         "domainId": "c1ddad17cfcXXXXXXXXX41",
>         "domainName": "10XXXXXX",
>         "role": "admin"
>     }
>
>     def set_hadoop_config(credentials):
>         """This function sets the Hadoop configuration with the given
>         credentials, so it is possible to access data using SparkContext."""
>         prefix = "fs.swift.service." + credentials['name']
>         hconf = sc._jsc.hadoopConfiguration()
>         hconf.set(prefix + ".auth.url",
>                   credentials['auth_url'] + '/v3/auth/tokens')
>         hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
>         hconf.set(prefix + ".tenant", credentials['projectId'])
>         hconf.set(prefix + ".username", credentials['userId'])
>         hconf.set(prefix + ".password", credentials['password'])
>         hconf.setInt(prefix + ".http.port", 8080)
>         hconf.set(prefix + ".region", credentials['region'])
>         hconf.setBoolean(prefix + ".public", True)
>
>     set_hadoop_config(credentials)
>
> -------------------------------------------------
>
>     Py4JJavaErrorTraceback (most recent call last)
>     <ipython-input-55-5a14928215eb> in <module>()
>     ----> 1 train.groupby('Acordo').count().show()
>
>     Py4JJavaError: An error occurred while calling o406.showString.
>     : org.apache.spark.SparkException: Job aborted due to stage failure:
>     Task 60 in stage 30.0 failed 10 times, most recent failure: Lost task
>     60.9 in stage 30.0 (TID 2556, yp-spark-dal09-env5-0039):
>     org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException:
>     Missing mandatory configuration option: fs.swift.service.keystone.auth.url
>
> In my own code, I'd assume that the value of credentials['name'] didn't
> match that of the URL, assuming you have something like
> swift://bucket.keystone. Failing that: the options were set too late.
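Steve's mismatch theory can be illustrated with a simplified sketch of how the Swift filesystem client derives its configuration key from the URL. This is an illustration only, not code from the thread; the real extraction logic lives in `RestClientBindings` in hadoop-openstack:

```python
def swift_auth_key(url_host):
    """Derive the auth.url config key for a swift:// URL host.

    The service name is the part of the host after the first dot:
    swift://bucket.keystone/... -> service "keystone".
    (A simplified sketch of what RestClientBindings does.)
    """
    service = url_host.split('.', 1)[1]
    return "fs.swift.service." + service + ".auth.url"

# Options set for "keystone" only satisfy URLs whose host ends in ".keystone":
key = swift_auth_key("bucket.softlayer")
# key == "fs.swift.service.softlayer.auth.url" -- not set anywhere, so the
# filesystem raises SwiftConfigurationException: Missing mandatory
# configuration option
```

If the key is never set for the service name the URL implies, the "missing mandatory configuration option" error above is exactly what surfaces.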
>
> Instead of asking for the Hadoop config and editing that, set the options
> on your Spark context, before it is launched, with the prefix
> "spark.hadoop."
>
>     at org.apache.hadoop.fs.swift.http.RestClientBindings.copy(RestClientBindings.java:223)
>     at org.apache.hadoop.fs.swift.http.RestClientBindings.bind(RestClientBindings.java:147)
>
> *Daniel Lopes*
> Chief Data and Analytics Officer | OneMatch
> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
> www.onematch.com.br
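The pre-launch approach Steve describes — passing the same options through the Spark configuration with the "spark.hadoop." prefix so they exist before the context starts — could be sketched as below. The helper name `swift_spark_options` and the placeholder credentials are hypothetical, and the commented SparkContext lines are indicative only:

```python
def swift_spark_options(credentials):
    """Map the credentials dict from the thread onto "spark.hadoop." launch
    options, mirroring set_hadoop_config() but applied before launch."""
    prefix = "spark.hadoop.fs.swift.service." + credentials['name']
    return {
        prefix + ".auth.url": credentials['auth_url'] + '/v3/auth/tokens',
        prefix + ".auth.endpoint.prefix": "endpoints",
        prefix + ".tenant": credentials['projectId'],
        prefix + ".username": credentials['userId'],
        prefix + ".password": credentials['password'],
        prefix + ".http.port": "8080",
        prefix + ".region": credentials['region'],
        prefix + ".public": "true",
    }

credentials = {"name": "keystone",
               "auth_url": "https://identity.open.softlayer.com",
               "projectId": "XXXX", "userId": "XXXX",
               "password": "XXXX", "region": "dallas"}

opts = swift_spark_options(credentials)
# Applied before the context is launched, e.g.:
#   conf = SparkConf()
#   for k, v in opts.items():
#       conf.set(k, v)
#   sc = SparkContext(conf=conf)
# then read with a URL whose service name matches credentials['name']:
#   df = sqlContext.read.parquet("swift://mybucket.keystone/data.parquet")
```

Because the options travel with the launch configuration, every executor sees them from the start, rather than only the driver-side Hadoop configuration being mutated after the fact.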