Re: spark 1.6.0 read s3 files error.

2016-08-02 Thread freedafeng
Solution:

sc._jsc.hadoopConfiguration().set("fs.s3a.awsAccessKeyId", "...")
sc._jsc.hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", "...")

Note the fs.s3a credential keys (paired with an s3a:// URL), rather than the fs.s3n keys tried earlier in the thread.

Got this solution from Neerja at Cloudera. Thanks, Neerja.
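A minimal sketch of how the fix slots into a job. The helper below just applies the two keys from the solution; the commented-out section shows hypothetical pyspark usage (the app name and bucket path are made up, and the real credentials go where the placeholders are):

```python
def configure_s3a(hadoop_conf, access_key, secret_key):
    # Apply the s3a credential keys from the fix above to a
    # Hadoop Configuration-like object (anything with a .set method).
    hadoop_conf.set("fs.s3a.awsAccessKeyId", access_key)
    hadoop_conf.set("fs.s3a.awsSecretAccessKey", secret_key)

# In a real Spark 1.6 job it would be used roughly like this
# (hypothetical bucket/path; Python 2 print, matching the thread's code):
#
# from pyspark import SparkContext, SparkConf
# sc = SparkContext(conf=SparkConf().setAppName('s3a-read'))
# configure_s3a(sc._jsc.hadoopConfiguration(), 'ACCESS_KEY', 'SECRET_KEY')
# myRdd = sc.textFile("s3a://my-bucket/path/to/file.json.gz")
# print "The count is", myRdd.count()
```

Depending on the Hadoop build, the s3a connector (hadoop-aws and the AWS SDK jars) may also need to be on the classpath.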



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-6-0-read-s3-files-error-tp27417p27452.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: spark 1.6.0 read s3 files error.

2016-08-02 Thread Andy Davidson
Hi Freedafeng

I have been reading and writing to s3 using spark-1.6.x without any
problems. Can you post a little code example and any error messages?

Andy

From:  freedafeng <freedaf...@yahoo.com>
Date:  Tuesday, August 2, 2016 at 9:26 AM
To:  "user @spark" <user@spark.apache.org>
Subject:  Re: spark 1.6.0 read s3 files error.

> Anyone, please? I believe many of us are using Spark 1.6 or higher with
> s3...




Re: spark 1.6.0 read s3 files error.

2016-08-02 Thread freedafeng
Anyone, please? I believe many of us are using Spark 1.6 or higher with
s3...






Re: spark 1.6.0 read s3 files error.

2016-07-28 Thread freedafeng
Tried the following; it still failed the same way. It ran in YARN, on CDH 5.8.0.

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName('s3 ---')
sc = SparkContext(conf=conf)

sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "...")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "...")

# bucket name omitted from the original post
myRdd = sc.textFile("s3n:///y=2016/m=5/d=26/h=20/2016.5.26.21.9.52.6d53180a-28b9-4e65-b749-b4a2694b9199.json.gz")

count = myRdd.count()
print "The count is", count






Re: spark 1.6.0 read s3 files error.

2016-07-28 Thread freedafeng
BTW, I also tried YARN. Same error.

When I ran the script, I used the real credentials for s3; they are omitted
in this post. Sorry about that.






Re: spark 1.6.0 read s3 files error.

2016-07-28 Thread Andy Davidson
Hi Freedafeng

Can you tell us a little more? I.e., can you paste your code and error message?

Andy

From:  freedafeng <freedaf...@yahoo.com>
Date:  Thursday, July 28, 2016 at 9:21 AM
To:  "user @spark" <user@spark.apache.org>
Subject:  Re: spark 1.6.0 read s3 files error.

> The question is: what is the cause of the problem, and how do I fix it? Thanks.
> 




Re: spark 1.6.0 read s3 files error.

2016-07-28 Thread freedafeng
The question is: what is the cause of the problem, and how do I fix it? Thanks.






Re: spark 1.6.0 read s3 files error.

2016-07-27 Thread Andy Davidson
Hi Freedafeng

The following works for me; df will be a DataFrame. fullPath is a list
of part files stored in s3.

fullPath = ['s3n:///json/StreamingKafkaCollector/s1/2016-07-10/146817304/part-r-0-a2121800-fa5b-44b1-a994-67795']

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.read.format('json').load(fullPath).select("key1")  # .cache()

My file is raw JSON; you might have to tweak the above statement to work
with gz files.

Andy


From:  freedafeng 
Date:  Wednesday, July 27, 2016 at 10:36 AM
To:  "user @spark" 
Subject:  spark 1.6.0 read s3 files error.

> cdh 5.7.1. pyspark.
> 
> codes: ===
> from pyspark import SparkContext, SparkConf
> 
> conf = SparkConf().setAppName('s3 ---')
> sc = SparkContext(conf=conf)
> 
> myRdd = sc.textFile("s3n:///y=2016/m=5/d=26/h=20/2016.5.26.21.9.52.6d53180a-28b9-4e65-b749-b4a2694b9199.json.gz")
> 
> count = myRdd.count()
> print "The count is", count
> 
> ===
> standalone mode: command line:
> 
> AWS_ACCESS_KEY_ID=??? AWS_SECRET_ACCESS_KEY=??? ./bin/spark-submit --driver-memory 4G --master spark://master:7077 --conf "spark.default.parallelism=70" /root/workspace/test/s3.py
> 
> Error:
> 16/07/27 17:27:26 INFO spark.SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2
> Traceback (most recent call last):
>   File "/root/workspace/test/s3.py", line 12, in 
>     count = myRdd.count()
>   File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count
>   File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
>   File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold
>   File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
>   File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
>   File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : java.lang.VerifyError: Bad type on operand stack
> Exception Details:
>   Location:
>     org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.copy(Ljava/lang/String;Ljava/lang/String;)V @155: invokevirtual
>   Reason:
>     Type 'org/jets3t/service/model/S3Object' (current frame, stack[4]) is not assignable to 'org/jets3t/service/model/StorageObject'
>   Current Frame:
>     bci: @155
>     flags: { }
>     locals: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', 'java/lang/String', 'java/lang/String', 'org/jets3t/service/model/S3Object' }
>     stack: { 'org/jets3t/service/S3Service', 'java/lang/String', 'java/lang/String', 'java/lang/String', 'org/jets3t/service/model/S3Object', integer }
>   Bytecode:
> 000: b200 fcb9 0190 0100 9900 39b2 00fc bb01
> 010: 5659 b701 5713 0192 b601 5b2b b601 5b13
> 020: 0194 b601 5b2c b601 5b13 0196 b601 5b2a
> 030: b400 7db6 00e7 b601 5bb6 015e b901 9802
> 040: 002a b400 5799 0030 2ab4 0047 2ab4 007d
> 050: 2b01 0101 01b6 019b 4e2a b400 6b09 949e
> 060: 0016 2db6 019c 2ab4 006b 949e 000a 2a2d
> 070: 2cb6 01a0 b1bb 00a0 592c b700 a14e 2d2a
> 080: b400 73b6 00b0 2ab4 0047 2ab4 007d b600
> 090: e72b 2ab4 007d b600 e72d 03b6 01a4 57a7
> 0a0: 000a 4e2a 2d2b b700 c7b1
>   Exception Handler Table:
> bci [0, 116] => handler: 162
> bci [117, 159] => handler: 162
>   Stackmap Table:
> same_frame_extended(@65)
> same_frame(@117)
> same_locals_1_stack_item_frame(@162,Object[#139])
> same_frame(@169)
> 
> at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:338)
> at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:328)
> 
> .
> 
> TIA
> 
> 
> 
> 