RE: Boto3 library send to pyspark

2019-04-17 Thread Gorka Bravo Martinez
Hi Gourav, do you mean by setting a different Python environment while running pyspark? Cheers, Gorka. From: Gourav Sengupta [gourav.sengu...@gmail.com] Sent: 17 April 2019 10:06 To: Gorka Bravo Martinez Cc: user@spark.apache.org Subject: Re: Boto3 library

Boto3 library send to pyspark

2019-04-17 Thread Gorka Bravo Martinez
Hi all, I would like to send the boto/boto3 library along when running pyspark in yarn client mode. How can I do that? I am aware that sc.addFile() can add a .py file; does the same work for a whole library? Cheers, Gorka.
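
One possible approach, sketched under assumptions: install the library into a local folder, zip that folder, and hand the archive to Spark with sc.addPyFile() (which accepts .py and .zip files) or with the --py-files option of spark-submit. The helper name build_py_zip, the folder name deps and the file name deps.zip are hypothetical. Note that libraries which load bundled data files from disk may not work when imported from a zip; this sketch does not verify that boto3 does, and in that case installing the library on the nodes is the alternative.

```python
import os
import zipfile


def build_py_zip(source_dir, zip_path):
    """Zip every file under source_dir so package folders sit at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to source_dir, e.g. "boto3/__init__.py".
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path


# Hypothetical usage on the driver, after `pip install boto3 -t deps/`:
#   zip_path = build_py_zip("deps", "deps.zip")
#   sc.addPyFile(zip_path)  # makes the archive importable in tasks run afterwards
# or at launch time:
#   spark-submit --master yarn --deploy-mode client --py-files deps.zip job.py
```

The archive is written outside the folder being zipped so that it does not end up containing itself.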

Reading RDD by (key, data) from s3

2019-04-16 Thread Gorka Bravo Martinez
Hi, I am trying to read gzipped JSON data from S3. My idea is to do something like data = s3_keys.map(lambda x: (x, s3_read_data(x))). For that I thought about using sc.textFile instead of s3_read_data, but that does not work. Any idea how to achieve this?
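
A likely reason sc.textFile fails here is that the SparkContext is only usable on the driver, so it cannot be called inside a function that runs on the executors. A minimal sketch of the alternative is to fetch and decode each object inside the mapped function. It assumes the files are newline-delimited JSON, that boto3 and credentials are available on the workers, and that s3_keys is an RDD of key strings. The bucket argument and the helper names parse_gzipped_json_lines and read_by_key are hypothetical, and s3_read_data takes an extra bucket parameter compared with the question.

```python
import gzip
import json


def parse_gzipped_json_lines(raw_bytes):
    """Decompress gzip bytes and parse one JSON document per non-empty line."""
    text = gzip.decompress(raw_bytes).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def s3_read_data(bucket, key):
    """Fetch one object from S3 and parse it; meant to run on the executors."""
    import boto3  # imported lazily so it is resolved on the worker

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return parse_gzipped_json_lines(body)


def read_by_key(s3_keys, bucket):
    """Turn an RDD of key strings into an RDD of (key, records) pairs."""
    return s3_keys.map(lambda key: (key, s3_read_data(bucket, key)))
```

If only the records are needed rather than one list per key, the same function could be used with flatMap to emit one (key, record) pair per JSON line.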