Re: Get S3 Parquet File

2017-02-27 Thread Femi Anthony
Ok, thanks a lot for the heads up. Sent from my iPhone

Re: Get S3 Parquet File

2017-02-25 Thread Steve Loughran
On 24 Feb 2017, at 07:47, Femi Anthony wrote: Have you tried reading using s3n, which is a slightly older protocol? I'm not sure how compatible s3a is with older versions of Spark. I would absolutely not use s3n with a 1.2 GB file. There is a

Re: Get S3 Parquet File

2017-02-24 Thread Benjamin Kim
Gourav, I’ll start experimenting with Spark 2.1 to see if this works. Cheers, Ben

Re: Get S3 Parquet File

2017-02-24 Thread Gourav Sengupta
Hi Benjamin, First of all, fetching data from S3 while running code on an on-premise system is a very bad idea. You might want to first copy the data into local HDFS before running your code. Of course, this depends on the volume of data and the internet speed that you have. The platform which makes
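A minimal sketch of the copy-to-HDFS-first approach described above, assuming a Spark setup with the s3a connector on the classpath and an existing `sc`/`sqlContext`; the bucket and HDFS paths are placeholders, not taken from the thread (`hadoop distcp` is another common way to do the copy):

```scala
// Sketch only: pull the remote Parquet data down once, then work locally.
// "some-bucket" and the HDFS path are illustrative placeholders.
val remote = sqlContext.read.parquet("s3a://some-bucket/path/to/file.parquet")

// Write a local copy into on-premise HDFS...
remote.write.parquet("hdfs:///tmp/local-copy.parquet")

// ...and run subsequent jobs against the local copy instead of S3.
val local = sqlContext.read.parquet("hdfs:///tmp/local-copy.parquet")
```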

Re: Get S3 Parquet File

2017-02-23 Thread Femi Anthony
Have you tried reading using s3n, which is a slightly older protocol? I'm not sure how compatible s3a is with older versions of Spark. Femi
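For what it's worth, trying the older s3n connector is mostly a URI-scheme change plus different credential property names; a hedged sketch, with placeholder paths and keys, assuming `sc` and `sqlContext` already exist (note the later reply in the thread advises against s3n for a file this size):

```scala
// Sketch: s3n uses different Hadoop credential property names than s3a.
// accessKey/secretKey and the bucket path are placeholders.
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3n.awsAccessKeyId", accessKey)
hadoopConf.set("fs.s3n.awsSecretAccessKey", secretKey)

// Same read as before, only the scheme changes from s3a:// to s3n://.
val df = sqlContext.read.parquet("s3n://some-bucket/path/to/file.parquet")
```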

Re: Get S3 Parquet File

2017-02-23 Thread Benjamin Kim
Hi Gourav, My answers are below. Cheers, Ben

Re: Get S3 Parquet File

2017-02-23 Thread Gourav Sengupta
Can I ask where you are running your CDH? Is it on premise, or have you created a cluster for yourself in AWS? Also, I have really never seen s3a used before; it was used long ago, when writing S3 files took a long time, but I think that you are reading it. Any ideas why you are not

Re: Get S3 Parquet File

2017-02-23 Thread Benjamin Kim
Aakash, Here is a code snippet for the keys. val accessKey = "---" val secretKey = "---" val hadoopConf = sc.hadoopConfiguration hadoopConf.set("fs.s3a.access.key", accessKey) hadoopConf.set("fs.s3a.secret.key", secretKey) hadoopConf.set("spark.hadoop.fs.s3a.access.key", accessKey)
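The snippet above is cut off mid-line in the archive; a cleaned-up sketch of the same configuration, with straight quotes and a presumed matching secret-key line for the truncated part (the "---" key values stay as placeholders):

```scala
// Sketch of the configuration quoted above; "---" placeholders stand in
// for the real keys, which should not be hard-coded in production code.
val accessKey = "---"
val secretKey = "---"

val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", accessKey)
hadoopConf.set("fs.s3a.secret.key", secretKey)

// The "spark.hadoop."-prefixed variants route the same settings through
// the Spark configuration; the truncated line presumably continues with
// the corresponding secret-key setting, added here as an assumption.
hadoopConf.set("spark.hadoop.fs.s3a.access.key", accessKey)
hadoopConf.set("spark.hadoop.fs.s3a.secret.key", secretKey)
```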

Re: Get S3 Parquet File

2017-02-23 Thread Aakash Basu
Hey, Please recheck the access key and secret key being used to fetch the Parquet file. It seems to be a credentials error: either a mismatch or a loading problem. If it is a loading problem, first use the keys directly in code and see if the issue resolves; then they can be hidden and read from input params. Thanks, Aakash. On

Get S3 Parquet File

2017-02-23 Thread Benjamin Kim
We are trying to use Spark 1.6 within CDH 5.7.1 to retrieve a 1.3GB Parquet file from AWS S3. We can read the schema and show some data when the file is loaded into a DataFrame, but when we try to do some operations, such as count, we get this error below.
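A minimal sketch of the kind of job being described, assuming Spark 1.6 with the hadoop-aws and AWS SDK jars on the classpath; the bucket and path are placeholders, not taken from the thread:

```scala
// Sketch only: Spark 1.6-era API (SQLContext rather than SparkSession).
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("s3a-parquet-read"))
val sqlContext = new SQLContext(sc)

// Loading the file and inspecting the schema is lazy, so it can succeed...
val df = sqlContext.read.parquet("s3a://some-bucket/path/to/file.parquet")
df.printSchema()
df.show(5)

// ...whereas an action such as count() forces a full read of the 1.3 GB
// file, which is reportedly where the error appears.
println(df.count())
```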