Re: S3A Creating Task Per Byte (pyspark / 1.6.1)

2016-05-13 Thread Steve Loughran
On 12 May 2016, at 18:35, Aaron Jackson wrote:

> I'm using Spark 1.6.1 (hadoop-2.6) and I'm trying to load a file that's in S3. I've done this previously with Spark 1.5 with no issue. Attempting to load and count a single file as follows:

S3A Creating Task Per Byte (pyspark / 1.6.1)

2016-05-12 Thread Aaron Jackson
I'm using Spark 1.6.1 (hadoop-2.6) and I'm trying to load a file that's in S3. I've done this previously with Spark 1.5 with no issue. Attempting to load and count a single file as follows:

    dataFrame = sqlContext.read.text('s3a://bucket/path-to-file.csv')
    dataFrame.count()

But when it
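The subject line describes Spark creating one task per byte of input. One way such a degenerate split count can arise in Hadoop-style input formats is when the filesystem reports a block size at or near zero, so the computed split size collapses to a single byte. A toy sketch of that split arithmetic, purely illustrative (the function name and defaults below are hypothetical and not Spark's or Hadoop's actual code):

    # Toy illustration of Hadoop-style input splitting and how it
    # degenerates when the filesystem reports a tiny block size.
    # Hypothetical names; not the real FileInputFormat implementation.

    def num_splits(file_length, block_size, min_split=1):
        # Split size follows the reported block size, bounded below
        # by a minimum split size (here 1 byte).
        split_size = max(min_split, block_size)
        return -(-file_length // split_size)  # ceiling division

    # A 1 MiB file with a normal 64 MiB reported block size: one task.
    print(num_splits(1 << 20, 64 << 20))   # 1

    # The same file if the filesystem reports a 1-byte block size:
    # one task per byte.
    print(num_splits(1 << 20, 1))          # 1048576

If the reported block size is sane, the file fits in one split; if it collapses to one byte, the split count equals the file length, which matches the symptom in the subject line.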