Jörn,
I agree with you, but the vendor is a little difficult to work with. For now, I
will try to decompress it from S3 and save it plainly into HDFS. If someone
already has this example, please let me know.
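In case it helps, here is a minimal sketch of the decompress step (paths are hypothetical placeholders): strip the outer gzip layer first, then point Spark at the resulting plain Parquet file.

```python
# Minimal sketch: remove the outer gzip layer so the file is plain Parquet.
# Paths below are hypothetical; adapt to your S3/HDFS staging locations.
import gzip
import shutil

def gunzip_file(src: str, dst: str) -> None:
    """Decompress a .gz file with a streaming copy (low memory footprint)."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

# Usage (hypothetical paths):
# gunzip_file("/tmp/data.parquet.gz", "/tmp/data.parquet")
# df = spark.read.parquet("/tmp/data.parquet")  # now readable as Parquet
```

Note that `sc.textFile` gunzips transparently but then splits the bytes into text lines, which corrupts a binary format like Parquet; decompressing to a file first avoids that.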
Cheers,
Ben
> On Feb 13, 2017, at 9:50 AM, Jörn Franke wrote:
>
> Your vendor should use Parquet's internal compression rather than taking a
> Parquet file and gzipping it.
>
>> On 13 Feb 2017, at 18:48, Benjamin Kim wrote:
>>
>> We are receiving files from an outside vendor who creates a Parquet data
>> file and Gzips it before delivery. Does anyone know how to Gunzip the file
>> in Spark and inject the Parquet data into a DataFrame? I thought using
>> sc.textFile or sc.wholeTextFiles would automatically Gunzip the file, but
>> I’m getting a decompression header error when trying to open the Parquet
>> file.
>>
>> Thanks,
>> Ben
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>