Re: Parquet Gzipped Files
Jörn, I agree with you, but the vendor is a little difficult to work with. For now, I will try to decompress it from S3 and save it plainly into HDFS. If someone already has this example, please let me know.

Cheers,
Ben

> On Feb 13, 2017, at 9:50 AM, Jörn Franke wrote:
>
> Your vendor should use the parquet internal compression and not take a
> parquet file and gzip it.
>
>> On 13 Feb 2017, at 18:48, Benjamin Kim wrote:
>>
>> We are receiving files from an outside vendor who creates a Parquet data
>> file and Gzips it before delivery. Does anyone know how to Gunzip the file
>> in Spark and inject the Parquet data into a DataFrame? I thought using
>> sc.textFile or sc.wholeTextFiles would automatically Gunzip the file, but
>> I’m getting a decompression header error when trying to open the Parquet
>> file.
>>
>> Thanks,
>> Ben
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Parquet Gzipped Files
Your vendor should use the parquet internal compression and not take a parquet file and gzip it.

> On 13 Feb 2017, at 18:48, Benjamin Kim wrote:
>
> We are receiving files from an outside vendor who creates a Parquet data file
> and Gzips it before delivery. Does anyone know how to Gunzip the file in
> Spark and inject the Parquet data into a DataFrame? I thought using
> sc.textFile or sc.wholeTextFiles would automatically Gunzip the file, but I’m
> getting a decompression header error when trying to open the Parquet file.
>
> Thanks,
> Ben
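For reference, what the vendor-side fix could look like, as a hedged PySpark sketch (it assumes an existing DataFrame `df`; the output path and function name are hypothetical). Gzip is then applied inside the Parquet file, per column chunk, and readers such as spark.read.parquet decompress it transparently:

```python
# Hypothetical sketch, assuming a PySpark DataFrame `df` is in scope.
def write_parquet_internal_gzip(df, path: str) -> None:
    """Write Parquet with gzip as the *internal* codec, instead of
    gzipping the finished file, so the output stays a valid .parquet."""
    df.write.option("compression", "gzip").parquet(path)
```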
Parquet Gzipped Files
We are receiving files from an outside vendor who creates a Parquet data file and Gzips it before delivery. Does anyone know how to Gunzip the file in Spark and inject the Parquet data into a DataFrame? I thought using sc.textFile or sc.wholeTextFiles would automatically Gunzip the file, but I’m getting a decompression header error when trying to open the Parquet file.

Thanks,
Ben