Merging Parquet Files

Benjamin Kim Thu, 22 Dec 2016 14:02:18 -0800

Has anyone tried to merge *.gz.parquet files before? I'm trying to merge them 
into a single file after Spark writes them out. Doing a coalesce(1) on the Spark 
cluster will not work; the cluster just does not have the resources for it. I'd 
like to do the merge from the command line, without Spark, and run it from a 
shell script. I tried "hdfs dfs -getmerge", but Spark cannot read the resulting 
file and fails with a gzip footer error.
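
One likely reason the getmerge output is unreadable: getmerge concatenates bytes, while a Parquet file keeps its metadata in a footer at the very end (metadata, then a 4-byte little-endian length, then the "PAR1" marker). A reader of the concatenated file would therefore see only the last part's footer, and the offsets in it would no longer match where the data sits. The sketch below illustrates just that outer framing with stand-in byte strings, not real Parquet data; the helper names footer_span and fake_parquet are made up for illustration.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"  # 4-byte marker at both ends of a Parquet file


def footer_span(data: bytes) -> Tuple[int, int]:
    """Return (start, end) offsets of the footer metadata a reader would use.

    Only the tail of the file is consulted: the last 4 bytes must be the
    magic marker, and the 4 bytes before that hold the metadata length as
    a little-endian unsigned integer.
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file (magic marker missing)")
    (meta_len,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - meta_len
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return start, end


def fake_parquet(body: bytes, metadata: bytes) -> bytes:
    """Build a stand-in with Parquet's outer framing only (hypothetical)."""
    return MAGIC + body + metadata + struct.pack("<I", len(metadata)) + MAGIC


part0 = fake_parquet(b"row-groups-of-part-0", b"meta-0")
part1 = fake_parquet(b"row-groups-of-part-1", b"meta-1")
merged = part0 + part1  # what a byte-level concatenation produces

start, end = footer_span(merged)
print(merged[start:end])  # b'meta-1': part 0's metadata is never consulted
```

If this is what is happening, a working merge has to rewrite the footer (or the whole file) rather than append bytes.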


Thanks,
Ben