Has anyone merged *.gz.parquet files before? I'm trying to merge the files that Spark outputs into a single file. Doing a coalesce(1) on the Spark cluster will not work; the cluster just does not have the resources for it. I'd like to do the merge from the command line, without Spark, and call that command from a shell script. I tried "hdfs dfs -getmerge", but Spark cannot read the resulting file and fails with a gzip footer error.
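Some background on why the getmerge result is probably unreadable, as far as I understand the format: a Parquet file starts and ends with the 4-byte magic "PAR1", and the last 8 bytes are a little-endian footer length followed by that magic. getmerge only concatenates bytes, so a reader of the merged file finds just the footer of the last part, and the offsets in that footer were written relative to the start of that part, not of the merged file. The sketch below is only a diagnostic, not a merge tool. The function name read_parquet_trailer is made up for illustration, and it assumes the file has been copied to local disk.

```python
import struct

MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


def read_parquet_trailer(path):
    """Return (footer_length, footer_offset) taken from the last 8 bytes.

    Hypothetical helper for inspection only; it does not parse the
    footer metadata itself.
    """
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, 2)              # move to the end of the file
        size = f.tell()
        if size < 12:             # 4 (magic) + 4 (length) + 4 (magic)
            raise ValueError("too small to be a Parquet file")
        f.seek(size - 8)
        tail = f.read(8)
    if head != MAGIC or tail[4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    footer_offset = size - 8 - footer_len
    if footer_offset < 4:
        raise ValueError("footer length is larger than the file")
    return footer_len, footer_offset
```

Running this on one part file and then on the getmerge output should show that the merged file reports the same footer length as its last part, even though the file is many times larger. That suggests the rest of the parts' metadata is not visible to a reader.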
Thanks,
Ben