Has anyone merged *.gz.parquet files before? I'm trying to combine the files
that Spark writes into a single file. Calling coalesce(1) on the Spark cluster
does not work because the cluster does not have the resources for it, so I want
to do the merge from the command line, without Spark, and run that command from
a shell script. I tried "hdfs dfs -getmerge", but Spark cannot read the
resulting file and fails with a gzip footer error.
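
My guess at why getmerge breaks the file, as a rough sketch: as far as I
understand the Parquet layout, every file ends with its own metadata footer,
then a 4-byte little-endian footer length, then the magic bytes PAR1, and a
reader locates the footer from the end of the file. getmerge just concatenates
bytes. The toy Python below uses made-up byte strings rather than real Parquet
data, and make_toy_file and locate_footer are hypothetical helpers I wrote only
for illustration.

```python
import struct

MAGIC = b"PAR1"  # 4-byte marker at the start and end of every Parquet file


def make_toy_file(body: bytes, footer: bytes) -> bytes:
    """Build a toy byte string with Parquet's outer layout (not real Parquet)."""
    return MAGIC + body + footer + struct.pack("<I", len(footer)) + MAGIC


def locate_footer(data: bytes) -> bytes:
    """Find the footer the way a reader does: starting from the last 8 bytes."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    start = len(data) - 8 - footer_len
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return data[start:len(data) - 8]


# Two stand-in part files; the contents are placeholders, not real row groups.
part0 = make_toy_file(b"rows-of-part-0", b"meta-0")
part1 = make_toy_file(b"rows-of-part-1", b"meta-1")

# Byte-level concatenation, which is what getmerge does.
merged = part0 + part1

# Only the last part's footer is found; part0's footer is never consulted.
print(locate_footer(merged))  # b'meta-1'
```

If that is right, the one footer that does get read also carries offsets that
were computed for the original part file, so they would point at the wrong
bytes in the merged file. That may be where the gzip error comes from. It would
also mean a real merge has to rewrite the row groups under a single new footer
instead of concatenating bytes.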

Thanks,
Ben