I'm writing a large dataset in Parquet format to HDFS using Spark, and the job
runs noticeably slower on EMR than on, say, Databricks. My understanding is
that if I were able to use Hadoop 3.1, the write would be much faster because
that release includes a high-performance output committer. Is this the case,
and if so, when will there be a version of EMR that uses Hadoop 3.1? The
version I'm currently using is EMR 5.21.