writing a small csv to HDFS is super slow

Lian Jiang Fri, 22 Mar 2019 14:44:20 -0700

Hi,

Writing a csv to HDFS takes about 1 hour:


df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').options(header='true').save(csv)

The generated csv file is only about 150 KB. The job uses 3 containers (13
cores and 23 GB of memory in total).
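
Since Spark evaluates lazily, the save() call above also runs every
transformation that produces df, so the hour may be going into computing df
rather than into the HDFS write. A rough sketch for separating the two
timings follows. The helper name timed is hypothetical and not part of Spark,
and the usage lines assume df and csv are the same objects as in the line
above.

```python
import time


def timed(label, action):
    """Run action() once and return (result, elapsed_seconds).

    `timed` is a hypothetical helper for illustration, not a Spark API.
    """
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.1f}s")
    return result, elapsed


# Assumed usage in the same PySpark session (df and csv as in the post):
#
#   # Materialize df first, so upstream work is measured on its own.
#   _, compute_s = timed("compute df", lambda: df.cache().count())
#
#   # Then time only the single-partition CSV write of the cached data.
#   _, write_s = timed(
#       "write csv",
#       lambda: df.repartition(1).write.format('com.databricks.spark.csv')
#                 .mode('overwrite').options(header='true').save(csv),
#   )
```

If the first timing dominates, the slow part is upstream of the write and the
small output size is not the relevant factor.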

Other people have reported similar issues, but I haven't found a good
explanation or solution.

Any clue is highly appreciated! Thanks.
