Hi,

Writing a CSV file to HDFS takes about 1 hour:

df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').options(header='true').save(csv)

The generated CSV file is only about 150 KB. The job uses 3 containers (13
cores, 23 GB of memory).

Other people have reported similar issues, but I haven't found a good
explanation or solution.

Any clue is highly appreciated! Thanks.
