Hi, I am converting a Hive job to a Spark job. I have tested both on a small data set, and the logic produces identical results in Hive and Spark.
When I started testing on large data, Spark is very slow compared to Hive; the shuffle write phase in particular is taking a long time. Any suggestions? I am registering a temporary table in Spark and then overwriting a partitioned Hive table from that temporary table:

    dataframe_transposed.registerTempTable(srcTable)

    import sqlContext._
    import sqlContext.implicits._

    val query = s"INSERT OVERWRITE TABLE ${destTable} SELECT * FROM ${srcTable}"
    println(query)
    logger.info(s"Executing query: ${query}")
    sqlContext.sql(query)

The total size of the DataFrame is around 190 GB. In this case the Spark job runs forever, while the Hive job completes in about 4 hours.

Thanks,
Asmath.
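P.S. In case a runnable repro helps, here is a self-contained sketch of roughly what the job does. The table names, the source of the DataFrame, and the dynamic-partition settings are placeholders and assumptions on my side, not the exact production code:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object InsertOverwriteSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hive-to-spark"))
        val sqlContext = new HiveContext(sc)

        // Assumption: the destination table uses dynamic partitioning, so
        // INSERT OVERWRITE without a static PARTITION clause needs these set.
        sqlContext.setConf("hive.exec.dynamic.partition", "true")
        sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

        val srcTable = "tmp_transposed"    // placeholder name
        val destTable = "mydb.dest_table"  // placeholder name

        // Placeholder source: in the real job this DataFrame is the result
        // of the transpose logic.
        val dataframe_transposed = sqlContext.table("mydb.source_table")
        dataframe_transposed.registerTempTable(srcTable)

        sqlContext.sql(s"INSERT OVERWRITE TABLE $destTable SELECT * FROM $srcTable")
      }
    }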