Prashant Sharma created SPARK-21177:
---------------------------------------
             Summary: Append to hive slows down linearly, with number of appends.
                 Key: SPARK-21177
                 URL: https://issues.apache.org/jira/browse/SPARK-21177
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Prashant Sharma


In short, the following spark-shell transcript reproduces the issue.

{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> def printTimeTaken(str: String, f: () => Unit) {
     |   val start = System.nanoTime()
     |   f()
     |   val end = System.nanoTime()
     |   val timetaken = end - start
     |   import scala.concurrent.duration._
     |   println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
     | }
printTimeTaken: (str: String, f: () => Unit)Unit

scala> for(i <- 1 to 10000) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
Time taken for time to append to hive: is 284

Time taken for time to append to hive: is 211

...
...
Time taken for time to append to hive: is 2615

Time taken for time to append to hive: is 3055

Time taken for time to append to hive: is 22425

....
{code}

Why does it matter?

Because each append takes longer than the previous one, a streaming job cannot keep appending to Hive using this DataFrame operation.
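
To put a number on the slowdown instead of reading it off the log, the reproducer could be adapted to collect the per-append timings and fit a line through them. The following is only a minimal sketch and not part of the original report: `timeEach` and `slope` are hypothetical helper names, the code is meant to be pasted into spark-shell or a plain Scala REPL, and the least-squares fit is just one simple way to estimate the growth per append.

```scala
import scala.concurrent.duration._

// Hypothetical helper: runs `action` n times and returns the wall-clock
// duration of each run in milliseconds, in iteration order.
def timeEach(n: Int)(action: Int => Unit): Vector[Long] =
  (1 to n).map { i =>
    val start = System.nanoTime()
    action(i)
    (System.nanoTime() - start).nanos.toMillis
  }.toVector

// Hypothetical helper: least-squares slope of duration against the
// 1-based iteration index, i.e. estimated extra milliseconds per append.
def slope(ms: Seq[Long]): Double = {
  require(ms.size >= 2, "need at least two measurements to fit a line")
  val n = ms.size.toDouble
  val xs = ms.indices.map(_ + 1.0)
  val meanX = xs.sum / n
  val meanY = ms.sum / n
  val cov = xs.zip(ms).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
  val varX = xs.map(x => (x - meanX) * (x - meanX)).sum
  cov / varX
}

// Assumed usage inside spark-shell, where toDF() is already available:
//   val timings = timeEach(100) { _ =>
//     Seq(1, 2).toDF().write.mode("append").saveAsTable("t1")
//   }
//   println(s"estimated growth: ${slope(timings)} ms per append")
```

A slope close to zero would mean the append cost is stable; a clearly positive slope would be consistent with the growth reported above. A fit like this says nothing about where the extra time is spent.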