Hi,

Can the Spark SQL optimizer be disabled somehow?
In our project we started writing Scala / Spark / DataFrame code four weeks ago. We have implemented only around 10% of the planned project scope, and we are already waiting between 10 minutes (Spark 2.1.0, everything cached) and 30 minutes (Spark 1.6, nothing cached) for a single unit test run to finish. For example, one Scala file of maybe 80 lines of code (several joins, several subtrees reused in different places) takes up to 6 minutes to be optimized, and the Catalyst output is larger than 100 MB. The input for our unit tests is usually only 2 - 3 rows. That is the motivation for disabling the optimizer in unit tests.

I have found this unanswered SO post <http://stackoverflow.com/questions/33984152/how-to-speed-up-spark-sql-unit-tests>, but not much more on the topic. I have also found this SimpleTestOptimizer <https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L151>, which sounds perfect, but I have no idea how to instantiate a SparkSession so that it uses it.

Does nobody else have this problem? Is there something fundamentally wrong with our approach?

Regards,
Stefan Ackermann

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Disable-Spark-SQL-Optimizations-for-unit-tests-tp28380.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
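[Editor's note: one partial workaround, sketched below, is to cap Catalyst's fixed-point iteration budget through the real SQLConf key `spark.sql.optimizer.maxIterations` (default 100 in Spark 2.x). This does not disable the optimizer, and the setting of `1` and the test session names here are illustrative assumptions, but it can bound how long each rule batch runs:]

```scala
import org.apache.spark.sql.SparkSession

// Sketch, not a full fix: cap each Catalyst fixed-point rule batch at a
// single pass. "spark.sql.optimizer.maxIterations" is a real SQLConf key
// (default 100 in Spark 2.x); lowering it bounds optimization time for
// large reused plan subtrees, but the optimizer still runs once per batch.
// Master, app name, and the value 1 are illustrative choices for tests.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("unit-tests")
  .config("spark.sql.optimizer.maxIterations", "1")
  .getOrCreate()
```

Whether a single pass is enough to keep plans correct is guaranteed by Catalyst (rules are semantics-preserving); only the quality of the optimized plan may suffer, which is usually acceptable on 2 - 3 row test inputs.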