Hi,

Can the Spark SQL optimizations be disabled somehow?

Four weeks ago our project started writing Scala / Spark / DataFrame code.
We have implemented only around 10% of the planned project scope, yet we
are already waiting 10 minutes (Spark 2.1.0, everything cached) to 30
minutes (Spark 1.6, nothing cached) for a single unit test run to finish.
For example, one Scala file of maybe 80 lines of code (several joins,
several subtrees reused in different places) takes up to 6 minutes to be
optimized, and the Catalyst output is > 100 MB. The input for our unit
tests is usually only 2 - 3 rows. That is our motivation for disabling the
optimizer in unit tests.
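
For context, below is a minimal sketch of how a local test session could be
built with the only related knobs I am aware of. As far as I can tell,
spark.sql.optimizer.maxIterations (which caps the fixed-point iterations)
and spark.sql.shuffle.partitions only reduce the work; they do not disable
the optimizer (the concrete values here are just guesses for tiny test
inputs):

    import org.apache.spark.sql.SparkSession

    // Minimal local session for unit tests. These settings reduce
    // optimizer and shuffle work, but Catalyst still runs.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("unit-tests")
      // cap the fixed-point iterations per optimizer batch (default 100)
      .config("spark.sql.optimizer.maxIterations", "10")
      // 2 - 3 input rows do not need the default 200 shuffle partitions
      .config("spark.sql.shuffle.partitions", "1")
      .getOrCreate()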

I have found this unanswered SO post
<http://stackoverflow.com/questions/33984152/how-to-speed-up-spark-sql-unit-tests>
but not much more on the topic. I have also found SimpleTestOptimizer
<https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L151>
which sounds perfect, but I have no idea how to instantiate a SparkSession
so that it uses that one.
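
The closest workaround I can think of does not disable the optimizer but
shrinks what it sees: checkpointing the reused subtrees, so that each
downstream plan starts from a plain scan instead of the full tree. A
sketch, assuming Spark 2.1.0 (where Dataset.checkpoint was added);
buildExpensiveSubtree is a hypothetical stand-in for one of our reused
joins:

    import org.apache.spark.sql.DataFrame

    // Checkpointing materializes the subtree and truncates its lineage,
    // so Catalyst optimizes several small plans instead of one huge one.
    spark.sparkContext.setCheckpointDir("/tmp/spark-test-checkpoints")

    val shared: DataFrame = buildExpensiveSubtree(spark) // hypothetical helper
      .checkpoint() // eager by default: runs the job now, cuts the plan here

    // Joins on top of "shared" no longer replay the whole subtree plan.
    val result = shared.join(shared.select("id"), "id")

But that feels like treating the symptom, so I would still prefer a real
way to plug in SimpleTestOptimizer.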

Does nobody else have this problem? Is there something fundamentally wrong
with our approach?

Regards,
Stefan Ackermann


