Switching this from user to dev.

On Sat, Nov 9, 2019 at 9:47 AM Bartosz Konieczny <bartkoniec...@gmail.com> wrote:
> Hi there,
>
> A few days ago I got an intriguing but hard-to-answer question:
> "Why Spark generates Java code and not Scala code?"
> (https://github.com/bartosz25/spark-scala-playground/issues/18)
>
> Since I'm not sure about the exact answer, I'd like to ask you to confirm
> or correct my thinking. I looked for the reasons in JIRA and in the
> research paper "Spark SQL: Relational Data Processing in Spark"
> (http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf) but
> found nothing explaining why Java was chosen over Scala. The only ticket
> I found asks why Scala and not Java, but it concerns data types
> (https://issues.apache.org/jira/browse/SPARK-5193). That's why I'm writing
> here.
>
> My guesses about why Java was chosen are:
> - Java runtime compiler libraries are more mature and production-ready
> than Scala's, or at least they were at implementation time
> - the Scala compiler tends to be slower than the Java one
> https://stackoverflow.com/questions/3490383/java-compile-speed-vs-scala-compile-speed
>

From the discussions when I was doing some code gen (in MLlib, not SQL), I think this is the primary reason.

> - the Scala compiler seems to be more complex, so debugging & maintaining
> it would be harder
>

This was also given as a secondary reason.

> - it was easier to represent a pure Java OO design than mixed FP/OO in
> Scala
>

No one brought up this point. Maybe it was a consideration and it just wasn’t raised.

> ?
>
> Thank you for your help.
>
> --
> Bartosz Konieczny
> data engineer
> https://www.waitingforcode.com
> https://github.com/bartosz25/
> https://twitter.com/waitingforcode
>

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau