Hi,

It appears that there's already a discussion about why the GenerateExec operator has the flag off.

1. https://issues.apache.org/jira/browse/SPARK-21657 ("Spark has exponential time complexity to explode(array of structs)"), which is in progress.

2. And more importantly, @rxin turned it off because --> "Disable generate codegen since it fails my workload." I wish he had included the workload to showcase the issue :(

Looks like there are a bunch of wise people already on it, so I'll just listen...

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Mon, Dec 11, 2017 at 10:15 PM, Jacek Laskowski wrote:

> Hi,
>
> After another day trying to get my head around WholeStageCodegenExec,
> InputAdapter, and the CollapseCodegenStages optimization rule, I came to
> the conclusion that it may have something to do with UnsafeRow vs
> GenericInternalRow/InternalRow, so when a physical operator wants to
> _somehow_ participate in whole-stage codegen it can extend the
> CodegenSupport trait and enable accessing GenericInternalRow by turning
> the supportCodegen flag off.
>
> I can understand how badly that can read, but without help from Spark SQL
> devs that's all I can figure out myself. Any help appreciated.
>
> On Sun, Dec 10, 2017 at 10:34 PM, Stephen Boesch wrote:
>
>> A relevant observation: there was a closed/executed JIRA last year to
>> remove the option to disable the codegen flag (and the unsafe flag as well):
>> https://issues.apache.org/jira/browse/SPARK-11644
>>
>> 2017-12-10 13:16 GMT-08:00 Jacek Laskowski:
>>
>>> Hi,
>>>
>>> I'm wondering why a physical operator like GenerateExec would
>>> extend CodegenSupport [1], but have the supportCodegen flag turned off?
>>>
>>> What's the meaning of such a combination -- be a CodegenSupport with
>>> supportCodegen off?
>>>
>>> [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L58-L64
>>>
>>> [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L125
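
To make the question in the thread concrete (what it could mean to be a CodegenSupport with supportCodegen off), here is a minimal sketch in plain Scala. It does not use Spark's classes: Operator, Scan, Filter, Generate, WholeStage, InputAdapter and insertStages are all hypothetical stand-ins, and the rule is only a simplified guess at how something like CollapseCodegenStages could consult the flag. Treat it as an illustration under those assumptions, not as Spark's implementation.

```scala
// Toy model only: none of these types are Spark's real classes.
object CodegenSketch {
  sealed trait Operator {
    def children: Seq[Operator]
    def withChildren(cs: Seq[Operator]): Operator
  }

  // Operators opt in to code generation by mixing this in,
  // yet can still veto it per operator through the flag.
  trait CodegenSupport extends Operator {
    def supportCodegen: Boolean = true
  }

  case class Scan(name: String) extends Operator {
    def children: Seq[Operator] = Nil
    def withChildren(cs: Seq[Operator]): Operator = this
  }
  case class Filter(child: Operator) extends CodegenSupport {
    def children: Seq[Operator] = Seq(child)
    def withChildren(cs: Seq[Operator]): Operator = Filter(cs.head)
  }
  // Has the codegen machinery (it mixes in the trait) but switches the flag off.
  case class Generate(child: Operator) extends CodegenSupport {
    override def supportCodegen: Boolean = false
    def children: Seq[Operator] = Seq(child)
    def withChildren(cs: Seq[Operator]): Operator = Generate(cs.head)
  }
  // Marks the root of a fused (generated-code) stage.
  case class WholeStage(child: Operator) extends Operator {
    def children: Seq[Operator] = Seq(child)
    def withChildren(cs: Seq[Operator]): Operator = WholeStage(cs.head)
  }
  // Marks the boundary where a fused stage reads rows from a non-fused child.
  case class InputAdapter(child: Operator) extends Operator {
    def children: Seq[Operator] = Seq(child)
    def withChildren(cs: Seq[Operator]): Operator = InputAdapter(cs.head)
  }

  private def supports(op: Operator): Boolean = op match {
    case c: CodegenSupport => c.supportCodegen
    case _                 => false
  }

  // Inside a stage: keep fusing while operators agree, cut at the first one that does not.
  private def insertInputAdapter(op: Operator): Operator =
    if (supports(op)) op.withChildren(op.children.map(insertInputAdapter))
    else InputAdapter(insertStages(op))

  // Outside a stage: start a new fused stage at the first operator that agrees.
  def insertStages(op: Operator): Operator =
    if (supports(op)) WholeStage(insertInputAdapter(op))
    else op.withChildren(op.children.map(insertStages))

  val plan: Operator = Filter(Generate(Filter(Scan("t"))))

  def main(args: Array[String]): Unit =
    println(insertStages(plan))
}
```

In this toy model the Generate node ends up between two separate WholeStage wrappers instead of inside one: mixing in the trait keeps the codegen code path available, while the flag keeps the operator out of the fused stage. Whether that matches the real motivation in GenerateExec is exactly what the thread is asking.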
