Thanks for your reply!

Here's my *understanding*:
basic types that ScalaReflection understands are encoded into the Tungsten
binary format, while UDTs are encoded into a GenericInternalRow, which
stores the JVM objects in an Array[Any] under the hood, and thus loses the
memory-footprint and CPU-cache efficiency provided by the Tungsten
encoding.
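To make sure I'm describing the same thing, here's a tiny, Spark-free sketch
of the layout difference I mean. The names RowLayouts, genericStyle, and
unsafeStyle are my own and purely illustrative, not real Spark classes:

```scala
// Illustrative only: contrasting an object-based row with a packed one.
object RowLayouts {
  // GenericInternalRow-style container: each value is a boxed JVM object
  // on the heap, referenced from an Array[Any].
  val genericStyle: Array[Any] = Array(1, 2L, 3.0)

  // Tungsten/UnsafeRow-style layout: fixed-width fields packed into one
  // contiguous buffer, 8 bytes per field, no per-value object headers.
  val unsafeStyle: java.nio.ByteBuffer = {
    val buf = java.nio.ByteBuffer.allocate(3 * 8)
    buf.putLong(0, 1L)      // field 0
    buf.putLong(8, 2L)      // field 1
    buf.putDouble(16, 3.0)  // field 2
    buf
  }

  def main(args: Array[String]): Unit = {
    // The Int in the Array[Any] has been boxed to java.lang.Integer.
    println(genericStyle(0).getClass.getName)
    // The packed field is read straight out of the flat buffer.
    println(unsafeStyle.getDouble(16))
  }
}
```

The boxed layout costs an object header and a pointer chase per field, which
is the cache-efficiency gap I was referring to.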

If the above is correct, then here are my *further questions*:
Are all SparkPlan nodes (those ending with Exec) code-generated before the
toRdd logic actually runs? I know there are some nodes that cannot be
code-generated and instead implement the trait CodegenFallback, but that
trait also has a doGenCode method, so the actual calling convention really
puzzles me. I've tried to trace the call flow for a few days, but the calls
are scattered everywhere; I cannot build a complete graph of the method
call order even with IntelliJ's help.

Let me rephrase this. How does the Spark SQL engine call the codegen APIs
to do the job of producing RDDs? And what are the eval methods in
Expressions for, given that each already has a doGenCode right next to it?
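To show what I mean by the two paths, here is a toy, Spark-free analogy I
put together. Expr, Col, and Add are made-up names, not the real Catalyst
classes: each expression carries both an interpreted eval and a genCode that
emits Java-like source text, which is roughly the duality I'm asking about:

```scala
// Toy analogy of an expression tree with both execution strategies.
sealed trait Expr {
  def eval(row: Map[String, Int]): Int // interpreted path, evaluated per row
  def genCode: String                  // codegen path: emit source text once
}

case class Col(name: String) extends Expr {
  def eval(row: Map[String, Int]): Int = row(name)
  def genCode: String = s"""row.getInt("$name")"""
}

case class Add(left: Expr, right: Expr) extends Expr {
  def eval(row: Map[String, Int]): Int = left.eval(row) + right.eval(row)
  def genCode: String = s"(${left.genCode} + ${right.genCode})"
}

object Demo {
  def main(args: Array[String]): Unit = {
    val e = Add(Col("a"), Col("b"))
    println(e.eval(Map("a" -> 1, "b" -> 2))) // interpreted: walks the tree
    println(e.genCode)                       // codegen: flat source, no tree walk
  }
}
```

If I understand correctly, the compiled form avoids the virtual calls and
tree traversal of the interpreted form, and the interpreted form exists as
the fallback when code generation is unavailable; what I'd like confirmed
is how the engine picks between them.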



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-mainly-different-from-a-UDT-and-a-spark-internal-type-that-ExpressionEncoder-recognized-tp20370p20376.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
