Thanks for your reply! Here's my *understanding*: basic types that ScalaReflection understands are encoded into the Tungsten binary format (UnsafeRow), while UDT values are encoded into a GenericInternalRow, which stores the JVM objects in an Array[Any] under the hood, and therefore loses the memory-footprint and CPU-cache efficiency that the Tungsten encoding provides.
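To make the distinction I mean concrete, here is a plain-Scala sketch with no Spark dependency; BoxedRow and PackedRow are made-up names for illustration, not Spark classes, but they mimic the GenericInternalRow-style boxed storage versus a Tungsten-style flat binary layout:

```scala
import java.nio.ByteBuffer

// GenericInternalRow-style storage: every field is a boxed JVM object
// in an Array[Any], so each access pays pointer-chasing and unboxing.
final class BoxedRow(val values: Array[Any]) {
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]
}

// Tungsten/UnsafeRow-style storage: fixed-width fields packed into one
// contiguous buffer, so ints live unboxed and cache-friendly.
final class PackedRow(numFields: Int) {
  private val buf = ByteBuffer.allocate(numFields * 8)
  def setInt(i: Int, v: Int): Unit = buf.putInt(i * 8, v)
  def getInt(i: Int): Int = buf.getInt(i * 8)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val boxed = new BoxedRow(Array[Any](1, 2, 3))
    val packed = new PackedRow(3)
    (0 until 3).foreach(i => packed.setInt(i, i + 1))
    assert(boxed.getInt(1) == packed.getInt(1))
    println(boxed.getInt(1)) // prints 2
  }
}
```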
If the above is correct, then here are my *further questions*: are SparkPlan nodes (the ones ending in Exec) all code-generated before the toRdd logic actually runs? I know there are some non-codegen-able nodes that implement the CodegenFallback trait, but that trait also has a doGenCode method, so the actual calling convention really puzzles me. I've tried to trace the call flow for a few days, but the calls are scattered everywhere, and I cannot build a big graph of the method calling order even with IntelliJ's help. Let me rephrase: how does the Spark SQL engine call the codegen APIs to do the job of producing RDDs? And what are the eval methods in Expressions for, given that there's already a doGenCode right next to each one?

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-mainly-different-from-a-UDT-and-a-spark-internal-type-that-ExpressionEncoder-recognized-tp20370p20376.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.