I've recently been reading the source code of the Spark SQL project, and I found some interesting Databricks blog posts about the Tungsten project. I've roughly read through the encoder and unsafe-representation parts of Tungsten (though not yet the algorithmic parts, such as the cache-friendly hashmap).

Now there's a big puzzle in front of me about Spark SQL's codegen and how it uses the Tungsten encoding between JVM objects and unsafe bytes. Can anyone tell me what the main difference is between writing a UDT like ExamplePointUDT in Spark SQL and simply using an ArrayType, which the Tungsten encoder can handle directly? I'd really appreciate it if you could walk through some concrete code examples. Thanks a lot!
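For context, here is roughly how I understand the two approaches to compare. This is only a sketch based on my reading of Spark's own ExamplePointUDT test class, so the exact method signatures may differ between Spark versions:

```scala
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._

// The user-facing JVM class we want to store in a DataFrame.
class ExamplePoint(val x: Double, val y: Double) extends Serializable

// A UDT tells Catalyst how to map ExamplePoint to and from an internal,
// Tungsten-encodable representation -- here, an array of doubles.
class ExamplePointUDT extends UserDefinedType[ExamplePoint] {

  // The internal Catalyst type backing this UDT.
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  // JVM object -> internal Catalyst value (what Tungsten actually stores).
  override def serialize(p: ExamplePoint): GenericArrayData =
    new GenericArrayData(Array(p.x, p.y))

  // Internal Catalyst value -> JVM object.
  override def deserialize(datum: Any): ExamplePoint = datum match {
    case values: ArrayData =>
      new ExamplePoint(values.getDouble(0), values.getDouble(1))
  }

  override def userClass: Class[ExamplePoint] = classOf[ExamplePoint]
}

// By contrast, a plain ArrayType column needs no UDT at all:
// ExpressionEncoder already knows how to encode Seq[Double] into
// Tungsten's unsafe array format.
case class PointAsArray(coords: Seq[Double]) // schema: ArrayType(DoubleType)
```

If I understand correctly, in both cases the internal representation ends up being the same array of doubles; the UDT just preserves the user class at the API boundary, while the generated code only ever sees the sqlType. Is that the right mental model?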
-- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-mainly-different-from-a-UDT-and-a-spark-internal-type-that-ExpressionEncoder-recognized-tp20370.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.