I guess it has to do with Tungsten's explicit memory management, which builds on sun.misc.Unsafe. The "ConvertToUnsafe" class converts Java-object-based rows into UnsafeRows, which use Spark's internal memory-efficient format. So it is an operator in the physical plan, not a temporary table.
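To make that row format a bit more concrete, here is a rough sketch of the layout that the UnsafeRow Javadoc in branch-1.6 describes: a null bit set, one 8-byte word per field, then the variable-length bytes. This is not Spark's code. The class name RowLayoutSketch and its methods are hypothetical, it uses a plain ByteBuffer instead of sun.misc.Unsafe, and the packing of offset (upper 32 bits) and length (lower 32 bits) into one long is my assumption for illustration, since the Javadoc only says the two are combined into a long.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the documented layout; NOT Spark's UnsafeRow.
final class RowLayoutSketch {
    static final int NUM_FIELDS = 3; // field 0: long, field 1: nullable long, field 2: string

    // Null bit set: one bit per field, rounded up to whole 8-byte words.
    static int bitSetWidthInBytes(int numFields) {
        return ((numFields + 63) / 64) * 8;
    }

    // Lays out [null bit set] [values] [variable length portion] in one buffer.
    static ByteBuffer encode(long f0, Long f1, String f2) {
        int bitSetBytes = bitSetWidthInBytes(NUM_FIELDS);
        int fixedBytes = bitSetBytes + 8 * NUM_FIELDS;
        byte[] str = f2.getBytes(StandardCharsets.UTF_8);
        ByteBuffer row = ByteBuffer.allocate(fixedBytes + str.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);

        // Fixed-length value: stored directly in the field's 8-byte word.
        row.putLong(bitSetBytes, f0);

        if (f1 == null) {
            // Null tracking: set bit 1 in the first word of the bit set.
            row.putLong(0, row.getLong(0) | (1L << 1));
        } else {
            row.putLong(bitSetBytes + 8, f1);
        }

        // Variable-length value: bytes go after the fixed region, and the
        // field's word holds offset and length (packing is an assumption).
        int offset = fixedBytes;
        for (int i = 0; i < str.length; i++) {
            row.put(offset + i, str[i]);
        }
        row.putLong(bitSetBytes + 16, ((long) offset << 32) | str.length);
        return row;
    }

    static boolean isNullAt(ByteBuffer row, int ordinal) {
        long word = row.getLong((ordinal >> 6) * 8);
        return (word & (1L << (ordinal & 63))) != 0;
    }

    static long getLong(ByteBuffer row, int ordinal) {
        return row.getLong(bitSetWidthInBytes(NUM_FIELDS) + 8 * ordinal);
    }

    static String getString(ByteBuffer row, int ordinal) {
        long offsetAndLength = getLong(row, ordinal);
        int offset = (int) (offsetAndLength >>> 32); // relative to the row start
        int length = (int) offsetAndLength;          // lower 32 bits
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = row.get(offset + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer row = encode(13L, null, "sales");
        System.out.println(getLong(row, 0));
        System.out.println(isNullAt(row, 1));
        System.out.println(getString(row, 2));
    }
}
```

The point of the sketch is only that a whole row sits in one contiguous block of bytes rather than in separate Java objects, which is what a conversion step like ConvertToUnsafe would be producing for the operators above it.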
Here is the related code in 1.6:

ConvertToUnsafe is defined in:
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/rowFormatConverters.scala

/**
 * Converts Java-object-based rows into [[UnsafeRow]]s.
 */
case class ConvertToUnsafe(child: SparkPlan) extends UnaryNode

And, UnsafeRow is defined in:
https://github.com/apache/spark/blob/branch-1.6/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java

/**
 * An Unsafe implementation of Row which is backed by raw memory instead of Java objects.
 *
 * Each tuple has three parts: [null bit set] [values] [variable length portion]
 *
 * The bit set is used for null tracking and is aligned to 8-byte word boundaries. It stores
 * one bit per field.
 *
 * In the `values` region, we store one 8-byte word per field. For fields that hold fixed-length
 * primitive types, such as long, double, or int, we store the value directly in the word. For
 * fields with non-primitive or variable-length values, we store a relative offset (w.r.t. the
 * base address of the row) that points to the beginning of the variable-length field, and length
 * (they are combined into a long).
 *
 * Instances of `UnsafeRow` act as pointers to row data stored in this format.
 */
public final class UnsafeRow extends MutableRow implements Externalizable, KryoSerializable

Xinh

On Sun, Jun 26, 2016 at 1:11 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> In Spark's Physical Plan what is the explanation for ConvertToUnsafe?
>
> Example:
>
> scala> sorted.filter($"prod_id" ===13).explain
> == Physical Plan ==
> Filter (prod_id#10L = 13)
> +- Sort [prod_id#10L ASC,cust_id#11L ASC,time_id#12 ASC,channel_id#13L ASC,promo_id#14L ASC], true, 0
>    +- ConvertToUnsafe
>       +- Exchange rangepartitioning(prod_id#10L ASC,cust_id#11L ASC,time_id#12 ASC,channel_id#13L ASC,promo_id#14L ASC,200), None
>          +- HiveTableScan [prod_id#10L,cust_id#11L,time_id#12,channel_id#13L,promo_id#14L], MetastoreRelation oraclehadoop, sales2, None
>
> My inclination is that it is a temporary construct like tempTable created
> as part of Physical Plan?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com