Re: What is the explanation of "ConvertToUnsafe" in "Physical Plan"

2016-06-27 Thread Xinh Huynh
I guess it has to do with the Tungsten explicit memory management that
builds on sun.misc.Unsafe. The "ConvertToUnsafe" class converts
Java-object-based rows into UnsafeRows, which has the Spark internal
memory-efficient format.

Here is the related code in 1.6:

ConvertToUnsafe is defined in:
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/rowFormatConverters.scala

/**
* Converts Java-object-based rows into [[UnsafeRow]]s.
*/
case class ConvertToUnsafe(child: SparkPlan) extends UnaryNode

And, UnsafeRow is defined in:
https://github.com/apache/spark/blob/branch-1.6/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java

/**
* An Unsafe implementation of Row which is backed by raw memory instead of
Java objects.
*
* Each tuple has three parts: [null bit set] [values] [variable length
portion]
*
* The bit set is used for null tracking and is aligned to 8-byte word
boundaries. It stores
* one bit per field.
*
* In the `values` region, we store one 8-byte word per field. For fields
that hold fixed-length
* primitive types, such as long, double, or int, we store the value
directly in the word. For
* fields with non-primitive or variable-length values, we store a relative
offset (w.r.t. the
* base address of the row) that points to the beginning of the
variable-length field, and length
* (they are combined into a long).
*
* Instances of `UnsafeRow` act as pointers to row data stored in this
format.
*/
public final class UnsafeRow extends MutableRow implements Externalizable,
KryoSerializable

Xinh

On Sun, Jun 26, 2016 at 1:11 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

>
> Hi,
>
> In Spark's Physical Plan what is the explanation for ConvertToUnsafe?
>
> Example:
>
> scala> sorted.filter($"prod_id" ===13).explain
> == Physical Plan ==
> Filter (prod_id#10L = 13)
> +- Sort [prod_id#10L ASC,cust_id#11L ASC,time_id#12 ASC,channel_id#13L
> ASC,promo_id#14L ASC], true, 0
>+- ConvertToUnsafe
>   +- Exchange rangepartitioning(prod_id#10L ASC,cust_id#11L
> ASC,time_id#12 ASC,channel_id#13L ASC,promo_id#14L ASC,200), None
>  +- HiveTableScan
> [prod_id#10L,cust_id#11L,time_id#12,channel_id#13L,promo_id#14L],
> MetastoreRelation oraclehadoop, sales2, None
>
>
> My inclination is that it is a temporary construct like tempTable created
> as part of Physical Plan?
>
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


What is the explanation of "ConvertToUnsafe" in "Physical Plan"

2016-06-26 Thread Mich Talebzadeh
Hi,

In Spark's Physical Plan what is the explanation for ConvertToUnsafe?

Example:

scala> sorted.filter($"prod_id" ===13).explain
== Physical Plan ==
Filter (prod_id#10L = 13)
+- Sort [prod_id#10L ASC,cust_id#11L ASC,time_id#12 ASC,channel_id#13L
ASC,promo_id#14L ASC], true, 0
   +- ConvertToUnsafe
  +- Exchange rangepartitioning(prod_id#10L ASC,cust_id#11L
ASC,time_id#12 ASC,channel_id#13L ASC,promo_id#14L ASC,200), None
 +- HiveTableScan
[prod_id#10L,cust_id#11L,time_id#12,channel_id#13L,promo_id#14L],
MetastoreRelation oraclehadoop, sales2, None


My inclination is that it is a temporary construct like tempTable created
as part of Physical Plan?


Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.