[ 
https://issues.apache.org/jira/browse/SPARK-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704815#comment-14704815
 ] 

Herman van Hovell commented on SPARK-9357:
------------------------------------------

[~chenghao] This summarizes the pros and cons of {{JoinedRow}} nicely.

If I understand your idea correctly: in order to make an n-ary {{JoinedRow}} 
work, we would need to share the same instance across multiple operators, or at 
least have each operator expose which type of Row it returns. The latter would 
also be very interesting from a code generation (CG) / JIT point of view: if we 
can pin the row class down to its specific implementation during CG, the JIT 
would have a monomorphic call site (one with a single receiver class) to work 
with, which it can inline.

A naive design would probably introduce a lot of branching. We could get around 
this by using a {{JoinedRow}}-aware {{BoundReference}} (a bit more general than 
what is in the current PR).
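
One way to read the {{JoinedRow}}-aware {{BoundReference}} idea, sketched in 
Java under stated assumptions: the reference resolves its flat ordinal to a 
(child row, offset) pair once, when it is bound against the known child widths, 
so evaluating it per row is a direct lookup with no search. 
{{JoinedBoundReference}}, {{Row}} and {{ArrayRow}} are hypothetical names, not 
the classes in the PR.

```java
// Hypothetical minimal row interface; a stand-in for Spark's InternalRow, not its API.
interface Row {
    int numFields();
    Object get(int ordinal);
}

// Hypothetical flat row backed by an array.
final class ArrayRow implements Row {
    private final Object[] values;

    ArrayRow(Object... values) { this.values = values; }

    public int numFields() { return values.length; }

    public Object get(int ordinal) { return values[ordinal]; }
}

// Hypothetical JoinedRow-aware bound reference. The joined input is passed as an
// array of child rows; the flat ordinal is resolved to (child, offset) once, at bind
// time, from child widths that are known before execution starts.
final class JoinedBoundReference {
    final int child;
    final int offset;

    private JoinedBoundReference(int child, int offset) {
        this.child = child;
        this.offset = offset;
    }

    // childWidths[i] is the number of fields in child row i.
    static JoinedBoundReference bind(int ordinal, int[] childWidths) {
        if (ordinal < 0) throw new IndexOutOfBoundsException("ordinal " + ordinal);
        int remaining = ordinal;
        for (int i = 0; i < childWidths.length; i++) {
            if (remaining < childWidths[i]) return new JoinedBoundReference(i, remaining);
            remaining -= childWidths[i];
        }
        throw new IndexOutOfBoundsException("ordinal " + ordinal);
    }

    // Per-row evaluation: a direct lookup, no search over the children.
    Object eval(Row[] inputs) {
        return inputs[child].get(offset);
    }
}

class JoinedBoundReferenceDemo {
    public static void main(String[] args) {
        // Flat ordinal 3 over widths {2, 1, 2} binds to child 2, offset 0.
        JoinedBoundReference ref = JoinedBoundReference.bind(3, new int[] {2, 1, 2});
        Row[] inputs = {new ArrayRow(1, 2), new ArrayRow("a"), new ArrayRow(3.0, 4.0)};
        System.out.println(ref.eval(inputs));
    }
}
```

In generated code, {{child}} and {{offset}} could be emitted as literals, so the 
per-row path would contain no branch for picking the child row at all.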



> Remove JoinedRow
> ----------------
>
>                 Key: SPARK-9357
>                 URL: https://issues.apache.org/jira/browse/SPARK-9357
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>            Reporter: Reynold Xin
>
> JoinedRow was introduced to join two rows together in aggregation (join key 
> and value), joins (left, right), window functions, etc.
> It aims to reduce the amount of data copied, but incurs branches when the row 
> is actually read. Given that all the fields will be read almost all the time 
> (otherwise they get pruned out by the optimizer), the branch predictor cannot 
> do anything about those branches.
> I think a better way is just to remove this thing and materialize the row 
> data directly.
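
To make the trade-off in the description concrete, a minimal Java sketch with 
hypothetical types ({{Row}}, {{ArrayRow}}, {{JoinedRow2}}, {{Materialize}}; not 
Spark's classes): the view avoids copying but evaluates a branch on every 
{{get}}, while materializing copies each field once and then reads from a flat 
array.

```java
// Hypothetical minimal row interface; a stand-in for Spark's InternalRow, not its API.
interface Row {
    int numFields();
    Object get(int ordinal);
}

// Hypothetical flat row backed by an array.
final class ArrayRow implements Row {
    private final Object[] values;

    ArrayRow(Object... values) { this.values = values; }

    public int numFields() { return values.length; }

    public Object get(int ordinal) { return values[ordinal]; }
}

// Hypothetical two-row view in the spirit of JoinedRow: no data is copied,
// but every read evaluates a branch to decide which side owns the field.
final class JoinedRow2 implements Row {
    private final Row left;
    private final Row right;

    JoinedRow2(Row left, Row right) {
        this.left = left;
        this.right = right;
    }

    public int numFields() { return left.numFields() + right.numFields(); }

    public Object get(int ordinal) {
        int n = left.numFields();
        return ordinal < n ? left.get(ordinal) : right.get(ordinal - n);
    }
}

// The alternative from the description: copy both sides into one flat row once;
// afterwards every read is a plain array access.
final class Materialize {
    static ArrayRow concat(Row left, Row right) {
        int l = left.numFields();
        int r = right.numFields();
        Object[] out = new Object[l + r];
        for (int i = 0; i < l; i++) out[i] = left.get(i);
        for (int i = 0; i < r; i++) out[l + i] = right.get(i);
        return new ArrayRow(out);
    }
}

class JoinedRowVsMaterializeDemo {
    public static void main(String[] args) {
        Row key = new ArrayRow("k1");
        Row value = new ArrayRow(10, 20);
        Row view = new JoinedRow2(key, value);     // no copy, a side-selection branch per read
        Row flat = Materialize.concat(key, value); // one copy, then plain array reads
        for (int i = 0; i < view.numFields(); i++) {
            System.out.println(view.get(i) + " == " + flat.get(i));
        }
    }
}
```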



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
