[ 
https://issues.apache.org/jira/browse/PIG-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-3849:
----------------------------
    Description: 
e.g Group by followed by join on the same key  
This can be done in one vertex with multiple inputs instead of having an extra 
vertex to do the join. i.e Currently Vertex 1 (load relation1) -> Vertex 2 
(group by) -> Vertex 4 (join) <- Vertex 3 (load relation 2). This could be 
changed to Vertex 1 (load relation1) -> Vertex 2 (group by and join) <- Vertex 
3 (load relation 2)

And idea of this kind of optimization from YSmart that hive already integrate 
it. Now pig has already integrate tez, so it would be natural to integrate 
YSmart into pig on tez.

YSmart paper 
http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
YSmart slide 
http://www.slideshare.net/YinHuai/hive-correlation-optimizer?qid=1bb427af-e349-40c0-b9ad-46f508403879&v=default&b=&from_search=4


  was:
e.g Group by followed by join on the same key  
This can be done in one vertex with multiple inputs instead of having an extra 
vertex to do the join. i.e Currently Vertex 1 (load relation1) -> Vertex 2 
(group by) -> Vertex 4 (join) <- Vertex 3 (load relation 2). This could be 
changed to Vertex 1 (load relation1) -> Vertex 2 (group by and join) <- Vertex 
3 (load relation 2)

And idea of this kind of optimization from YSmart that hive already integrate 
it. Now pig has already integrate tez, so it would be natural to integrate 
YSmart into pig on tez.



> Integrate YSmart into Pig on tez
> --------------------------------
>
>                 Key: PIG-3849
>                 URL: https://issues.apache.org/jira/browse/PIG-3849
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>
> e.g Group by followed by join on the same key  
> This can be done in one vertex with multiple inputs instead of having an 
> extra vertex to do the join. i.e Currently Vertex 1 (load relation1) -> 
> Vertex 2 (group by) -> Vertex 4 (join) <- Vertex 3 (load relation 2). This 
> could be changed to Vertex 1 (load relation1) -> Vertex 2 (group by and join) 
> <- Vertex 3 (load relation 2)
> And idea of this kind of optimization from YSmart that hive already integrate 
> it. Now pig has already integrate tez, so it would be natural to integrate 
> YSmart into pig on tez.
> YSmart paper 
> http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> YSmart slide 
> http://www.slideshare.net/YinHuai/hive-correlation-optimizer?qid=1bb427af-e349-40c0-b9ad-46f508403879&v=default&b=&from_search=4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to