[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-11-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215900#comment-14215900
 ] 

Lefty Leverenz commented on HIVE-5775:
--

Thanks [~jpullokkaran], I removed the TODOC14 label on the assumption that no 
updates are needed at this time.

 Introduce Cost Based Optimizer to Hive
 --

 Key: HIVE-5775
 URL: https://issues.apache.org/jira/browse/HIVE-5775
 Project: Hive
  Issue Type: New Feature
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.14.0

 Attachments: CBO-2.pdf, HIVE-5775.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-11-17 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214977#comment-14214977
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

[leftylev] - Moved DS spec from In Progress to Completed.



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-11-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14213886#comment-14213886
 ] 

Lefty Leverenz commented on HIVE-5775:
--

Doc note:  The design doc should be moved from the In Progress section to the 
Completed section.  Does the design doc also need to be updated?

* [Design Docs -- In Progress | 
https://cwiki.apache.org/confluence/display/Hive/DesignDocs#DesignDocs-InProgress]
* [Cost-based optimization in Hive | 
https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive]



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040970#comment-14040970
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

The cost model as described in the doc assumes TEZ as the execution layer.




[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040986#comment-14040986
 ] 

Gopal V commented on HIVE-5775:
---

[~xuefuz]: The CBO model rewrites queries using cardinality statistics.

The tuple count and distinct value count should not affect which physical layer 
it runs on - having the CBO split up/reorder a 3-way map-join into 2 phases (or 
vertices) should generate identical plans in both.

MR would run 2 Map-only phases with their own local tasks and hashtable 
uploads, Tez would run 2 vertices with their own broadcast tasks.

Tez can reduce runtimes further by removing the intermediate IO cost and 
co-scheduling the second vertex in the same container as the first, but that 
is not assumed, as it is not a strong guarantee in a busy cluster.

The Tez runtime model is faster, but the logical cost does not change as the 
number of rows read off disk, written to disk and distinct keys remain the same.

In fact, as it exists today, because it applies equally to both Tez and MR, it 
ignores a lot of Tez's opportunistic/runtime optimizations like container reuse 
- e.g. each vertex in Tez is assumed to be a different process.

It is up to the Tez DAG planner to attend to such runtime optimization details.
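
As a rough illustration of the point above (this is not the actual Hive CBO 
code), the sketch below picks a join order using only row counts and distinct 
value counts. The names TableStats, join_cardinality, plan_cost and 
best_join_order are hypothetical, the cardinality estimate is the common 
textbook formula rather than anything taken from the design doc, and all 
tables are assumed to join on one shared key. No execution engine appears in 
the inputs, so the chosen order would be the same for MR and Tez.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics (not a Hive class)."""
    name: str
    rows: int  # tuple count
    ndv: int   # number of distinct values of the join key


def join_cardinality(left_rows, left_ndv, right):
    # Textbook estimate (an assumption here): |L| * |R| / max(ndv_L, ndv_R).
    return left_rows * right.rows / max(left_ndv, right.ndv)


def plan_cost(order):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    rows, ndv = order[0].rows, order[0].ndv
    total = 0.0
    for table in order[1:]:
        rows = join_cardinality(rows, ndv, table)
        ndv = min(ndv, table.ndv)  # the join keeps at most the smaller key set
        total += rows
    return total


def best_join_order(tables):
    # Exhaustive search over all orders; only sensible for a handful of tables.
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    tables = [
        TableStats("fact", rows=1_000_000, ndv=1000),
        TableStats("dim_a", rows=1000, ndv=1000),
        TableStats("dim_b", rows=10, ndv=10),
    ]
    best = best_join_order(tables)
    print([t.name for t in best], plan_cost(best))
```

With these made-up statistics the two small tables are joined first and the 
large table last, purely because of the estimated row counts.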



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041053#comment-14041053
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

The following may help in reducing the confusion:

1. In the design doc, the cost formula is for choosing the Join Algorithm. The 
cost formula as described in the doc assumes Tez execution.

2. However, the current work on CBO doesn’t include Join Algorithm selection. 
Instead, it rearranges Joins based on Join cardinality and NDV. In other words, 
Join reordering does not depend on the Physical Execution Layer (Tez or MR).

3. When we decide to do Join Algorithm selection, we can fit in cost formulas 
for both a) MR and b) Tez. This way, based on the physical execution layer, we 
can select the best Join Algorithm/Order. 

4. The cost formula for Join Algorithm selection is not that different between 
MR and Tez (except for intermediate HDFS writes). So I assume that CBO can 
support both execution layers rather easily.

5. The CBO framework allows you to plug and play any cost model. There is no 
hard coupling.
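
To make points 3 to 5 concrete, here is a minimal sketch of a pluggable cost 
model, assuming a per-engine cost function looked up by engine name. 
Everything in it (JoinStep, tez_cost, mr_cost, COST_MODELS, total_cost and the 
HDFS_WRITE_WEIGHT constant) is hypothetical and is not taken from the design 
doc or the patch. The only difference between the two formulas is an extra 
charge for intermediate HDFS writes under MR, as described in point 4.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class JoinStep:
    """Hypothetical summary of one join in a plan."""
    rows_in: float
    rows_out: float
    is_last: bool  # the final step writes the query result, not an intermediate


# Made-up weight for writing one intermediate row to HDFS between MR jobs.
HDFS_WRITE_WEIGHT = 3.0


def tez_cost(step: JoinStep) -> float:
    # Assumed base cost: rows read plus rows produced.
    return step.rows_in + step.rows_out


def mr_cost(step: JoinStep) -> float:
    # Same base cost, plus the intermediate HDFS write that MR needs.
    extra = 0.0 if step.is_last else HDFS_WRITE_WEIGHT * step.rows_out
    return tez_cost(step) + extra


# The "plug" point: adding an engine means registering another function.
COST_MODELS: Dict[str, Callable[[JoinStep], float]] = {
    "mr": mr_cost,
    "tez": tez_cost,
}


def total_cost(steps: Sequence[JoinStep], engine: str) -> float:
    model = COST_MODELS[engine]
    return sum(model(step) for step in steps)


if __name__ == "__main__":
    plan = [JoinStep(100, 50, is_last=False), JoinStep(50, 10, is_last=True)]
    for engine in ("tez", "mr"):
        print(engine, total_cost(plan, engine))
```

The optimizer code that calls total_cost never needs to know which formula is 
in use, which is the "no hard coupling" property from point 5.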




[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041056#comment-14041056
 ] 

Xuefu Zhang commented on HIVE-5775:
---

Thanks for the clarification, [~gopalv]. We are in total agreement if what is 
put in the logical layer is the optimization that's applicable to either 
execution engine and if execution engine specific optimization is put in the 
execution layer. Maybe the document can be updated to make this explicit to 
avoid confusion/misunderstanding from others.

{quote}
The cost model as described in the doc assumes TEZ as the execution layer.
{quote}

Not sure if I understand [~jpullokkaran] correctly. If the cost model is based 
on Tez, then we shall only use a model that's common for both Tez and MR when 
rewriting the query, right?



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041085#comment-14041085
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

The cost model described doesn't apply to the current CBO work or to the 
proposed branch.
It will apply only to Join Algorithm selection, which is not part of the 
current work.

IMO moving join reordering to the physical optimizer is not the correct 
solution. I would rather leave it in the logical layer, since after doing join 
reordering you may be able to do other optimizations, like new predicate 
push down, transitive inferences, etc.

When we get around to doing Join Algorithm selection there will be two cost 
formulas, one for MR and one for Tez.
I think the best solution is to support both cost models and decide which one 
to apply based on the physical execution layer.

I will update the doc. 



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041115#comment-14041115
 ] 

Xuefu Zhang commented on HIVE-5775:
---

Cool. Thanks for the clarifications.



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039987#comment-14039987
 ] 

Xuefu Zhang commented on HIVE-5775:
---

Thanks to all for working on this. I'm not sure if this has ever surfaced, but 
I'm wondering if this cost based optimization is specific to Tez. From the 
design doc it seems that this new optimizer was plugged in the logical layer, 
while certain cost estimations are based on Tez such as vertex. Obviously the 
cost for a given query would be different for MapReduce vs Tez, but cost based 
optimization is equally valuable to both MR and Tez. However, applying an 
optimization based on one execution engine may cause adverse results when the 
configured engine is of another type. Therefore, I'd like to know if any thought 
has been given to this and what the plan is to address it.



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-04-16 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971936#comment-13971936
 ] 

Vaibhav Gumashta commented on HIVE-5775:


Hi [~jpullokkaran]; wanted to go through the code - can you please upload to 
review board? Thanks!



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-04-16 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971970#comment-13971970
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

I don't think this should go into trunk yet.
I need to remove some of the limitations (outer join, union) before it can go 
into trunk.

A better algorithm for join permutations is also being worked on.



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-04-15 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970241#comment-13970241
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

First rev of CBO.

This is a limited version with the following restrictions:
1. Outer Joins are not supported
2. Union is not supported
3. Not all of the UDFs are supported
4. Not all permutations of joins are explored





[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-04-15 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970417#comment-13970417
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

Thanks Julian Hyde, Harish Bhutani for help with CBO V1.



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2013-11-07 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816237#comment-13816237
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

Attached is the first version of the CBO spec.



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2013-11-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816299#comment-13816299
 ] 

Brock Noland commented on HIVE-5775:


Hi,

Thanks for the design document!  The document should also be uploaded to this 
location: https://cwiki.apache.org/confluence/display/Hive/DesignDocs

Brock



[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2013-11-07 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816325#comment-13816325
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

Sure, will do.

Thanks
John





