[
https://issues.apache.org/jira/browse/HIVE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283856#comment-13283856
]
Edward Capriolo commented on HIVE-3027:
---------------------------------------
Patches welcome. I am sure that if you refactor the code and make it better,
no one will be averse.
> The optimizer architecture of Hive is terrible, need code refactoring
> ---------------------------------------------------------------------
>
> Key: HIVE-3027
> URL: https://issues.apache.org/jira/browse/HIVE-3027
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 0.8.1
> Reporter: anders
> Labels: architecture, optimizer, ysmart
>
> I want to add complete cost-based optimization to Hive, but when I began the
> work I found it very difficult to do within the current Hive optimization
> framework. In the current Hive code, all optimizations are done after the DAG
> of operators has been generated. It is an awful design and makes me mad. For
> example, the map-side optimization scans the whole operator DAG, tries to
> find the operators that can be replaced by a map operation, and then replaces
> them. How terrible and stupid the code is!!! The terrible code expands to 1000
> lines, and only implements the map-side optimizations!!!
> In my opinion, optimization shouldn't be done in a separate step; different
> optimizations should be done at the appropriate time. For example, join
> reordering should be done when we parse the input query, and we can generate
> Map-Reduce operators or only a Map operator for each join according to the
> cost estimation. In the process, we can merge joins and aggregations, and we
> should push down predicates at the proper time and generate a proper data
> structure, to ensure the cost-estimation module can fetch the corresponding
> predicates of each base table for estimating JOIN cost. How concise and
> graceful the code will be if we do the optimization this way!!! But now, in
> order to comply with the optimizer framework of Hive, I have to write lots
> of ugly code with amazing redundancy, and the code is very, very difficult to
> debug!!!! There is now a patch for a cost-based JOIN reorder and merge
> optimizer called YSMART. I glanced at it. It uses 6000+ lines of code and is
> difficult to read!! And its optimization is incomplete.
> The optimizer architecture of Hive is terrible. What can I do now?
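
The idea in the quoted description (use predicates pushed down to each base
table to estimate join cost, reorder joins, and pick a Map-only or Map-Reduce
operator per join) can be illustrated with a small sketch. This is not Hive
code and not the YSMART patch. The names Table, plan_joins, and
MAP_JOIN_MAX_ROWS, the greedy ordering, and the size estimate are all
assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: a side with at most this many estimated rows is
# assumed small enough to be held in memory for a map-only join.
MAP_JOIN_MAX_ROWS = 100_000


@dataclass(frozen=True)
class Table:
    name: str
    rows: int           # raw row count of the base table
    selectivity: float  # fraction of rows kept by predicates pushed down to it

    @property
    def estimated_rows(self) -> int:
        # Cost input for the planner: rows remaining after pushed-down filters.
        return int(self.rows * self.selectivity)


def plan_joins(tables):
    """Greedy left-deep plan: join the smallest filtered inputs first."""
    if not tables:
        return []
    ordered = sorted(tables, key=lambda t: t.estimated_rows)
    plan = []
    left_name, left_rows = ordered[0].name, ordered[0].estimated_rows
    for right in ordered[1:]:
        smaller = min(left_rows, right.estimated_rows)
        strategy = "map-only" if smaller <= MAP_JOIN_MAX_ROWS else "map-reduce"
        plan.append((left_name, right.name, strategy))
        # Crude assumption: the join output is as large as its larger input.
        left_name = f"({left_name} JOIN {right.name})"
        left_rows = max(left_rows, right.estimated_rows)
    return plan


if __name__ == "__main__":
    tables = [
        Table("orders", 50_000_000, 0.5),
        Table("customers", 2_000_000, 1.0),
        Table("regions", 50, 1.0),
    ]
    for step in plan_joins(tables):
        print(step)
```

A real cost model would also need join selectivity and data-size statistics;
the point here is only that the join strategy is chosen while the plan is
being built, rather than by rewriting an operator DAG afterwards.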