Hive Roadmap (Some information)

Ashish Thusoo Mon, 27 Oct 2008 11:31:58 -0700

Folks,

Here are some of the things that we are working on internally at Facebook. We 
thought it would be a good idea to let everyone know what is going on with Hive 
development. We will put this up on the wiki as well.


1. Integrating Dynamic SerDe with the DDL. (Zheng/Pete) - This allows users 
to create typed tables, including list and map types, from the DDL
2. Support for Statistics. (Ashish) - These stats are needed to make 
optimization decisions
3. Join Optimizations. (Prasad) - Map-side joins, semi-join techniques, etc. to 
make joins faster
4. Predicate Pushdown Optimizations. (Namit) - pushing predicates just above 
the table scan for certain situations in joins as well as ensuring that only 
required columns are sent across map/reduce boundaries
5. Group By Optimizations. (Joydeep) - various optimizations to make group by 
faster
6. Optimizations to reduce the number of map files created by filter 
operations. (Dhrubha) - Filters with a large number of mappers produce a lot 
of files, which slows down the operations that follow. This tries to address 
that problem.
7. Transformations in LOAD. (Joydeep) - LOAD currently does not transform the 
input data if it is not in the format expected by the destination table.
8. Schemaless map/reduce. (Zheng) - TRANSFORM needs a schema, while map/reduce 
is schemaless.
9. Improvements to TRANSFORM. (Zheng) - Make this more intuitive for map/reduce 
developers - evaluate some other keywords etc.
10. Error Reporting Improvements. (Pete) - Make error reporting for parse 
errors better
11. Help on CLI. (Joydeep) - add help to the CLI
12. Explode and Collect Operators. (Zheng) - Explode and collect operators to 
convert collections to individual items and vice versa.
13. Propagating sort properties to destination tables. (Prasad) - If the query 
produces sorted output, we want to capture that in the destination table's 
metadata so that downstream optimizations can be enabled.
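
To make item 12 a little more concrete, here is a rough sketch of what explode 
and collect do to rows. It is plain Python rather than Hive, and the function 
names, the row layout (dicts with "user" and "tags" fields) and the sample 
data are all assumptions made up for illustration. It is not the planned Hive 
syntax or implementation.

```python
from collections import defaultdict


def explode(rows, column):
    """Yield one output row per element of the list stored in `column`.

    Hypothetical helper: rows are plain dicts, not Hive rows.
    """
    for row in rows:
        for item in row[column]:
            out = dict(row)      # copy the other columns unchanged
            out[column] = item   # replace the collection with a single item
            yield out


def collect(rows, key, column):
    """Group rows by `key` and gather the values of `column` into a list."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return [{key: k, column: values} for k, values in groups.items()]


# Made-up sample data: one row per user, with a list-typed column.
rows = [
    {"user": "a", "tags": ["x", "y"]},
    {"user": "b", "tags": ["z"]},
]

exploded = list(explode(rows, "tags"))          # one row per (user, tag) pair
collected = collect(exploded, "user", "tags")   # back to one row per user
```

In this sketch a row whose list is empty produces no output from explode, so 
collect cannot bring it back. Whether the real operators should behave that 
way is an open question, not something decided here.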

Other contributions from outside FB ...
1. JDBC driver (Michi Mutsuzaki @ stanford.edu, Raghu @ stanford.edu)
2. Fixes to CLI driver (Jeremy Huylebroeck)
3. Web interface...

Most of these have a JIRA associated. A lot of focus is on running things 
faster in Hive considering that we have a good feature set now...

Comments/contributions are welcome. Please go to the JIRA and check out 
contrib/hive...

Thanks,
Ashish
