Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by AshishThusoo:
http://wiki.apache.org/hadoop/Hive/Roadmap

The comment on the change is: Incorporating the new features that are being built

------------------------------------------------------------------------------

  Before adding to the list below, please check [https://issues.apache.org/jira/browse/HADOOP/component/12312455 JIRA] to see if a ticket has already been opened for the feature. If not, please open a ticket on the [http://issues.apache.org/jira/browse/HADOOP Hadoop JIRA], select "contrib/hive" as the component, and also update the following list.

- = Roadmap/call to add more features =
- The following is the list of useful features that are on the Hive Roadmap:
-  * HAVING clause support
+ = Features to be added =
+ == Features actively being worked on ==
+  * ODBC driver
+
+ == Short term Features ==
   * Support for various statistical functions like Median, Standard Deviation, Variance, etc.
+  * Views and data variables in Hive so that data flows can be composed
+  * Integration with dumbo or map_reduce.py so that Python code can be easily embedded in Hive
+
+ == More long term Features (yet to be prioritized) ==
   * Support for Create Table as Select
-  * Support for views
-  * Support for Insert Appends
   * Support for Inserts without listing the partitioning columns explicitly - the query should be able to derive them
   * Support for Indexes
-  * Support for IN
+  * UNIQUE JOINS - these support different semantics than the outer joins
+  * Support for Insert Appends
+  * Using sort and bucketing properties to optimize queries
+  * Support for IN, EXISTS, and correlated subqueries
+  * More native types - Enums, timestamp
+  * Passing the schema to scripts through an environment variable
+  * HAVING clause support
+  * Counters for streaming
+  * Error Reporting Improvements - make error reporting for parse errors better
+
+ == Others ==
   * Support for Column Alias
   * Support for Statistics - these stats are needed to make optimization decisions
-  * Join Optimizations - map-side joins, semi-join techniques, etc. to do the join faster
+  * Join Optimizations - semi-join, FRJ techniques, etc. to do the join faster
-  * Optimizations to reduce the number of map files created by filter operations
   * Transformations in LOAD - LOAD currently does not transform the input data if it is not in the format expected by the destination table
-  * Schemaless map/reduce - TRANSFORM needs a schema while map/reduce is schemaless
-  * Improvements to TRANSFORM - make this more intuitive to map/reduce developers - evaluate some other keywords, etc.
-  * Error Reporting Improvements - make error reporting for parse errors better
   * Help on CLI - add help to the CLI
   * Explode and Collect Operators - operators to convert collections to individual items and vice versa
-  * Propagating sort properties to destination tables - if the query produces sorted output, we want to capture that in the destination table's metadata so that downstream optimizations can be enabled
-  * Propagating bucketing properties to destination tables
   * Multiple group-by inserts
    * Generate multiple group-by results by scanning the source table only once
    * Example:
     * FROM src
     * SELECT src.adid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.adid
     * SELECT src.pageid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.pageid
-  * SerDe refactoring, and DynamicSerDe
-   * Refactor the SerDe library to make sure we can serialize/deserialize and let UDFs handle complex objects
-   * We will be able to write a Hive query to write data into a table that uses Thrift serialization
   * Let the user register UDFs and UDAFs
    * Expose register functions in UDFRegistry and UDAFRegistry
    * Provide commands in HiveCli to call those register functions
-  * ODBC/JDBC driver
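The "multiple group-by inserts" item in the roadmap above asks for several GROUP BY results from a single scan of the source table. As a rough illustration of the idea (not Hive's implementation), here is a one-pass sketch in Python over hypothetical `(adid, pageid, userid)` rows matching the roadmap's example:

```python
from collections import defaultdict

def multi_group_by(rows):
    """Feed both group-bys from one pass over the rows, instead of
    scanning the source once per query; sets give COUNT(DISTINCT userid)."""
    by_ad = defaultdict(lambda: [0, set()])    # adid -> [row count, distinct users]
    by_page = defaultdict(lambda: [0, set()])  # pageid -> [row count, distinct users]
    for adid, pageid, userid in rows:
        by_ad[adid][0] += 1
        by_ad[adid][1].add(userid)
        by_page[pageid][0] += 1
        by_page[pageid][1].add(userid)
    # Collapse the distinct-user sets into counts for the final results.
    ads = {k: (c, len(u)) for k, (c, u) in by_ad.items()}
    pages = {k: (c, len(u)) for k, (c, u) in by_page.items()}
    return ads, pages

rows = [
    ("a1", "p1", "u1"),
    ("a1", "p2", "u1"),
    ("a2", "p1", "u2"),
]
ads, pages = multi_group_by(rows)
# ads   -> {"a1": (2, 1), "a2": (1, 1)}
# pages -> {"p1": (2, 2), "p2": (1, 1)}
```

The point of the optimization is visible in the loop: both aggregations share one traversal of the data, which in Hive's case would mean one map/reduce scan instead of two.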
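The "Explode and Collect Operators" item above describes a pair of inverse operations: one row with a collection column becomes one row per element, and back again. A minimal sketch of the pair in Python (the function names are illustrative, not Hive's):

```python
def explode(key, values):
    """One (key, [v1, v2, ...]) record becomes one (key, v) row per element."""
    return [(key, v) for v in values]

def collect(rows):
    """Inverse of explode: gather (key, v) rows back into key -> [values]."""
    out = {}
    for key, v in rows:
        out.setdefault(key, []).append(v)
    return out

rows = explode("u1", ["a", "b", "c"])
# rows -> [("u1", "a"), ("u1", "b"), ("u1", "c")]
# collect undoes explode for rows grouped under the same key:
assert collect(rows) == {"u1": ["a", "b", "c"]}
```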
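The user-registered UDF/UDAF item above amounts to a name-to-function registry that the CLI can expose commands for. A minimal Python sketch of that pattern (a hypothetical API, not Hive's actual UDFRegistry):

```python
class FunctionRegistry:
    """Hypothetical registry mapping function names to implementations,
    analogous in spirit to the UDFRegistry mentioned in the roadmap."""

    def __init__(self):
        self._funcs = {}

    def register(self, name, func):
        # Names are case-insensitive, as SQL identifiers usually are.
        self._funcs[name.lower()] = func

    def lookup(self, name):
        return self._funcs[name.lower()]

registry = FunctionRegistry()
registry.register("my_upper", lambda s: s.upper())  # user-supplied UDF
result = registry.lookup("MY_UPPER")("hive")
# result -> "HIVE"
```

A CLI command for registration would then just be a thin wrapper that parses the function name and implementation reference and calls `register`.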
