Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by AshishThusoo:
http://wiki.apache.org/hadoop/Hive/Roadmap

The comment on the change is: Incorporating the new features that are being built

------------------------------------------------------------------------------

  Before adding to the list below, please check [https://issues.apache.org/jira/browse/HADOOP/component/12312455 JIRA] to see if a ticket has already been opened for the feature. If not, please open a ticket on the [http://issues.apache.org/jira/browse/HADOOP Hadoop JIRA], select "contrib/hive" as the component, and also update the following list.

- = Roadmap/call to add more features =
- The following is the list of useful features that are on the Hive Roadmap:
-  * HAVING clause support
+ = Features to be added =
+ == Features actively being worked on ==
+  * ODBC driver
+
+ == Short term Features ==
   * Support for various statistical functions like Median, Standard Deviation, Variance, etc.
+  * Views and data variables in Hive so that data flows can be composed
+  * Integration with dumbo or map_reduce.py so that Python code can be easily embedded in Hive
+
+ == More long term Features (yet to be prioritized) ==
   * Support for Create Table as Select
-  * Support for views
-  * Support for Insert Appends
   * Support for Inserts without listing the partitioning columns explicitly - the query should be able to derive them
   * Support for Indexes
-  * Support for IN
+  * UNIQUE JOINS - these support different semantics than the outer joins
+  * Support for Insert Appends
+  * Using sort and bucketing properties to optimize queries
+  * Support for IN, EXISTS, and correlated subqueries
+  * More native types - Enums, timestamp
+  * Passing the schema to scripts through an environment variable
+  * HAVING clause support
+  * Counters for streaming
+  * Error Reporting Improvements - make error reporting for parse errors better
+
+ == Others ==
   * Support for Column Alias
   * Support for Statistics - these stats are needed to make optimization decisions
-  * Join Optimizations - map-side joins, semi-join techniques, etc. to do the join faster
+  * Join Optimizations - semi-join, FRJ techniques, etc. to do the join faster
-  * Optimizations to reduce the number of map files created by filter operations
   * Transformations in LOAD - LOAD currently does not transform the input data if it is not in the format expected by the destination table
-  * Schemaless map/reduce - TRANSFORM needs a schema while map/reduce is schemaless
-  * Improvements to TRANSFORM - make this more intuitive to map/reduce developers - evaluate some other keywords, etc.
-  * Error Reporting Improvements - make error reporting for parse errors better
   * Help on CLI - add help to the CLI
   * Explode and Collect Operators - operators to convert collections to individual items and vice versa
-  * Propagating sort properties to destination tables - if the query produces sorted output, we want to capture that in the destination table's metadata so that downstream optimizations can be enabled
-  * Propagating bucketing properties to destination tables
   * Multiple group-by inserts
    * Generate multiple group-by results by scanning the source table only once
    * Example:
     * FROM src
     * SELECT src.adid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.adid
     * SELECT src.pageid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.pageid
-  * SerDe refactoring, and DynamicSerDe
-   * Refactor the SerDe library to make sure we can serialize/deserialize and let UDFs handle complex objects
-   * We will be able to write a Hive query to write data into a table that uses Thrift serialization
   * Let the user register UDFs and UDAFs
    * Expose register functions in UDFRegistry and UDAFRegistry
    * Provide commands in HiveCli to call those register functions
-  * ODBC/JDBC driver
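The "multiple group-by inserts" item in the roadmap above asks for several GROUP BY results from a single scan of the source table. As a rough illustration of the idea (not Hive's implementation), here is a one-pass sketch in Python over hypothetical `(adid, pageid, userid)` rows matching the roadmap's example:

```python
from collections import defaultdict

def multi_group_by(rows):
    """Feed both group-bys from one pass over the rows, instead of
    scanning the source once per query; sets give COUNT(DISTINCT userid)."""
    by_ad = defaultdict(lambda: [0, set()])    # adid -> [row count, distinct users]
    by_page = defaultdict(lambda: [0, set()])  # pageid -> [row count, distinct users]
    for adid, pageid, userid in rows:
        by_ad[adid][0] += 1
        by_ad[adid][1].add(userid)
        by_page[pageid][0] += 1
        by_page[pageid][1].add(userid)
    # Collapse the distinct-user sets into counts for the final results.
    ads = {k: (c, len(u)) for k, (c, u) in by_ad.items()}
    pages = {k: (c, len(u)) for k, (c, u) in by_page.items()}
    return ads, pages

rows = [
    ("a1", "p1", "u1"),
    ("a1", "p2", "u1"),
    ("a2", "p1", "u2"),
]
ads, pages = multi_group_by(rows)
# ads   -> {"a1": (2, 1), "a2": (1, 1)}
# pages -> {"p1": (2, 2), "p2": (1, 1)}
```

The point of the optimization is visible in the loop: both aggregations share one traversal of the data, which in Hive's case would mean one map/reduce scan instead of two.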
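The "Explode and Collect Operators" item above describes a pair of inverse operations: one row with a collection column becomes one row per element, and back again. A minimal sketch of the pair in Python (the function names are illustrative, not Hive's):

```python
def explode(key, values):
    """One (key, [v1, v2, ...]) record becomes one (key, v) row per element."""
    return [(key, v) for v in values]

def collect(rows):
    """Inverse of explode: gather (key, v) rows back into key -> [values]."""
    out = {}
    for key, v in rows:
        out.setdefault(key, []).append(v)
    return out

rows = explode("u1", ["a", "b", "c"])
# rows -> [("u1", "a"), ("u1", "b"), ("u1", "c")]
# collect undoes explode for rows grouped under the same key:
assert collect(rows) == {"u1": ["a", "b", "c"]}
```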
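The user-registered UDF/UDAF item above amounts to a name-to-function registry that the CLI can expose commands for. A minimal Python sketch of that pattern (a hypothetical API, not Hive's actual UDFRegistry):

```python
class FunctionRegistry:
    """Hypothetical registry mapping function names to implementations,
    analogous in spirit to the UDFRegistry mentioned in the roadmap."""

    def __init__(self):
        self._funcs = {}

    def register(self, name, func):
        # Names are case-insensitive, as SQL identifiers usually are.
        self._funcs[name.lower()] = func

    def lookup(self, name):
        return self._funcs[name.lower()]

registry = FunctionRegistry()
registry.register("my_upper", lambda s: s.upper())  # user-supplied UDF
result = registry.lookup("MY_UPPER")("hive")
# result -> "HIVE"
```

A CLI command for registration would then just be a thin wrapper that parses the function name and implementation reference and calls `register`.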
