Harish Butani created HIVE-7940:
-----------------------------------
Summary: Expose Machine Learning functions and Model application
in Hive
Key: HIVE-7940
URL: https://issues.apache.org/jira/browse/HIVE-7940
Project: Hive
Issue Type: New Feature
Reporter: Harish Butani
*Machine Learning functions*
# [HiveMall|https://github.com/myui/hivemall] has demonstrated how to do
machine learning in Hive. It has an extensive set of functions, and it shows how
to do iterative computations through UDTFs and the Amplify technique. There is a
lot of interest in the Hive user community in using HiveMall.
# Other possible ways to expose machine learning functionality:
#* via the Script Operator (or Table Functions) calling out to a Machine Learning
service like [0xdata|https://github.com/0xdata/h2o]. In this scheme the
service's nodes would communicate outside of Hive, process the data in multiple
iterations, and then return the result to the Hive pipeline.
#* At the language level, provide an iteration mechanism in Hive. This has more
general applications: it could also express Recursive CTEs and Graph
Algorithms.
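
To make the Amplify idea from the first item more concrete, here is a minimal Python sketch, not HiveMall's actual code. It assumes that amplification means emitting each input row several times, so that a single-pass online learner sees the data repeatedly and approximates multiple iterations. The names {{amplify}} and {{train_perceptron}}, the row layout, and the choice of a perceptron are illustrative assumptions.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed row layout for this sketch: (feature vector, label in {-1, +1}).
Row = Tuple[List[float], int]


def amplify(times: int, rows: Iterable[Row]) -> Iterator[Row]:
    """Emit every input row `times` times, the way a UDTF could."""
    for row in rows:
        for _ in range(times):
            yield row


def train_perceptron(rows: Iterable[Row], dim: int) -> List[float]:
    """Single pass over `rows`; update weights only on misclassified rows."""
    w = [0.0] * dim
    for x, y in rows:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


if __name__ == "__main__":
    data = [([1.0, 1.0], 1), ([-1.0, -1.0], -1)]
    # One pass over the amplified stream stands in for three passes over `data`.
    print(train_perceptron(amplify(3, data), dim=2))
```

In this sketch the copies of a row arrive back to back. A real pipeline would probably also shuffle the amplified rows so the learner does not see identical rows consecutively.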
*Model Application*
Even when Regression/Classification models are built in other tools, we should
provide a way to evaluate these models against the entire dataset residing in
Hive. These can be exposed as UDFs in Hive. A possible route could be a generic
PMML-based module, e.g. [JPMML-Hive|https://github.com/jpmml/jpmml-hive].
Alternatively, we should provide integration for specific libraries: Spark MLlib, R and
Python (SciPy/NumPy) seem the most popular toolkits.
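
As a rough illustration of what model application amounts to, the Python sketch below scores rows with a logistic regression model whose coefficients were produced elsewhere. The coefficients, the column names, and the function names {{score}} and {{apply_model}} are all hypothetical. In Hive the same per-row logic would sit inside a UDF or a PMML evaluator, not a standalone script.

```python
import math
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical coefficients standing in for a model trained in another tool.
MODEL = {"intercept": -1.0, "weights": {"age": 0.5, "income": 0.25}}


def score(row: Dict[str, float]) -> float:
    """Return the predicted probability for one row of named features."""
    z = MODEL["intercept"] + sum(
        w * row[name] for name, w in MODEL["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def apply_model(
    rows: Iterable[Tuple[str, Dict[str, float]]]
) -> Iterator[Tuple[str, float]]:
    """Score each (key, features) row independently, as a per-row UDF would."""
    for key, features in rows:
        yield key, score(features)


if __name__ == "__main__":
    table = [("r1", {"age": 2.0, "income": 0.0}),
             ("r2", {"age": 4.0, "income": 4.0})]
    for key, probability in apply_model(table):
        print(key, probability)
```

Because each row is scored independently, this kind of evaluation can run in parallel over the full dataset, which is what makes the UDF route attractive.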
The *goal* would be to provide Machine Learning functionality as a feature of
Hive, as [MADlib|http://madlib.net/] does on Postgres, Pivotal, Impala, etc.
This JIRA captures that high-level requirement.