Harish Butani created HIVE-7940:
-----------------------------------

             Summary: Expose Machine Learning functions and Model application in Hive
                 Key: HIVE-7940
                 URL: https://issues.apache.org/jira/browse/HIVE-7940
             Project: Hive
          Issue Type: New Feature
            Reporter: Harish Butani


*Machine Learning functions*
# [HiveMall|https://github.com/myui/hivemall] has demonstrated how to do 
machine learning in Hive. It has an extensive set of functions, and it shows a 
way to do iterative computations through UDTFs and its Amplify technique. There 
is a lot of interest in the Hive user community in using HiveMall.
# Other possible ways to expose machine learning functionality:
#* Via the Script Operator (or Table Functions), calling out to a machine 
learning service like [0xdata|https://github.com/0xdata/h2o]. In this scheme the 
service's nodes would communicate outside of Hive, process the data in multiple 
iterations, and then return the result to the Hive pipeline.
#* At the language level, provide an iteration mechanism in Hive. This has more 
general applications: it could also be used to express Recursive CTEs and Graph 
Algorithms.
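
To make the iteration idea concrete, here is a minimal sketch in plain Java of the loop such a mechanism would have to express: repeat a join step until an iteration produces no new rows. This is the usual way a recursive CTE such as transitive closure is evaluated. The class and method names are hypothetical and are not part of Hive or HiveMall; the sketch runs in memory and says nothing about how Hive would plan or distribute the work.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of Hive or HiveMall.
final class FixpointIteration {

    // Each edge is a two-element list [src, dst]; List provides
    // value-based equals/hashCode, so edges can live in a HashSet.
    static Set<List<Integer>> transitiveClosure(Set<List<Integer>> edges) {
        Set<List<Integer>> result = new HashSet<>(edges); // all rows found so far
        Set<List<Integer>> delta = new HashSet<>(edges);  // rows new in the last pass

        // Iterate until a pass discovers nothing new (the fixpoint).
        while (!delta.isEmpty()) {
            Set<List<Integer>> next = new HashSet<>();
            for (List<Integer> d : delta) {
                for (List<Integer> e : edges) {
                    // Join step: extend path d = (a, b) with edge e = (b, c).
                    if (d.get(1).equals(e.get(0))) {
                        List<Integer> candidate = Arrays.asList(d.get(0), e.get(1));
                        if (!result.contains(candidate)) {
                            next.add(candidate);
                        }
                    }
                }
            }
            result.addAll(next);
            delta = next;
        }
        return result;
    }

    public static void main(String[] args) {
        Set<List<Integer>> edges = new HashSet<>();
        edges.add(Arrays.asList(1, 2));
        edges.add(Arrays.asList(2, 3));
        edges.add(Arrays.asList(3, 4));
        // Chain 1 -> 2 -> 3 -> 4 has 6 reachable pairs.
        System.out.println(transitiveClosure(edges).size()); // prints 6
    }
}
```

Only the rows discovered in the previous pass are joined again, so each pass does less work as the result converges. Iterative model training has the same shape, with "no new rows" replaced by a convergence test on the model parameters.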

*Model Application*
Even when Regression/Classification models are built in other tools, we should 
provide a way to evaluate these models against the entire dataset residing in 
Hive. These can be exposed as UDFs in Hive. A possible route could be a generic 
PMML-based module, e.g. [JPMML-Hive|https://github.com/jpmml/jpmml-hive]. 
Alternatively, we could provide integration for specific libraries: Spark MLlib, 
R and Python (SciPy/NumPy) seem the most popular toolkits.
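
As a rough sketch of what "model application" means per row, the following plain-Java class scores a feature vector with a logistic regression model whose coefficients were fitted elsewhere. Everything here is an assumption made for illustration: the class name, the constructor, and the hard-coded coefficients are hypothetical, and this is not the JPMML-Hive API. In Hive, the {{score}} logic would sit behind a UDF so it is called once per row; that wrapper is left out to keep the example free of Hive dependencies.

```java
// Hypothetical illustration class; not part of Hive or JPMML-Hive.
final class LogisticModelScorer {
    private final double intercept;
    private final double[] weights;

    // Coefficients are assumed to come from a model trained in another tool.
    LogisticModelScorer(double intercept, double[] weights) {
        this.intercept = intercept;
        this.weights = weights.clone();
    }

    // Returns the predicted probability of the positive class for one row.
    double score(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException(
                "expected " + weights.length + " features, got " + features.length);
        }
        double z = intercept;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z)); // logistic (sigmoid) link
    }

    public static void main(String[] args) {
        // Made-up coefficients, for demonstration only.
        LogisticModelScorer model = new LogisticModelScorer(0.0, new double[] {1.0, -1.0});
        System.out.println(model.score(new double[] {2.0, 2.0})); // z = 0, prints 0.5
    }
}
```

A PMML-based module would replace the hard-coded coefficients with a model loaded from a PMML file, while the per-row call would look much the same.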


The *goal* would be to provide Machine Learning functionality as a feature of 
Hive, like [MADlib|http://madlib.net/] on Postgres, Pivotal, Impala etc.
This jira captures that high-level requirement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
