[jira] [Closed] (SPARK-9084) Add in support for realtime data predictions using ML PipelineModel

Joseph K. Bradley (JIRA) Thu, 16 Jul 2015 16:36:45 -0700

     [ https://issues.apache.org/jira/browse/SPARK-9084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Joseph K. Bradley closed SPARK-9084.
------------------------------------
    Resolution: Later

> Add in support for realtime data predictions using ML PipelineModel
> -------------------------------------------------------------------
>
>                 Key: SPARK-9084
>                 URL: https://issues.apache.org/jira/browse/SPARK-9084
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Hollin Wilkins
>            Priority: Minor
>
> Currently ML provides excellent support for feature manipulation, model 
> selection, and prediction for large datasets. The models can all be easily 
> serialized, but the fitted models cannot yet be used without a DataFrame, 
> which means they are only useful for batch processing. 
> To support realtime ML pipelines, I propose adding three new methods to the 
> Transformer class:
>
>     def transform(row: StructuredRow): StructuredRow
>     def transform(row: StructuredRow, paramMap: ParamMap): StructuredRow
>     def transform(row: StructuredRow, firstParamPair: ParamPair[_],
>                   otherParamPairs: ParamPair[_]*): StructuredRow
>
> Here a StructuredRow is a case class that combines an 
> org.apache.spark.sql.Row with an org.apache.spark.sql.types.StructType. An 
> alternative would be to change the transform method signature to take two 
> objects, a StructType and a Row.
> This change requires adding the new transform methods to each subclass of 
> the Transformer class.
> With this change in place, it would be straightforward to include the Spark 
> jars in an API server, deserialize an ML PipelineModel object, take incoming 
> data from users, convert it into a StructuredRow, and feed it into the 
> PipelineModel to get a realtime result.
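
The single-row API described above could be sketched roughly as follows. This
is an illustrative sketch only, not code from Spark: StructuredRow follows the
description in the issue, while RowTransformer, Scaler and RowPipeline are
hypothetical names introduced here. It assumes spark-sql is on the classpath
and uses only Row and StructType; no DataFrame or SparkContext is involved.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// As described in the issue: a Row paired with the schema that describes it.
case class StructuredRow(row: Row, schema: StructType)

// Hypothetical stand-in for the single-row method proposed for Transformer.
trait RowTransformer {
  def transform(row: StructuredRow): StructuredRow
}

// Hypothetical toy stage: appends outputCol = inputCol * factor.
class Scaler(inputCol: String, outputCol: String, factor: Double)
    extends RowTransformer {
  override def transform(in: StructuredRow): StructuredRow = {
    val value = in.row.getDouble(in.schema.fieldIndex(inputCol))
    StructuredRow(
      Row.fromSeq(in.row.toSeq :+ (value * factor)),
      StructType(in.schema.fields :+
        StructField(outputCol, DoubleType, nullable = false))
    )
  }
}

// Hypothetical pipeline: applies its stages to one row, in order.
class RowPipeline(stages: Seq[RowTransformer]) extends RowTransformer {
  override def transform(in: StructuredRow): StructuredRow =
    stages.foldLeft(in)((acc, stage) => stage.transform(acc))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val schema = StructType(Seq(StructField("x", DoubleType, nullable = false)))
    val pipeline = new RowPipeline(
      Seq(new Scaler("x", "x2", 2.0), new Scaler("x2", "x4", 2.0)))
    // One incoming record, e.g. decoded from an API request.
    val out = pipeline.transform(StructuredRow(Row(1.5), schema))
    println(out.schema.fieldNames.mkString(", "))
    println(out.row)
  }
}
```

Carrying the schema alongside the row lets each stage look up its input column
by name and record the column it appends, which is the role the DataFrame
schema plays in the batch transform path.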



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
