Github user ptgoetz commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1816#discussion_r92243287
  
    --- Diff: external/storm-pmml/README.md ---
    @@ -0,0 +1,104 @@
    +#Storm PMML Bolt
    + Storm integration to load PMML models and compute predictive scores for running tuples. The PMML model represents
    + the machine learning (predictive) model used to do prediction on raw input data. The model is typically loaded into a
    + runtime environment, which will score the raw data that comes in the tuples.
    +
    +#Create Instance of PMML Bolt
    + To create an instance of the `PMMLPredictorBolt` you must provide a `ModelRunner` using a `ModelRunnerFactory`,
    + and optionally an instance of `ModelOutputFields`. The `ModelOutputFields` is only required if you wish to emit
    + tuples with predicted scores to one or multiple streams. Otherwise, the `PMMLPredictorBolt` will declare no
    + output fields.
    + 
    + The `ModelRunner` represents the runtime environment to execute the predictive scoring. It has only one method:
    + 
    + ```java
    +    Map<Stream, List<Object>> scoredTuplePerStream(Tuple input); 
    + ```
    + 
    + This method contains the logic to compute the scored tuples from the raw inputs tuple.  It's up to the discretion of the
    + implementation to define which scored values are to be assigned to each `Stream`. A `Stream` is a representation of a Storm stream.
    +   
    + The `PmmlModelRunner` is an extension of `ModelRunner` that represents the typical steps involved
    + in predictive scoring. Hence, it allows for the **extraction** of raw inputs from the tuple, **pre process** the
    + raw inputs, and **predict** the scores from the preprocessed data.
    + 
    + The `JPmmlModelRunner` is an implementation of `PmmlModelRunner` that uses [JPMML](https://github.com/jpmml/jpmml) as
    + runtime environment. This implementation extracts the raw inputs from the tuple for all `active fields`,
    + and builds a tuple with the predicted scores for the `predicted fields` and `output fields`.
    + In this implementation all the declared streams will have the same scored tuple.
    + 
    + The `predicted`, `active`, and `output` fields are extracted from the PMML model.
    +
    +#Run Bundled Examples
    +
    +To run the examples you must copy the `storm-pmml` uber jar to `STORM-HOME/extlib` and then run the command:
    --- End diff --
    
    Can't the example just be a self-contained shaded jar that is ready for
    deployment? It seems odd to require users to modify the pom and copy jars into
    `extlib` just to run the example. Most of the other examples are deployable jars.
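
    For readers skimming the quoted diff, the single-method `ModelRunner` contract it describes can be sketched in isolation. This is only an illustration under stated assumptions: `Tuple` and `Stream` below are hypothetical stand-ins rather than the real Storm or storm-pmml classes, and `SumModelRunner` with its field names `x` and `y` is invented for the example. It mirrors the behaviour the README attributes to `JPmmlModelRunner`, where every declared stream receives the same scored tuple.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Storm tuple: a read-only view of named fields.
final class Tuple {
    private final Map<String, Object> values;
    Tuple(Map<String, Object> values) { this.values = values; }
    Object getValueByField(String field) { return values.get(field); }
}

// Hypothetical stand-in for the README's Stream: identified by its stream id only.
final class Stream {
    final String id;
    Stream(String id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof Stream && ((Stream) o).id.equals(id);
    }
    @Override public int hashCode() { return id.hashCode(); }
}

// The one-method contract quoted in the diff.
interface ModelRunner {
    Map<Stream, List<Object>> scoredTuplePerStream(Tuple input);
}

// Toy runner (an assumption, not part of storm-pmml): the "score" is x + y.
final class SumModelRunner implements ModelRunner {
    private final List<Stream> streams;
    SumModelRunner(List<Stream> streams) { this.streams = streams; }

    @Override
    public Map<Stream, List<Object>> scoredTuplePerStream(Tuple input) {
        // Extract the raw inputs from the tuple.
        double x = ((Number) input.getValueByField("x")).doubleValue();
        double y = ((Number) input.getValueByField("y")).doubleValue();
        // "Predict": a real runner would evaluate the PMML model here.
        List<Object> scored = new ArrayList<>();
        scored.add(x + y);
        // Assign the same scored tuple to every declared stream.
        Map<Stream, List<Object>> out = new HashMap<>();
        for (Stream s : streams) {
            out.put(s, scored);
        }
        return out;
    }
}

public class ModelRunnerSketch {
    public static void main(String[] args) {
        Map<String, Object> raw = new HashMap<>();
        raw.put("x", 1.5);
        raw.put("y", 2.5);
        List<Stream> streams = new ArrayList<>();
        streams.add(new Stream("default"));
        ModelRunner runner = new SumModelRunner(streams);
        Map<Stream, List<Object>> scored = runner.scoredTuplePerStream(new Tuple(raw));
        System.out.println(scored.get(new Stream("default")));
    }
}
```

    Nothing in this sketch depends on `extlib`; it is only meant to make the contract concrete and says nothing about how the bundled example should be packaged.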

