Github user ptgoetz commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1816#discussion_r92243287

    --- Diff: external/storm-pmml/README.md ---
    @@ -0,0 +1,104 @@
    +#Storm PMML Bolt
    + Storm integration to load PMML models and compute predictive scores for running tuples. The PMML model represents
    + the machine learning (predictive) model used to do prediction on raw input data. The model is typically loaded into a
    + runtime environment, which will score the raw data that comes in the tuples.
    +
    +#Create Instance of PMML Bolt
    + To create an instance of the `PMMLPredictorBolt` you must provide a `ModelRunner` using a `ModelRunnerFactory`,
    + and optionally an instance of `ModelOutputFields`. The `ModelOutputFields` is only required if you wish to emit
    + tuples with predicted scores to one or multiple streams. Otherwise, the `PMMLPredictorBolt` will declare no
    + output fields.
    +
    + The `ModelRunner` represents the runtime environment to execute the predictive scoring. It has only one method:
    +
    + ```java
    + Map<Stream, List<Object>> scoredTuplePerStream(Tuple input);
    + ```
    +
    + This method contains the logic to compute the scored tuples from the raw inputs tuple. It's up to the discretion of the
    + implementation to define which scored values are to be assigned to each `Stream`. A `Stream` is a representation of a Storm stream.
    +
    + The `PmmlModelRunner` is an extension of `ModelRunner` that represents the typical steps involved
    + in predictive scoring. Hence, it allows for the **extraction** of raw inputs from the tuple, **pre process** the
    + raw inputs, and **predict** the scores from the preprocessed data.
    +
    + The `JPmmlModelRunner` is an implementation of `PmmlModelRunner` that uses [JPMML](https://github.com/jpmml/jpmml) as
    + runtime environment. This implementation extracts the raw inputs from the tuple for all `active fields`,
    + and builds a tuple with the predicted scores for the `predicted fields` and `output fields`.
    + In this implementation all the declared streams will have the same scored tuple.
    +
    + The `predicted`, `active`, and `output` fields are extracted from the PMML model.
    +
    +#Run Bundled Examples
    +
    +To run the examples you must copy the `storm-pmml` uber jar to `STORM-HOME/extlib` and then run the command:
    --- End diff --

    Can't the example just be a self-contained shaded jar that is ready for deployment? It seems odd to require users to modify the pom and copy jars into extlib to run the example. Most of the other examples are deployable jars.
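For readers following the quoted README text, the single-method `ModelRunner` contract and the extract, pre-process, predict steps it describes can be sketched roughly as below. This is only an illustrative sketch, not code from the pull request: `Stream` and `Tuple` here are hypothetical stand-ins so the example compiles without Storm on the classpath, `ToyModelRunner` and the field name `x` are invented for illustration, and the arithmetic "model" is a placeholder rather than a PMML evaluation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class ModelRunnerSketch {

    /** Hypothetical stand-in for the README's Stream type (assumption, not Storm code). */
    static final class Stream {
        final String id;
        Stream(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Stream && ((Stream) o).id.equals(id);
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    /** Hypothetical stand-in for a Storm tuple: field name mapped to value. */
    static final class Tuple {
        final Map<String, Object> values;
        Tuple(Map<String, Object> values) { this.values = values; }
        Object getValueByField(String field) { return values.get(field); }
    }

    /** The single-method contract quoted in the diff. */
    interface ModelRunner {
        Map<Stream, List<Object>> scoredTuplePerStream(Tuple input);
    }

    /** Toy runner: emits the same scored tuple on every declared stream. */
    static final class ToyModelRunner implements ModelRunner {
        private final List<Stream> streams;
        ToyModelRunner(List<Stream> streams) { this.streams = streams; }

        @Override
        public Map<Stream, List<Object>> scoredTuplePerStream(Tuple input) {
            // Extract the raw input from the tuple (field name "x" is an assumption).
            double raw = ((Number) input.getValueByField("x")).doubleValue();
            // Pre-process: clip negative inputs to zero.
            double clipped = Math.max(0.0, raw);
            // Predict: placeholder formula standing in for a real PMML model.
            double score = 2.0 * clipped + 1.0;

            List<Object> scored = Collections.<Object>singletonList(score);
            Map<Stream, List<Object>> out = new LinkedHashMap<>();
            for (Stream s : streams) {
                out.put(s, scored);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ModelRunner runner = new ToyModelRunner(
                Arrays.asList(new Stream("default"), new Stream("errors")));
        Tuple input = new Tuple(Collections.<String, Object>singletonMap("x", 3));
        for (Map.Entry<Stream, List<Object>> e : runner.scoredTuplePerStream(input).entrySet()) {
            System.out.println(e.getKey().id + " -> " + e.getValue());
        }
    }
}
```

Emitting one shared scored tuple on every stream mirrors what the README says about `JPmmlModelRunner`; a different `ModelRunner` could just as well assign different values to each `Stream`.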