Github user hmcl commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1816#discussion_r92249343

    --- Diff: external/storm-pmml/README.md ---
    @@ -0,0 +1,104 @@
    +#Storm PMML Bolt
    + Storm integration to load PMML models and compute predictive scores for running tuples. The PMML model represents
    + the machine learning (predictive) model used to do prediction on raw input data. The model is typically loaded into a
    + runtime environment, which will score the raw data that comes in the tuples.
    +
    +#Create Instance of PMML Bolt
    + To create an instance of the `PMMLPredictorBolt` you must provide a `ModelRunner` using a `ModelRunnerFactory`,
    + and optionally an instance of `ModelOutputFields`. The `ModelOutputFields` is only required if you wish to emit
    + tuples with predicted scores to one or multiple streams. Otherwise, the `PMMLPredictorBolt` will declare no
    + output fields.
    +
    + The `ModelRunner` represents the runtime environment to execute the predictive scoring. It has only one method:
    +
    + ```java
    + Map<Stream, List<Object>> scoredTuplePerStream(Tuple input);
    + ```
    +
    + This method contains the logic to compute the scored tuples from the raw inputs tuple. It's up to the discretion of the
    + implementation to define which scored values are to be assigned to each `Stream`. A `Stream` is a representation of a Storm stream.
    +
    + The `PmmlModelRunner` is an extension of `ModelRunner` that represents the typical steps involved
    + in predictive scoring. Hence, it allows for the **extraction** of raw inputs from the tuple, **pre process** the
    + raw inputs, and **predict** the scores from the preprocessed data.
    +
    + The `JPmmlModelRunner` is an implementation of `PmmlModelRunner` that uses [JPMML](https://github.com/jpmml/jpmml) as
    + runtime environment. This implementation extracts the raw inputs from the tuple for all `active fields`,
    + and builds a tuple with the predicted scores for the `predicted fields` and `output fields`.
    + In this implementation all the declared streams will have the same scored tuple.
    +
    + The `predicted`, `active`, and `output` fields are extracted from the PMML model.
    +
    +#Run Bundled Examples
    +
    +To run the examples you must copy the `storm-pmml` uber jar to `STORM-HOME/extlib` and then run the command:
    +
    + ```java
    + STORM-HOME/bin/storm jar STORM-HOME/external/storm-pmml/storm-pmml-examples-2.0.0-SNAPSHOT.jar
    + org.apache.storm.pmml.JpmmlRunnerTestTopology jpmmlTopology PMMLModel.xml RawInputData.csv
    + ```
    +#Build Uber JAR
    +
    +To build the uber jar with all the dependencies for the module `storm-pmml` you must run the command
    +
    +```
    +mvn package -f REPO_HOME/external/storm-pmml/pom.xml
    +```
    +
    +after adding the following declaration to `REPO_HOME/storm/external/storm-pmml/pom.xml`
    +
    +```
    +<build>
    +    <plugins>
    +        <plugin>
    +            <groupId>org.apache.maven.plugins</groupId>
    +            <artifactId>maven-shade-plugin</artifactId>
    +            <configuration>
    +                <createDependencyReducedPom>true</createDependencyReducedPom>
    +            </configuration>
    +            <executions>
    +                <execution>
    +                    <phase>package</phase>
    +                    <goals>
    +                        <goal>shade</goal>
    +                    </goals>
    +                    <configuration>
    +                        <transformers>
    +                            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    +                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    +                                <mainClass>org.apache.storm.pmml.JpmmlRunnerTestTopology</mainClass>
    +                            </transformer>
    +                        </transformers>
    +                    </configuration>
    +                </execution>
    +            </executions>
    +        </plugin>
    +    </plugins>
    + </build>
    +```
    +
    +
    +## License
    +
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements. See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership. The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License. You may obtain a copy of the License at
    +
    + http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied. See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +
    +
    +## Committer Sponsors
    + * Sriharsha Chintalapani ([srihar...@apache.org](mailto:srihar...@apache.org))
    +
    +This general abstraction has the purpose of supporting arbitrary implementations that compute predicted scores from raw inputs
    --- End diff --

    Good catch. I forgot to clean this up. This info is already mentioned in the introduction section in a slightly different way, but with exactly the same meaning. Will delete it.
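
    For readers of the quoted README, the extract, pre-process, predict flow it describes for `PmmlModelRunner` can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the pull request: it does not use the Storm or JPMML APIs, `Stream` is a simplified stand-in class, a plain `Map` stands in for a Storm `Tuple`, and the `StepwiseRunner` and `toyRunner` names, the `x1`/`x2` fields, and the averaging "model" are all invented. Only the method name `scoredTuplePerStream` is taken from the README.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins throughout; none of these are the storm-pmml classes.
final class ModelRunnerSketch {

    /** Simplified stand-in for a Storm stream, identified only by its id. */
    static final class Stream {
        final String id;
        Stream(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Stream && ((Stream) o).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    /** Same shape as the method quoted in the README, with a Map standing in for a Tuple. */
    interface ModelRunner {
        Map<Stream, List<Object>> scoredTuplePerStream(Map<String, Object> input);
    }

    /** Splits scoring into the three steps: extract, pre-process, predict. */
    abstract static class StepwiseRunner implements ModelRunner {
        private final List<Stream> streams;
        StepwiseRunner(List<Stream> streams) { this.streams = streams; }

        abstract Map<String, Object> extractRawInputs(Map<String, Object> input);
        abstract Map<String, Double> preProcess(Map<String, Object> raw);
        abstract List<Object> predict(Map<String, Double> features);

        @Override
        public Map<Stream, List<Object>> scoredTuplePerStream(Map<String, Object> input) {
            List<Object> scored = predict(preProcess(extractRawInputs(input)));
            Map<Stream, List<Object>> out = new LinkedHashMap<>();
            for (Stream s : streams) {
                out.put(s, scored); // every declared stream gets the same scored tuple
            }
            return out;
        }
    }

    /** Toy runner: the "model" is just the average of two invented active fields. */
    static ModelRunner toyRunner(List<Stream> streams) {
        final String[] activeFields = {"x1", "x2"};
        return new StepwiseRunner(streams) {
            @Override Map<String, Object> extractRawInputs(Map<String, Object> input) {
                Map<String, Object> raw = new LinkedHashMap<>();
                for (String f : activeFields) {
                    raw.put(f, input.get(f)); // fields that are not active are dropped
                }
                return raw;
            }
            @Override Map<String, Double> preProcess(Map<String, Object> raw) {
                // Assumes every active field is present and numeric; no error handling here.
                Map<String, Double> features = new LinkedHashMap<>();
                raw.forEach((k, v) -> features.put(k, ((Number) v).doubleValue()));
                return features;
            }
            @Override List<Object> predict(Map<String, Double> f) {
                List<Object> scored = new ArrayList<>();
                scored.add(0.5 * f.get("x1") + 0.5 * f.get("x2"));
                return scored;
            }
        };
    }
}
```

    Calling `ModelRunnerSketch.toyRunner(streams).scoredTuplePerStream(input)` returns one entry per declared stream, each carrying the same list of scores, which mirrors the behaviour the README attributes to `JPmmlModelRunner`.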