[GitHub] storm pull request #1816: STORM-2223: PMMLBolt

hmcl Tue, 13 Dec 2016 11:41:04 -0800

Github user hmcl commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1816#discussion_r92249343
  
    --- Diff: external/storm-pmml/README.md ---
    @@ -0,0 +1,104 @@
    +#Storm PMML Bolt
    + Storm integration to load PMML models and compute predictive scores for 
incoming tuples. The PMML model represents
    + the machine learning (predictive) model used to make predictions on raw 
input data. The model is typically loaded into a 
    + runtime environment, which scores the raw data that arrives in the 
tuples. 
    +
    +#Create Instance of PMML Bolt
    + To create an instance of the `PMMLPredictorBolt` you must provide a 
`ModelRunner` using a `ModelRunnerFactory`,
    + and optionally an instance of `ModelOutputFields`. The 
`ModelOutputFields` is only required if you wish to emit
    + tuples with predicted scores to one or more streams. Otherwise, the 
`PMMLPredictorBolt` will declare no
    + output fields.
    + 
    + The `ModelRunner` represents the runtime environment to execute the 
predictive scoring. It has only one method: 
    + 
    + ```java
    +    Map<Stream, List<Object>> scoredTuplePerStream(Tuple input); 
    + ```
    + 
    + This method contains the logic to compute the scored tuples from the raw 
input tuple. It is up to the 
    + implementation to define which scored values are to be assigned to each 
`Stream`. A `Stream` is a representation of a Storm stream.
    +   
    + The `PmmlModelRunner` is an extension of `ModelRunner` that represents 
the typical steps involved 
    + in predictive scoring. Hence, it allows for the **extraction** of raw 
inputs from the tuple, the **preprocessing** of the 
    + raw inputs, and the **prediction** of scores from the preprocessed data.
    + 
    + The `JPmmlModelRunner` is an implementation of `PmmlModelRunner` that 
uses [JPMML](https://github.com/jpmml/jpmml) as the
    + runtime environment. This implementation extracts the raw inputs from the 
tuple for all `active fields`, 
    + and builds a tuple with the predicted scores for the `predicted fields` 
and `output fields`. 
    + In this implementation all the declared streams will have the same scored 
tuple.
    + 
    + The `predicted`, `active`, and `output` fields are extracted from the 
PMML model.
    +
    +#Run Bundled Examples
    +
    +To run the examples you must copy the `storm-pmml` uber jar to 
`STORM-HOME/extlib` and then run the command:
    + 
    + ```
    + STORM-HOME/bin/storm jar 
STORM-HOME/external/storm-pmml/storm-pmml-examples-2.0.0-SNAPSHOT.jar 
    + org.apache.storm.pmml.JpmmlRunnerTestTopology jpmmlTopology PMMLModel.xml 
RawInputData.csv
    + ```
    +#Build Uber JAR 
    +
    +To build the uber jar with all the dependencies for the module 
`storm-pmml` you must run the command
    +
    +```
    +mvn package -f REPO_HOME/external/storm-pmml/pom.xml
    +```
    +
    +after adding the following declaration to 
`REPO_HOME/storm/external/storm-pmml/pom.xml`
    +
    +```
    +<build>
    +        <plugins>
    +            <plugin>
    +                <groupId>org.apache.maven.plugins</groupId>
    +                <artifactId>maven-shade-plugin</artifactId>
    +                <configuration>
    +                    
<createDependencyReducedPom>true</createDependencyReducedPom>
    +                </configuration>
    +                <executions>
    +                    <execution>
    +                        <phase>package</phase>
    +                        <goals>
    +                            <goal>shade</goal>
    +                        </goals>
    +                        <configuration>
    +                            <transformers>
    +                                <transformer 
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    +                                <transformer 
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    +                                    
<mainClass>org.apache.storm.pmml.JpmmlRunnerTestTopology</mainClass>
    +                                </transformer>
    +                            </transformers>
    +                        </configuration>
    +                    </execution>
    +                </executions>
    +            </plugin>
    +        </plugins>
    +    </build>
    +```
    +
    +
    +## License
    +
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +
    +
    +## Committer Sponsors
    + * Sriharsha Chintalapani 
([[email protected]](mailto:[email protected]))
    +
    +This general abstraction has the purpose of supporting arbitrary 
implementations that compute predicted scores from raw inputs
    --- End diff --
    
    Good catch. I forgot to clean this up. This info is already mentioned in 
the introduction section in a slightly different way, but with exactly the same 
meaning. Will delete it.
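
For readers following the thread, the extract, preprocess, and predict steps that the quoted README attributes to `PmmlModelRunner` can be illustrated with a minimal, self-contained sketch. It is not the code from this pull request: the `Stream` class, the `Map`-based tuple, the method names, and the toy "model" (scale each input, then sum) are all hypothetical stand-ins chosen so the example runs without Storm or JPMML on the classpath. Only the overall shape, one scored tuple assigned to every declared stream, follows the README's description of `JPmmlModelRunner`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ModelRunnerSketch {

    // Hypothetical stand-in for the README's Stream type; NOT the storm-pmml class.
    static final class Stream {
        final String id;
        Stream(String id) { this.id = id; }
    }

    // Step 1 (extraction): pull the active fields out of the raw tuple.
    // Assumes every active field is present and numeric.
    static Map<String, Double> extract(Map<String, Object> tuple, List<String> activeFields) {
        Map<String, Double> raw = new LinkedHashMap<>();
        for (String field : activeFields) {
            raw.put(field, ((Number) tuple.get(field)).doubleValue());
        }
        return raw;
    }

    // Step 2 (preprocessing): toy transformation that divides each input by 10.
    static Map<String, Double> preProcess(Map<String, Double> raw) {
        Map<String, Double> scaled = new LinkedHashMap<>();
        raw.forEach((name, value) -> scaled.put(name, value / 10.0));
        return scaled;
    }

    // Step 3 (prediction): toy "model" that sums the preprocessed inputs.
    static List<Object> predict(Map<String, Double> inputs) {
        double score = 0.0;
        for (double value : inputs.values()) {
            score += value;
        }
        List<Object> scored = new ArrayList<>();
        scored.add(score);
        return scored;
    }

    // Same shape as scoredTuplePerStream: every stream receives the same scored tuple.
    static Map<Stream, List<Object>> scoredTuplePerStream(
            Map<String, Object> tuple, List<String> activeFields, List<Stream> streams) {
        List<Object> scored = predict(preProcess(extract(tuple, activeFields)));
        Map<Stream, List<Object>> result = new LinkedHashMap<>();
        for (Stream stream : streams) {
            result.put(stream, scored);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("x", 10);
        tuple.put("y", 30);
        tuple.put("comment", "not an active field, so it is ignored");

        List<Stream> streams = List.of(new Stream("default"), new Stream("audit"));
        scoredTuplePerStream(tuple, List.of("x", "y"), streams)
                .forEach((stream, values) -> System.out.println(stream.id + " -> " + values));
    }
}
```

Keeping the three steps as separate methods mirrors the separation the README describes, so a real implementation could swap the toy `predict` for a call into a PMML evaluator without touching extraction or stream assignment.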



