[ https://issues.apache.org/jira/browse/MINIFICPP-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099139#comment-17099139 ]
Marc Parisi commented on MINIFICPP-1201:
----------------------------------------

+1 Yeah, totally. Thanks for following up. I think in the e-mail chain it was becoming apparent that I wasn't correctly relaying that I think H2O-3 needed to be introduced as a corrective action.

> Integrates MiNiFi C++ with H2O Driverless AI MOJO Scoring Pipeline (C++ Runtime Python Wrapper) To Do ML Inference on Edge
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MINIFICPP-1201
>                 URL: https://issues.apache.org/jira/browse/MINIFICPP-1201
>             Project: Apache NiFi MiNiFi C++
>          Issue Type: New Feature
>    Affects Versions: master
>         Environment: Ubuntu 18.04 in AWS EC2
>                      MiNiFi C++ 0.7.0
>            Reporter: James Medel
>            Priority: Blocker
>             Fix For: 0.8.0
>
> *MiNiFi C++ and H2O Driverless AI Integration* via Custom Python Processors: integrates MiNiFi C++ with H2O Driverless AI by using Driverless AI's MOJO Scoring Pipeline (in the C++ Runtime Python Wrapper) and MiNiFi's Custom Python Processor support. A Python processor executes the MOJO Scoring Pipeline to do batch scoring or real-time scoring for one or more predicted labels on tabular test data in the incoming flow file content: if the tabular data is a single row, the MOJO does real-time scoring; if it is multiple rows, the MOJO does batch scoring. I would like to contribute my processor to MiNiFi C++ as a new feature.
>
> *1 custom python processor* created for MiNiFi:
>
> *H2oMojoPwScoring* - Executes H2O Driverless AI's MOJO Scoring Pipeline in the C++ Runtime Python Wrapper to do batch scoring or real-time scoring on a frame of data within each incoming flow file. Requires the user to set the *pipeline.mojo* filepath in the "MOJO Pipeline Filepath" property.
> This property is read in the onTrigger(context, session) function to get the pipeline.mojo filepath, which is *passed into* the *daimojo.model(pipeline_mojo_filepath)* function to instantiate the *mojo_scorer*. The MOJO creation time and uuid are added as individual flow file attributes. The *flow file content* is then *loaded into a datatable frame* holding the test data. A Python lambda function, compare, checks whether the datatable frame's header column names equal the expected header column names from the mojo scorer. This check is needed because the datatable frame could be missing its header; when the headers do not match, the datatable frame's header is replaced with the mojo scorer's expected header. Having the correct header matters because the *mojo scorer's* *predict(datatable_frame)* function needs it to do the prediction, which returns a predictions datatable frame. The mojo scorer's predict function is *capable of doing real-time scoring or batch scoring*, depending only on the number of rows in the tabular data. The predictions datatable frame is then converted to a pandas dataframe so that pandas' to_string(index=False) can convert it to a string without the dataframe's index. *The prediction string is written to the flow file content*. A flow file attribute is added for the number of rows scored, and one or more further attributes are added for each predicted label name and its associated score. Finally, the flow file is transferred on the success relationship.
>
> *Hydraulic System Condition Monitoring* data used in the MiNiFi flow:
>
> The sensor test data I used in this integration comes from Kaggle: Condition Monitoring of Hydraulic Systems. I was able to predict hydraulic system cooling efficiency through the MiNiFi and H2O integration described above.
> The use case here is hydraulic system predictive maintenance.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
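[Editor's sketch] The onTrigger flow described in the issue above can be sketched in Python. This is a minimal, hedged outline, not the contributed processor: the `daimojo.model(...)` call and `feature_names` attribute are assumptions based on the description and may differ from the actual daimojo API, and the MiNiFi session/context plumbing is reduced to plain function arguments. The header-alignment helper stands in for the "compare" lambda and is pure stdlib.

```python
def align_header(frame_names, expected_names):
    """Stand-in for the 'compare' lambda: if the frame's column names
    don't match the mojo scorer's expected names (e.g. the header row
    is missing), fall back to the expected names."""
    matches = len(frame_names) == len(expected_names) and all(
        a == b for a, b in zip(frame_names, expected_names))
    return list(frame_names) if matches else list(expected_names)


def on_trigger_sketch(mojo_filepath, flowfile_content):
    """Hypothetical outline of the processor's onTrigger body.
    Requires H2O's daimojo C++ Runtime Python Wrapper, datatable,
    and pandas; attribute writes and the success-relationship
    transfer are reduced to a return value here."""
    import daimojo.model          # assumed import path per the issue text
    import datatable as dt

    mojo_scorer = daimojo.model(mojo_filepath)
    frame = dt.Frame(flowfile_content)   # test data from flow file content
    # Replace a missing/auto-generated header with the expected one.
    frame.names = align_header(frame.names, mojo_scorer.feature_names)
    # Real-time (one row) or batch (many rows) scoring, per row count.
    preds = mojo_scorer.predict(frame)
    preds_df = preds.to_pandas()
    # String without the dataframe index, written back as content.
    return preds_df.to_string(index=False)
```

For example, a headerless CSV frame whose auto-generated names are `C0, C1, ...` would fail the comparison and pick up the scorer's expected names before `predict` runs.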