[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-09-15 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@ijokarumawak - Thanks for your comments.  I will check out the pointers you 
have given.  Mans


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-09-13 Thread ijokarumawak
Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2686
  
Hi @mans2singh sorry for the slow progress. I've been swamped with other 
urgent tasks. Will come back to continue reviewing this once those get done.

BTW, do you have any suggestions for converting string values into nominal 
numeric values, for categories such as log level (ERR, INFO, WARN, 
etc.)?
I found this web page useful for thinking about data types for machine learning 
and deep learning.

https://towardsdatascience.com/7-data-types-a-better-way-to-think-about-data-types-for-machine-learning-939fae99a689

I wonder how we can implement such conversions in NiFi. I still think NiFi 
needs the capability to convert general data into numeric vectors so that DL4J 
can be applied. Without that, these processors are hard to use.
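
As a rough illustration of the conversion discussed above, here is a minimal 
sketch in plain Java. `CategoryEncoder` and its methods are hypothetical names 
made up for this example; they are not part of NiFi, DL4J or DataVec. It 
assumes the set of categories is known up front and shows two common options, 
an ordinal index and a one-hot vector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not a NiFi or DL4J class): encodes a categorical string as numbers. */
final class CategoryEncoder {
    private final List<String> categories;

    CategoryEncoder(List<String> categories) {
        // Defensive copy so later changes to the caller's list do not shift the indices.
        this.categories = Collections.unmodifiableList(new ArrayList<>(categories));
    }

    /** Ordinal index of the value, or -1 if the category is unknown. */
    int index(String value) {
        return categories.indexOf(value);
    }

    /** One-hot vector; an unknown value yields an all-zero vector. */
    double[] oneHot(String value) {
        double[] vector = new double[categories.size()];
        int i = index(value);
        if (i >= 0) {
            vector[i] = 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        CategoryEncoder levels = new CategoryEncoder(Arrays.asList("ERR", "WARN", "INFO"));
        System.out.println(Arrays.toString(levels.oneHot("WARN"))); // prints [0.0, 1.0, 0.0]
    }
}
```

One-hot encoding avoids implying an order between categories, at the cost of 
one vector element per category; the ordinal index is more compact but only 
makes sense when the order carries meaning.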

I briefly took a look at the DataVec project, but it seems to depend on Spark 
to execute such transformations. I envision NiFi doing such tasks within its 
data flow so that it can apply a deep learning model directly to incoming 
streaming data.
https://deeplearning4j.org/datavec

If anyone can join this review, please do so.


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-09-10 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@markap14 @ijokarumawak @jzonthemtn and Nifi Team: Please let me know if 
you have any additional comments for this processor.  Thanks.


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-07-12 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
Hi @ijokarumawak - 

I've merged your changes.  Please let me know if you have any more 
recommendations.

Thanks for your help.

Mans 


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-07-03 Thread ijokarumawak
Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2686
  
Sent mans2singh/nifi#1


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-06-20 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@ijokarumawak - Just following-up:

1.  Regarding implementing a record lookup service - I believe that the 
processor and a record lookup service can be separate components useful for 
different use cases. As the file/RDBMS-based flows I've mentioned above show, 
the processor can be used as a transformer.
2. Regarding providing more tools to prepare data - You are right, we can 
do that once we have the basics in place and there is a need for it. 
3. You mentioned a concern about writing the results (predictions) in 
the body of the flow file - if you/the community think we should keep the 
observations in the body and put the output in an attribute, I'd be happy to 
change that.

Thanks again for your thoughts and let me know if you have any more 
advice/recommendations.

Mans


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-06-17 Thread ijokarumawak
Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2686
  
Hi @mans2singh 

Thanks for sharing the templates. I will try them later, but I understand how 
they work by using an RDBMS to join input data and classification results.

I think we need to provide more tools so that people can use DL4J easily 
from NiFi flows. So here are a couple of points for further discussion:

1. Have you considered a DL4J component implementing the RecordLookup 
interface so that the LookupRecord processor can use it? I've mentioned this 
before. Once we provide this pattern, an RDBMS is no longer required to use 
the evaluation result later in the flow.

2. How would a NiFi user convert raw data (CSV, text, images, audio, 
etc.) into a vector format that can be fed to DL4J networks? I haven't explored 
this myself yet, but can we provide some other Processors or 
RecordReader/Writers to do the vectorization, probably using DataVec? 
https://deeplearning4j.org/datavec
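
To make point 1 a little more concrete, here is a minimal sketch of the lookup 
pattern in plain Java. It deliberately does not use the NiFi or DL4J APIs: 
`PredictionLookup`, the `features` key and the `Function` standing in for a 
trained model are all assumptions made for illustration. The idea is only that 
a caller hands in lookup coordinates and gets back an optional result that can 
be attached to the original record.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical lookup-style wrapper around a model; not the NiFi lookup service API. */
final class PredictionLookup {
    /** Assumed coordinate key that carries the feature vector. */
    static final String FEATURES_KEY = "features";

    /** Stand-in for a trained model: feature vector in, class scores out. */
    private final Function<double[], double[]> model;

    PredictionLookup(Function<double[], double[]> model) {
        this.model = model;
    }

    /** Returns the scores and predicted class, or empty if no feature vector was supplied. */
    Optional<Map<String, Object>> lookup(Map<String, Object> coordinates) {
        Object features = coordinates.get(FEATURES_KEY);
        if (!(features instanceof double[])) {
            return Optional.empty();
        }
        double[] scores = model.apply((double[]) features);
        Map<String, Object> result = new HashMap<>();
        result.put("scores", scores);
        result.put("predictedClass", argMax(scores));
        return Optional.of(result);
    }

    /** Index of the largest score; ties resolve to the lowest index. */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

Because the result comes back next to the input record rather than replacing 
it, no external store is needed to join the two afterwards.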


I will ask the dev mailing list to see if anyone can join the review process, 
too. Thanks!


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-06-17 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@ijokarumawak - 

I've created two flow templates for testing the DL4J processor.  

The first has a file input and a file output. The second flow reads a row 
from an RDBMS based on an input file containing a single id, classifies the 
observation and saves the classification results in the same row.  

The supporting files along with a sample classification model that will 
work with the two templates are in the repository 
[nifi-flow-examples](https://github.com/mans2singh/nifi-flow-examples.git) 
branch nifi-dl4j-flow.  

Here are some details of the two flow templates:

1. The first template is a simple one 
[NifiDL4JFileInputOutput.xml](https://github.com/mans2singh/nifi-flow-examples/blob/nifi-dl4j-flow/dl4jtemplate/NifiDL4JFileInputOutput.xml)
 that ingests a file containing an observation record from a directory (sample 
input in dl4jinput), applies the classification model and writes the results to 
the output directory with the same file name as the input.  In this scenario, 
the correlation is based on the file names.

2. The second 
[NifiDL4JFileToRdbms.xml](https://github.com/mans2singh/nifi-flow-examples/blob/nifi-dl4j-flow/dl4jtemplate/NifiDL4JFileToRdbms.xml)
 reads a single id from an input file (sample in dl4jinputid directory), queries 
an RDBMS table for the observations for the input id, classifies the observation 
and updates the db row with the classification results.  In this flow template, 
the row id of the input is used as a correlation id, which is used to update the 
output column of the corresponding row after the classification is done. The 
flow uses other NiFi processors to ingest, transform and save the classification 
results.  The table creation and observation row insertion commands are in the 
dl4jsql directory. 

The flow templates require setting the appropriate input/output files, dl4j 
model, db controller and rdbms table with the records.

Please let me know your thoughts/feedback.

Thanks

Mans


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-06-12 Thread ijokarumawak
Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@mans2singh As long as we can construct a useful flow with these components, 
I am OK with however we do that.

By using a correlation id, we can merge the original FlowFile with the 
prediction result. However, I assume we would need to write custom code with 
ScriptedProcessor to do anything from there within a NiFi flow.
Would you be able to provide a flow template featuring the processor with an 
input dataset to illustrate how correlation ids help merge prediction results 
into the original dataset?
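
For illustration only, the merge step could look roughly like the following 
plain Java sketch. `CorrelationJoin` and the comma-separated output format are 
hypothetical and not taken from the PR; the sketch assumes both sides are 
already available as maps keyed by the correlation id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: joins original records and predictions on a shared correlation id. */
final class CorrelationJoin {
    /**
     * Returns one "id,original,prediction" line per original record that has a prediction.
     * Records without a matching prediction are skipped.
     */
    static List<String> merge(Map<String, String> originalsById,
                              Map<String, String> predictionsById) {
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, String> entry : originalsById.entrySet()) {
            String prediction = predictionsById.get(entry.getKey());
            if (prediction != null) {
                merged.add(entry.getKey() + "," + entry.getValue() + "," + prediction);
            }
        }
        return merged;
    }
}
```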


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-06-12 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@joewitt - I will remove the nar from the assembly.  Let me know if there 
is any additional feedback.  Thanks.


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-06-12 Thread joewitt
Github user joewitt commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@mans2singh  yes, please don't include the newly built nar in the assembly.  
We need an extension registry, so for now just don't include it.  People can 
still pull it in and use it if they wish.


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-06-12 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@ijokarumawak - I was thinking that the processor would be used as a 
transformer (predictor) and there would be a correlation attribute in the flow 
file which would be used to associate the results with the observations.  This 
would keep the focus on transformation with simple outputs while still giving 
the user the flexibility to use the correlation id to combine/enrich it with 
other data using an enrichment processor.   I've added test cases which show 
how to use the correlation id.  What are your thoughts?


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-06-10 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@ijokarumawak - 
Thanks for your feedback.  I was away for a few days and will respond to 
your comments soon.
Mans


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-06-07 Thread ijokarumawak
Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@mans2singh I have been thinking about how to use this processor in a 
practical NiFi data flow. Certainly the processor can use a deep learning model 
to classify or predict using regression, but the current approach of writing 
the evaluation result into the FlowFile content may not be useful in real data 
flows.

Let's say a user wants to route incoming data into different branches of a 
data flow to process it differently. The most basic use case would be binary 
classification. If a given record is predicted as class A, then do something, 
such as sending an alert. In order to do so, we need to carry the original data 
to report a meaningful alert. Rewriting the FlowFile content with only the 
result makes it difficult to tie the original data to the prediction result, 
and hard to construct the subsequent flow.

Considering real use cases more, I've started to feel this is more of an 
Enrich or Lookup pattern.

```
# Original dataset
Record1
Record2
Record3

# Convert the original dataset into vectors to apply a model, while
# keeping the original data to preserve relationships.
Record1, Feature Vector1
Record2, Feature Vector2
Record3, Feature Vector3

# Then we can further enrich records with prediction results
Record1, Result1 (A:0.9, B:0.1)
Record2, Result2 (A:0.85, B:0.15)
Record3, Result3 (A:0.05, B:0.95)

# Once we have such a FlowFile, we can filter the dataset based on
# the prediction
# Route class A into flow branch A
Record1
Record2
# Route class B into branch B
Record3

# Then we can produce some meaningful report using original information
Send an alert based on Record3
```
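
The enrich-then-route steps above could be sketched in plain Java as follows. 
This is only an illustration under stated assumptions: `EnrichAndRoute` and 
`Enriched` are hypothetical names, the scores are passed in directly rather 
than computed by a DL4J model, and "routing" is modelled as grouping records 
into lists rather than as NiFi relationships.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: keep each record with its scores, then route by predicted class. */
final class EnrichAndRoute {
    /** An original record kept together with its prediction scores for classes A and B. */
    static final class Enriched {
        final String record;
        final double scoreA;
        final double scoreB;

        Enriched(String record, double scoreA, double scoreB) {
            this.record = record;
            this.scoreA = scoreA;
            this.scoreB = scoreB;
        }

        /** Class with the higher score; a tie goes to A. */
        String predictedClass() {
            return scoreA >= scoreB ? "A" : "B";
        }
    }

    /** Groups the original records by predicted class, preserving input order. */
    static Map<String, List<String>> route(List<Enriched> enriched) {
        Map<String, List<String>> branches = new LinkedHashMap<>();
        branches.put("A", new ArrayList<>());
        branches.put("B", new ArrayList<>());
        for (Enriched e : enriched) {
            branches.get(e.predictedClass()).add(e.record);
        }
        return branches;
    }
}
```

Since each branch still holds the original record, a downstream step can build 
an alert from the original information.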

Have you ever looked at the LookupRecord processor and the RecordLookupService 
controller service? I think we can do more interesting things if we implement 
this as a RecordLookupService.

What do you think?





---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-05-21 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@markap14, NIFI team

Just wondering if you have any feedback on this processor.  

Thanks


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-05-11 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@markap14 - 

Thanks for your prompt review and advice.  

I've updated the code based on your review and am looking forward to 
your/other members' feedback.

Thanks again.

Mans


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-05-10 Thread markap14
Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2686
  
@mans2singh the issue that you noted in Travis is unrelated to your PR and 
is a problem with an existing unit test, unfortunately. So nothing to do 
there, really. Hopefully it will be addressed on master soon.


---


[GitHub] nifi issue #2686: NIFI-5166 - Deep learning classification and regression pr...

2018-05-08 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/2686
  
Good Morning Nifi Folks:

The appveyor build is passing for this PR but travis-ci build is failing 
with the following message:

`[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on project nifi-cdc-mysql-processors: There are test failures.
[ERROR] 
[ERROR] Please refer to /home/travis/build/apache/nifi/nifi-nar-bundles/nifi-cdc/nifi-cdc-mysql-bundle/nifi-cdc-mysql-processors/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream.`

Can you please advise on how to resolve this error?

Thanks

Mans


---