[jira] [Updated] (NIFI-4569) Brief summary of a newbie journey into NiFi

Eric Chaves (JIRA) Sun, 05 Nov 2017 14:57:34 -0800

     [ 
https://issues.apache.org/jira/browse/NIFI-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eric Chaves updated NIFI-4569:
------------------------------
    Description: 
Hi folks,

As requested on NiFi's users mailing list, I'm compiling a short summary of
my experience as a new user working with NiFi. This report includes feedback
gathered while I was learning NiFi by following tutorials and articles found
on the internet.

In particular, after searching the internet and reading NiFi's docs, I first
tried to write some custom flows using scripted processors and scripted
services.

Most of the trouble I faced I attribute to my lack of expertise with
Java/Groovy and my limited knowledge of NiFi's architecture. I did, however,
face some inconsistencies that could be eased for newcomers like me in future
releases.

*Development workflow*
To write my flows I'm using the official docker image (apache/nifi:1.4.0) 
together with other images like mysql, mongodb and localstack (to simulate AWS 
services).

It took me a while to properly set up a working folder using Docker. When I
simply run the container without any volume bound to my local host, NiFi
starts properly because all the required files are already present in the
conf dir. This setup works fine for trying NiFi out but is not a good
starting point for development, because changes may be lost when the
container is rebuilt.

To address this I first attempted to bind volumes for the ./conf and ./logs
folders, but this didn't work as expected because it produced an empty ./conf
folder that prevents NiFi from starting. Getting this right was a little
hard, and I only managed it after looking at some GitHub projects to compare
their ./conf dir with NiFi's default configuration inside the Docker image.

_- *Suggestion #1*: NiFi should create default configuration files when no
file is present in ./conf_
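
A minimal sketch of the behaviour Suggestion #1 asks for, written as a
standalone Python helper that could run before NiFi starts. This is not how
NiFi or its Docker image works today; the function name seed_conf_dir and
both directory arguments are hypothetical, chosen only for illustration.

```python
import shutil
from pathlib import Path


def seed_conf_dir(defaults_dir, conf_dir):
    """Copy default config files into conf_dir, but only if conf_dir is empty.

    Returns the sorted names of the files that were copied.
    Both paths are hypothetical; NiFi itself does not do this.
    """
    defaults = Path(defaults_dir)
    conf = Path(conf_dir)
    conf.mkdir(parents=True, exist_ok=True)

    # Never overwrite a configuration that already has files in it.
    if any(conf.iterdir()):
        return []

    copied = []
    for src in sorted(defaults.iterdir()):
        if src.is_file():
            shutil.copy2(src, conf / src.name)
            copied.append(src.name)
    return copied
```

With something like this in place, an empty bind-mounted ./conf folder would
be filled from a copy of the image's defaults on first start and left alone
afterwards.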

After getting my Docker setup right I started creating some flows following
some Hortonworks tutorials and other websites (like Matt's blog), and the
first thing that confused me was the lack of a simple way to get a new blank
canvas. In order to do so I need to stop all components and then delete the
processors, groups, ports, etc.

_- *Suggestion #2*: Add a "new canvas" button to the operations palette (or
somewhere else)._

At this point I started writing some flows that I would like to keep under
version control, and there is no simple way to do that. NiFi's canvas is kept
in a gzip file, which is not ideal for version control, and the current way
of exporting a canvas requires us to save it as a template and then export
the template. This process seems OK when we have finished a flow and want to
export it for future use as an actual template, but it is not good for small
commits while working on the flow (WIP commits). A better approach would be
to have the flow file as XML directly under source control.

_- *Suggestion #3*: Enable a configuration mode to keep the current canvas as
plain XML instead of a GZipped file._
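
Until something like Suggestion #3 exists, one possible workaround is to
decompress the gzipped flow into plain XML before each commit. The sketch
below uses only the Python standard library; the function name
export_flow_as_xml and the example paths in the usage note are assumptions,
not part of NiFi.

```python
import gzip
import shutil
from pathlib import Path


def export_flow_as_xml(gz_path, xml_path):
    """Decompress a gzipped flow file into a plain XML file and return its path."""
    gz_path, xml_path = Path(gz_path), Path(xml_path)
    # Stream the data so large flows are not read into memory at once.
    with gzip.open(gz_path, "rb") as src, open(xml_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return xml_path
```

Usage could be as simple as calling
export_flow_as_xml("conf/flow.xml.gz", "flow.xml") from a pre-commit step
(assuming that is where the gzipped flow lives in your setup) and committing
the resulting flow.xml.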

*Writing Custom Processors*
Not much to be said here. The experience was awesome. The developer guide is
very good and Matt's three-post 'nifi scripting cookbooks' series is
priceless. Those cookbooks should be added to the default documentation.

Two things hit me here. First, I assumed that all JARs used by NiFi's default
processors were available to a scripted processor (requiring only an import),
but that is not the case, and I needed to re-add some JARs (like javax.mail
or the AWS Java SDK) to the script's modules folder. That was not intuitive
for me, but maybe that is because I'm not a Java developer. Second, I had
trouble with Java loading the proper handlers for MIME types (something
related to javax.mail), which blocked me from using Groovy for a custom
script that writes multipart MIME messages (I ended up writing it in Python).

- *Suggestion #4*: Make clear which JAR files are available by default inside a 
 and which are not and how to properly configure the system class loader.
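
For readers wondering what the Python route for multipart MIME messages
mentioned above might look like, here is a minimal sketch using only the
standard library email package. It is not the script from this report; the
function name and its parameters are made up for illustration, and the NiFi
session/flowfile handling around it is left out.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_multipart_message(subject, body_text, attachment_bytes, attachment_name):
    """Build a multipart/mixed message with a text body and one attachment."""
    message = MIMEMultipart()  # default subtype is "mixed"
    message["Subject"] = subject

    # First part: the human-readable body.
    message.attach(MIMEText(body_text, "plain", "utf-8"))

    # Second part: raw bytes sent as application/octet-stream.
    attachment = MIMEApplication(attachment_bytes)
    attachment.add_header(
        "Content-Disposition", "attachment", filename=attachment_name
    )
    message.attach(attachment)
    return message
```

Calling as_string() on the returned object gives the serialized message,
which a script could then write out as flowfile content.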

*Writing Custom Services*
I'm working on a custom flow that needs to enrich some data records. Since I
had successfully written some processor scripts, I tried to script a custom
LookupService and, after googling around, used two sources as reference: 1)
an Andy Lopresto script found in a gist and 2) test_lookup_inline.groovy in
NiFi's source code.

I made some mistakes and my scripted lookup was not working, so I decided to
log some info in order to troubleshoot my code, but no information was
logged. That's when I noticed that my Docker container was not producing any
logs. I read the administrator guide to see if I was missing something in the
bootstrap.conf file, but my file was OK according to the docs.

It was only when I looked through the logback.xml file that I noticed a
variable "bootstrap.conf.dir" somewhere, and that prompted me to try adding a
"log.dir=" key to the bootstrap.conf file. Around the same time I also found
that I had the wrong permissions on my local log folder, which was probably
the reason why no logs were being written. Since I made both changes at once
I can't say for sure which one fixed the logs.
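
To tell the two causes apart next time, the folder permissions can be checked
on their own before touching bootstrap.conf. The helper below is a small
sketch with a made-up name (check_log_dir); it has to run as the same user as
the NiFi process inside the container for the answer to be meaningful.

```python
import os


def check_log_dir(path):
    """Return a list of problems found with a log directory (empty if usable)."""
    problems = []
    if not os.path.isdir(path):
        problems.append("%s does not exist or is not a directory" % path)
    elif not os.access(path, os.W_OK | os.X_OK):
        # Write + execute (search) permission is needed to create files in it.
        problems.append("%s is not writable by the current user" % path)
    return problems
```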

Only once I got nifi-user.log working properly could I see that an exception
was being raised (because there wasn't a log object), and that was the whole
error. Once that was fixed, the scripted lookup worked like a charm.
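
One defensive pattern against this kind of failure is to never assume that a
"log" variable has been injected into the script. The sketch below falls back
to Python's standard logging module when it is missing; resolve_logger and
the "scripted-service" logger name are hypothetical, and whether a given NiFi
scripted component binds "log" at all depends on the component.

```python
import logging


def resolve_logger(bindings, name="scripted-service"):
    """Return bindings['log'] if the host injected one, else a stdlib logger."""
    log = bindings.get("log")
    if log is None:
        # Fallback so that log.info(...) / log.error(...) never raise NameError.
        log = logging.getLogger(name)
    return log
```

A script would call log = resolve_logger(globals()) once at the top and use
log.info(...) and log.error(...) from then on; both method names exist on
the standard library logger and on NiFi's component logger.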

Matt explained on the users list why there is no log by default on controller
services, and it makes sense. However, as a user I got a little lost, because
when configuring processors I can set the log level very easily, but when it
comes to ControllerServices the dialog makes no mention of how or where
logging is done.

The same thing happened when I was editing my script code and re-running it.
With processors I point to the script file, and to have it reloaded after
changes I only need to stop/start the processor.

For scripted services I assumed that disabling/enabling them would have the
same effect, but that was not the case. Once a service was enabled, I could
only get it reloaded by stopping my Docker container and starting it again.
It took me a while to figure out that my new code was not being loaded and
that the previous version was still in use (even after disabling/enabling
it).

- *Suggestion #5*: Improve UX consistency between processor configuration and
controller service configuration. Allow a service's code to be reloaded with
enable/disable and add a link explaining how to properly log service
messages. Expose a log object on scripted components.
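
To make the reload part of Suggestion #5 concrete, here is a small Python
sketch of the requested behaviour: every time the service is enabled, the
script file is re-read if it has changed on disk. It only illustrates the
idea and is not NiFi code; the class ReloadingScript and its on_enabled
method are hypothetical names.

```python
import os


class ReloadingScript(object):
    """Holds executed script code and re-executes it when the file changes."""

    def __init__(self, script_path):
        self.script_path = script_path
        self._stamp = None   # (mtime, size) of the version currently loaded
        self.namespace = {}  # names defined by the loaded script

    def on_enabled(self):
        """Reload the script if its file changed; return True when reloaded."""
        stat = os.stat(self.script_path)
        stamp = (stat.st_mtime, stat.st_size)
        if stamp == self._stamp:
            return False
        with open(self.script_path) as handle:
            source = handle.read()
        namespace = {}
        exec(compile(source, self.script_path, "exec"), namespace)
        # Swap in the new code only after it compiled and ran without error.
        self.namespace = namespace
        self._stamp = stamp
        return True
```

With this shape, a disable/enable cycle is enough to pick up edited code, and
a broken edit leaves the previously loaded version in place because the swap
only happens after the new source has executed successfully.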

- *Suggestion #6*: Add a section to the developer guide explaining the
default interfaces (like LookupServices) and how they should be
implemented/extended.

Anyway, I'm really enjoying NiFi, and day after day it's becoming easier to
understand its component model.

Great work guys, and thanks for such great software!!

  was:
Identical to the description above, except that Suggestion #5 did not yet
include the sentence "Expose a log object on scripted components."


> Brief summary of a newbie journey into NiFi
> -------------------------------------------
>
>                 Key: NIFI-4569
>                 URL: https://issues.apache.org/jira/browse/NIFI-4569
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Docker
>    Affects Versions: 1.4.0
>            Reporter: Eric Chaves
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
