Thanks for that context and perspective, Kevin. Good luck on your project.
Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Apr 13, 2020, at 6:07 AM, Kevin Telford <kevin.telf...@gmail.com> wrote:
>
> Hi all - thank you for all the thoughtful feedback.
>
> Regarding my original question, I think the patterns Mike outlined would be good enough. That said, we're not going to move forward using NiFi for the project, and I figured I'd take a step back to explain where we were coming from, as some may find the perspective useful. Or not :)
>
> We have a project that needs some data transformation. Input is Excel; output is multiple CSVs or POSTs of data to an API. On the surface, simple enough.
>
> Our input Excel can and will change a lot, so we'll need rapid iteration and testing.
>
> The project architecture is container-based, currently consisting of a front-end Docker image, a back-end image, and a database image. ETL is intended to be a fourth. It can be orchestrated with Docker Compose, Kubernetes, or bare metal. The goal is to be turnkey and low friction.
>
> There were two reasons we didn't choose NiFi: the painful (read: long) Java deployment lifecycle for custom processing, and system complexity, particularly around updating flows.
>
> Regarding the pain of Java, I've partied with Java since 1.4, so I get it. But these days, if I have a data analyst/data engineer with low-ish programming skills, I can't have them compiling and moving around jars, nor do I want to invest in building out the build/deploy pipeline. Platforms have really evolved (especially look at the cloud-native tools), and code can be written "in line" in the UI and just deployed. A lot of this is due to dynamic languages (e.g. Python), but it can still be done with Java with behind-the-scenes compilation. Jupyter Notebook, for its many, many faults, is the way things are heading, and the kids love it.
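[For readers less familiar with NiFi, the "compiling and moving around jars" cycle being described looks roughly like the sketch below. It is illustrative only: the NAR artifact name and install path are hypothetical, and the real build and restart steps (mvn, nifi.sh) are shown as comments with self-contained stand-ins.]

```shell
#!/usr/bin/env sh
# Illustrative sketch of the custom-processor redeploy loop Kevin describes.
# Hypothetical names: nifi-myprocessor-nar-1.0.0.nar, /tmp/nifi-demo.
NIFI_HOME="${NIFI_HOME:-/tmp/nifi-demo}"
NAR="nifi-myprocessor-nar-1.0.0.nar"

mkdir -p "$NIFI_HOME/lib" target
# mvn clean install        # real step: compile and package the processor NAR
touch "target/$NAR"        # stand-in for the mvn build output

cp "target/$NAR" "$NIFI_HOME/lib/"   # hand-copy the NAR into NiFi's lib dir
# "$NIFI_HOME/bin/nifi.sh" restart   # real step: restart NiFi to load it
ls "$NIFI_HOME/lib/"
```

Every iteration on the processor repeats all three steps, which is the friction being contrasted with platforms that accept inline scripts in the UI.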
> I touched a lot on updating flows above, but in NiFi my choices seemed to be to replace the flow.xml.gz file or use the NiFi Registry. My concern with the Registry was that it was yet another moving part, and even then I'd have to build out source-control workflows. Here again, newer platforms have all this baked in.
>
> In closing, I think there is definitely still a place for NiFi, especially on the enterprise side where stability, scale, and management are paramount. But I did want to share this, as the non-enterprise use cases I am describing will, over time, become the enterprise use cases, and the NiFi project would do well to evaluate its long-term strategy.
>
> Thanks again for all the responses.
> Best,
> Kevin
>
> On 2020/04/08 14:27:54, Kevin Telford <kevin.telf...@gmail.com> wrote:
>> Hi all – I have a two-part question.
>>
>> I’d like to run NiFi inside a container in order to deploy to various environments. As far as I can tell, the flow.xml.gz file is the main “source”, if you will, for a NiFi data flow.
>>
>> Q1) Is the flow.xml.gz file the “source” of a NiFi data flow, and if so, is it best practice to copy it to a new environment in order to “deploy” a prebuilt flow? Or how is this best handled?
>>
>> Given that Q1 is true, my challenge then becomes somewhat Docker-specific…
>>
>> Situation:
>>
>> - In the Dockerfile we unzip the NiFi source (L62 <https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/Dockerfile#L62>) and then create Docker volumes (L75 <https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/Dockerfile#L75>, specifically for the conf dir). Once the container starts, all the normal NiFi startup things happen, and /opt/nifi/nifi-current/conf/flow.xml.gz is created.
>> Complication:
>>
>> - In order to persist flow.xml.gz outside of the container, I would normally mount the /opt/nifi/nifi-current/conf directory. However, in this case I cannot mount it at initialization, because that would overwrite the conf config files with whatever directory I bind it to (Docker container isolation ensures host -> container file precedence).
>> - I could mount to a running container, but this is less ideal due to the various ways a container can be deployed.
>> - I could copy manually from the running container, but this is less ideal as it’s on demand, and not always persisting the latest.
>>
>> Resolution:
>>
>> - I believe instead we would ideally create a few flow-config-specific env vars and use them to update our nifi.properties (via https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/sh/start.sh), i.e. NIFI_FLOW_CONFIG_FILE_LOCATION, NIFI_FLOW_CONFIG_ARCHIVE_ENABLED, NIFI_FLOW_CONFIG_ARCHIVE_DIR, and so on for all nifi.flow.configuration props.
>>
>> Q2) Would the above proposal be ideal (add a few env vars to start.sh)? If so, I'm happy to add a PR for the code and doc change. Or have others solved this a different way?
>>
>> Best,
>>
>> Kevin
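[A minimal sketch of the Q2 proposal: have flow-specific env vars rewrite the nifi.flow.configuration.* entries at startup, in the spirit of the prop_replace helper that the image's start.sh already uses for other properties. The env var names come from the proposal above; the prop_replace helper is re-sketched here rather than copied from the NiFi repo, and the file path and demo value are assumptions.]

```shell
#!/usr/bin/env sh
# Sketch: override nifi.flow.configuration.* properties from env vars at
# container start. prop_replace below is a re-sketch, not the NiFi original.
props="$(mktemp)"                      # stand-in for conf/nifi.properties
cat > "$props" <<'EOF'
nifi.flow.configuration.file=./conf/flow.xml.gz
nifi.flow.configuration.archive.enabled=true
nifi.flow.configuration.archive.dir=./conf/archive/
EOF

prop_replace() {                       # prop_replace <key> <value>
  sed -i "s|^$1=.*$|$1=$2|" "$props"   # rewrite that key's line in place
}

# Apply an override only when the env var is set (demo default shown):
NIFI_FLOW_CONFIG_FILE_LOCATION="${NIFI_FLOW_CONFIG_FILE_LOCATION:-/data/flow/flow.xml.gz}"
[ -n "$NIFI_FLOW_CONFIG_FILE_LOCATION" ] && \
  prop_replace nifi.flow.configuration.file "$NIFI_FLOW_CONFIG_FILE_LOCATION"

grep '^nifi.flow.configuration.file=' "$props"
```

With the flow location pointed at a dedicated path like /data/flow/flow.xml.gz, only that path needs a bind mount, sidestepping the conf-directory overwrite problem described above.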