David

Good and important thread for NiFi. Some quick thoughts on this more generally. Perhaps we ...
0. Have a repo for the 'nifi-api'. Any true extension should be built
   against a specific version of the nifi-api only. This should change far
   less frequently than the other bits; we should probably confirm from the
   project history that this is true. We probably also have to include
   things that are 'effectively' part of the true API we need to honor,
   which includes expression language, records, and hopefully relatively
   few other things.
1. Have the mono repo like we do now for the nifi framework, nifi
   application, minifi, registry and such. And notably the nifi-assembly
   remains here.
2. Have a repo for 'nifi-java-extensions' and move everything there that
   is meant as an extension component against the nifi-api specifically
   (not the framework things like provenance, etc. Frankly, anything that
   is related to the 'nifi-framework-api' is NOT a true extension and
   doesn't belong in this 'nifi-java-extensions' repo).
3. Have a repo for 'nifi-python-extensions'.

We can release 0 whenever we need to improve the nifi-api, and then 1-3 at
the same time like we do today. Or, when needed, we can release them
independently, for example to fix specific bugs or vulnerabilities.

Worth really digging in and understanding what this could/should look like
to give ourselves cleaner air going forward.

Thanks
Joe

On Thu, Feb 1, 2024 at 1:03 PM Gábor Gyimesi <[email protected]> wrote:

> Hi David,
>
> MiNiFi C++ already had a Python API, using Python's stable C API, but
> the processors had a different, simpler format, as in the following
> example:
>
> https://github.com/apache/nifi-minifi-cpp/blob/main/extensions/python/pythonprocessors/examples/GaussianDistributionWithNumpy.py
>
> Our goal was to be able to support NiFi's Python API, with the ease of
> only copying the Python processor file to MiNiFi C++'s configured python
> processor path, and use them in the flow config the same way as the
> original MiNiFi C++ style python processors.
>
> There is already an open PR for supporting this for NiFi's
> FlowFileTransform processor types (as of now MiNiFi C++ does not support
> record based flow file processing):
> https://github.com/apache/nifi-minifi-cpp/pull/1712
> and also an open PR for supporting virtual environments:
> https://github.com/apache/nifi-minifi-cpp/pull/1721
> as previously MiNiFi C++ only supported system installed Python packages.
> The implementation uses the same C API bindings as before, importing
> NiFi's nifiapi adapted to MiNiFi C++'s python API to be able to use
> NiFi's Python processors.
>
> There are still a few limitations due to the differences between NiFi and
> MiNiFi C++ implementations which are listed here:
>
> https://github.com/apache/nifi-minifi-cpp/blob/d27430260c8c35dac52011bdb31b22b36e10539d/extensions/python/PYTHON.md
>
> Some of these limitations are being addressed by these jira tickets in
> this epic: https://issues.apache.org/jira/browse/MINIFICPP-2272
>
> I tested all the available Python processors (aside from RecordTransform
> processors) of NiFi 2.0.0-M2 and they seem to be working with MiNiFi C++
> with these PRs, so it looks promising.
>
> Regards,
> Gabor Gyimesi
>
> On Thu, 1 Feb 2024 at 19:03, David Handermann <[email protected]>
> wrote:
>
> > Hi Gabor,
> >
> > Thanks for the reply.
> >
> > It is helpful to know about the progress of Python Processor support
> > in MiNiFi C++. Is the goal to support the same NiFi Python API as
> > implemented for NiFi itself?
> >
> > The goal of a separate repository for Python extensions would be to
> > keep it self-contained for testing and releasing. From that
> > perspective, it would have a dependency on a declared version of the
> > NiFi Python API, and would include automated build workflows for
> > testing.
> >
> > For the NiFi framework components, there would still need to be
> > internal components that support testing implementations of Python
> > APIs, but the Python Extensions repository would have its own
> > decoupled set of tests.
> >
> > Regards,
> > David Handermann
> >
> > On Thu, Feb 1, 2024 at 11:05 AM Gábor Gyimesi <[email protected]>
> > wrote:
> >
> > > Hi David,
> > >
> > > Currently we are in the process of implementing support for the NiFi
> > > python processors in MiNiFi C++. Probably in the next open source
> > > release this feature will be available, so the available NiFi Python
> > > processors will be usable in MiNiFi C++ as well. I think this idea
> > > would help with the collaboration of supporting these processors in
> > > both Java and C++ projects. It would certainly make release
> > > verification easier to be able to concentrate only on the Python
> > > processors if they are released separately.
> > >
> > > My concern is how the automatic testing and verification would work
> > > in this scenario. Would all the testing of the Python processors be
> > > moved to the new repository and be tested there separately, with
> > > both NiFi and MiNiFi C++, or only with NiFi, or would all of the
> > > testing remain in the respective client repositories?
> > >
> > > Regards,
> > > Gabor Gyimesi
> > >
> > > On Fri, 26 Jan 2024 at 14:11, David Handermann <[email protected]>
> > > wrote:
> > >
> > > > Pierre,
> > > >
> > > > Thanks for the reply, and noting the potential concern with the
> > > > ability to find these components.
> > > >
> > > > I think there are several ways we can address this concern, both for
> > > > optional Java components, and for Python components in a separate
> > > > repository.
> > > >
> > > > For Python components in particular, we could add direct links to
> > > > published versions on the main download page, calling out their
> > > > availability in the official PyPI repository.
> > > > Although this would need to be denoted as a non-official release
> > > > channel for Apache purposes, this is common practice in other
> > > > projects, and follows the approach we already have for container
> > > > images on Docker Hub.
> > > >
> > > > In addition to linking from the download page, we could publish the
> > > > generated documentation for these components. The current process for
> > > > publishing generated documentation is based on the convenience binary,
> > > > but with some adjustments, we could publish the documentation for the
> > > > Python components as well. This is a good prompt to start doing this
> > > > for the optional Java components, and I plan to look at doing this for
> > > > the next release with optional Java components.
> > > >
> > > > To your last question, it is worth noting that any binary releases
> > > > would fall into the category of convenience builds. Initially, I think
> > > > the NiFi framework release would not include the Python extension
> > > > components. However, having a few short steps on installation, linked
> > > > from the download page, seems like it would provide a way forward.
> > > >
> > > > Regards,
> > > > David Handermann
> > > >
> > > > On Fri, Jan 26, 2024 at 1:22 AM Pierre Villard
> > > > <[email protected]> wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > While I agree with your summary, I have a concern here which is
> > > > > about user awareness of this feature. We've seen in the past: as
> > > > > soon as we don't include NARs in the convenience binary, we see
> > > > > that users have no clue about those NARs (and some are super
> > > > > powerful/useful). I agree that python is a bit different because
> > > > > it requires a user action to enable it in the first place but I
> > > > > still think that including the components in the convenience
> > > > > binary of Apache NiFi would drive user awareness, adoption, etc.
> > > > >
> > > > > If we have a separate repo with its own release cycle, can we
> > > > > imagine a process where, when releasing Apache NiFi, it'd include
> > > > > whatever is the latest version of the Python repo? Or something
> > > > > along those lines?
> > > > >
> > > > > Pierre
> > > > >
> > > > > On Fri, 26 Jan 2024 at 08:01, David Handermann
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Team,
> > > > > >
> > > > > > As we get closer to a full release of Apache NiFi 2.0.0, we have an
> > > > > > important opportunity to set the direction for future development of
> > > > > > Python-based Processors.
> > > > > >
> > > > > > The introduction of native Python support presents a number of new
> > > > > > integration opportunities, and it also raises questions about
> > > > > > maintenance and versioning. As the journey to NiFi 2.0.0 has shown, it
> > > > > > requires significant effort to coordinate maintenance and
> > > > > > modernization across hundreds of project modules. Although the
> > > > > > internal project structure has maintained helpful separation of API
> > > > > > and implementation, the current release strategy highlights the
> > > > > > challenges of verifying multiple layers of changes. Introducing a new
> > > > > > programming language provides greater possibilities, but also makes it
> > > > > > more difficult to maintain a single repository with a single
> > > > > > versioning strategy.
> > > > > >
> > > > > > I propose creating a new Git repository named nifi-python-extensions,
> > > > > > which would have its own versioning and release process. This would
> > > > > > contain the extensions now under the module of the same name in the
> > > > > > NiFi repository. Having a separate repository and release process for
> > > > > > Python-based extensions has the following advantages:
> > > > > >
> > > > > > 1. Clean separation between NiFi APIs for Python and Python-based
> > > > > >    Processors
> > > > > > 2. Independent release cycles for Python-based Processors
> > > > > > 3. Focused release verification and testing on Python-based modules
> > > > > >
> > > > > > These advantages can also enable more rapid iteration on Python-based
> > > > > > Processors, without impacting the NiFi Framework or requiring new
> > > > > > releases at that level. Although this would require a separate
> > > > > > installation process for Python-based components, this could follow an
> > > > > > approach similar to what is already required for optional Java-based
> > > > > > components.
> > > > > >
> > > > > > Thanks in advance for your consideration.
> > > > > >
> > > > > > Regards,
> > > > > > David Handermann
> > > > > > Apache NiFi PMC Member
