"Yes but there can be other hubs too and in parallel." [Daniel] For MiNiFi C++ -> SystemA -> SystemB -> ... -> NiFi, if you don't want provenance to travel, then I don't see it as an issue, since the outgoing message would be identical to what you have now. If you feel it's going to be extremely confusing, I could make it a new clone of the S2S MiNiFi C++ processor, but I don't see the point of cloning a processor just to hide a toggle. On the NiFi side, for this case you would use the normal S2S intake methods you use now. No change. Also, if you're going from MiNiFi C++ -> SystemA, there is no change.

For MiNiFi C++ -> MiNiFi C++ -> ... -> NiFi, if you want provenance to travel, then yes, you are locked into using n*(MiNiFi C++) -> NiFi with provenance toggled on and with the new S2S receiving processors in MiNiFi C++/NiFi that can handle provenance (it has to be a new processor to avoid backwards-compatibility issues). Again, I don't see this as an issue either, since you clearly want this functionality if you're doing this. Am I missing something in my logic flow that you are seeing and that I need to account for?
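For illustration, the toggle behaviour described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: `ProvenanceEvent`, `FlowData`, `OutgoingMessage`, and `BuildOutgoing` are hypothetical names invented for this example, not part of the MiNiFi C++ or NiFi APIs, and the real S2S wire format is not modeled.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration only; not MiNiFi C++ or NiFi APIs.
struct ProvenanceEvent {
  std::string event_type;    // e.g. "CREATE" or "SEND"
  std::string component_id;  // instance or processor that produced the event
};

struct FlowData {
  std::string content;                      // the data being transferred
  std::vector<ProvenanceEvent> provenance;  // history accumulated so far
};

struct OutgoingMessage {
  std::string payload;                      // always populated
  std::vector<ProvenanceEvent> provenance;  // stays empty when the toggle is off
};

// Builds the message a sending processor would hand to the transport.
// With include_provenance == false, only the payload is populated, so the
// message carries nothing beyond what is sent today. With it set to true,
// the accumulated history plus a new "SEND" event travel in line with the data.
OutgoingMessage BuildOutgoing(const FlowData& data,
                              bool include_provenance,
                              const std::string& sender_id) {
  OutgoingMessage msg;
  msg.payload = data.content;
  if (include_provenance) {
    msg.provenance = data.provenance;
    msg.provenance.push_back({"SEND", sender_id});
  }
  return msg;
}
```

In a MiNiFi C++ -> SystemA chain the toggle would stay off and the receiver sees only the payload. In a MiNiFi C++ -> MiNiFi C++ -> NiFi chain each hop would call something like `BuildOutgoing(data, true, "edge-1")` so the history grows by one event per hop.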
"You've mentioned this a couple times now. " [Daniel] Agreed, and this is how this discussion is meant to be taken.

"I'm not quite sure I understand so please elaborate if my comments don't apply." [Daniel] It has to do with when and how it's consumed. On its current path, Atlas won't address these issues, but as you said there are others, and I have my own in progress as well. I fundamentally disagree with the current sink-retrieve-sink ETL paradigm (as you've seen from my public papers; there are others not yet public as well), as it is a complete waste of time and resources at this point. In all my work, data is handled as it becomes available (near real-time) rather than waiting for some ETL process to run at some arbitrary point in the future. By doing this you avoid unnecessary traffic, storage, processing, maintenance, and design, all while improving data availability.

More specifically to this discussion, the issue comes down to access from the point of origin. In an embedded or background instance of MiNiFi C++, bidirectional follow-up calls made only to fetch provenance are not always going to be available. Additionally, where they are available, the provenance they return is not going to be current and hence is fairly useless for security applications. Think of trying this on your laptop, on IoT devices, or on financial transactions. If I only find out that there was an issue 12-36 hours later, when you reconnect, when I can send someone to the field to retrieve it, or when the ETL processes run, it doesn't do me any good. As Randy mentioned, you can recombine all this later; however, it is a very resource-intensive process. There is no reason not to have provenance available when the data is available, since it's just a matter of allowing for its transfer in line with the data. NiFi is not assuming responsibility for anything it doesn't already; this just extends its reach to the full NiFi/MiNiFi instance, so there should not be an ownership concern.
This requires an extremely minor update in NiFi, but it addresses a fundamental need in MiNiFi C++.

"Ok so I think what you're saying is" [Daniel] Right, and since you can just disable it if you don't need it, there is no performance or bandwidth hit unless you enable it.

"It is really important to propose and advocate" [Daniel] I don't see this as a model change. As per my previous questions, MiNiFi C++ does not yet seem to have a solid model, as the time and effort are mainly being put into MiNiFi Java. Since I have very specific ideas around MiNiFi C++ (and discussed them with you last year, and with others at HW when MiNiFi was only going to be in C), I have not seen this as a radical departure but as an elaboration on what we had already discussed. If you or the community want to go a different path, I have no issue branching and going a separate way with these and the LevelDB changes rather than introducing them into the current path. This being open source, there is no single right answer, so I'm certainly open to any suggestions. But I think you'll find that what I'm proposing here is going to be important when you get to actual implementations, and it's easier to change now than when you're locked in later, especially given my issues getting our contributions into NiFi. As stated above, I don't see how this affects any other implementations or use cases of MiNiFi C++/NiFi as proposed.

--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/MiNiFi-C-Data-Provenance-and-Related-Issues-tp14024p14048.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.