Re: MiNiFi C++ Data Provenance and Related Issues

Daniel Cave Tue, 29 Nov 2016 09:06:13 -0800

"Yes but there can be other hubs too and in parallel."
[Daniel] For MiNiFi C++ -> SystemA -> SystemB -> ... -> NiFi, if you don't
want provenance to travel, then I don't see it as an issue, since the outgoing
message would be identical to what you have now. If you feel it's going to
be extremely confusing, then I could make it a new clone of the S2S MiNiFi
C++ processor, but I don't see the point of cloning a processor just to hide
a toggle. On the NiFi side for this case you would use the normal S2S intake
methods you use now: no change. Also, if you're going from MiNiFi C++ ->
SystemA, there is no change.
For MiNiFi C++ -> MiNiFi C++ -> ... -> NiFi, if you want provenance to travel,
then yes, you are locked into using n*(MiNiFi C++) -> NiFi with the
provenance toggled on and using the new S2S receiving processors in MiNiFi
C++/NiFi (they have to be new ones to avoid backwards compatibility issues)
that can handle provenance. Again, I don't see this as an issue either,
since you clearly want this functionality if you're doing this.
Am I missing something in my logic flow that you are seeing that I need to
account for?

"You've mentioned this a couple times now. "
[Daniel] Agreed, and this is how this discussion is meant to be taken.

"I'm not quite sure I understand so please elaborate if my
comments don't apply."
[Daniel] It has to do with when and how it's consumed. On its current path,
Atlas won't address these issues, but as you said there are other projects, and
I have my own in progress as well. I fundamentally disagree with the current
sink-retrieve-sink ETL paradigm (as you've seen from my public papers, there
are others not public yet as well) as it is a complete waste of time and
resources at this point. In all my work, data is handled as available (near
real-time) rather than waiting for some ETL processes to run at some
arbitrary point in the future. By doing this you avoid unnecessary traffic,
storage, processing, maintenance, and design all while improving data
availability. More specifically to this discussion, the issue comes down to
access from the point of origin. In an embedded or background instance of
MiNiFi C++, bidirectional follow-up calls made solely to retrieve provenance
will not always be available. Additionally, where they are available, the
results will not be current and hence are of little use for security
applications. Think of trying this on your laptop, on IoT devices, or on
financial transactions. If I only find out that there was an issue 12-36 hours
later, when you reconnect, when I can send someone into the field to retrieve
the data, or when the ETL processes run, it doesn't do me any good. As Randy
mentioned, you can recombine all of this later; however, it is a very
resource-intensive process. There is no reason not to have provenance
available when the data is available, since it's just a matter of allowing it
to be transferred in line with the data. NiFi is not assuming responsibility
for anything it doesn't already; this just extends its reach to the full
NiFi/MiNiFi instance, so there should not be an ownership concern. This
requires only a very minor update in NiFi, but it addresses a fundamental need
in MiNiFi C++.

"Ok so I think what you're saying is"
[Daniel] Right, and since you can simply disable it if you don't need it,
there is no performance or bandwidth hit unless you enable it.

"It is really important to propose and advocate"
[Daniel] I don't see this as a model change. As per my previous questions,
MiNiFi C++ does not yet seem to have a solid model, as time and effort are
mainly being put into MiNiFi Java. Since I have very specific ideas around
MiNiFi C++ (and discussed them last year with you and others at HW, when
MiNiFi was only going to be in C), I have not seen this as a radical
departure but as an elaboration on what we had already discussed. If you or
the community want to go a different path, I have no issue branching and
going a separate way with these and the LevelDB changes rather than
introducing these changes into the current path. This being open source,
there is no single right answer, so I'm certainly open to any suggestions,
but I think you'll find that what I'm proposing here is going to be important
when you get to actual implementations, and it's easier to change now than
when you're locked in later, especially given my issues getting our
contributions into NiFi. As stated above, I don't see how this affects any
other implementations or use cases of MiNiFi C++/NiFi as proposed.

--
View this message in context:
http://apache-nifi-developer-list.39713.n7.nabble.com/MiNiFi-C-Data-Provenance-and-Related-Issues-tp14024p14048.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
