Regarding Joe's and Aldrin's concerns, I think a bit more detail on what I
had in mind might clear up some of the concerns and ambiguities (all valid)
that you mentioned.

As Aldrin mentioned, to me provenance is not about metadata needed for
routing.  I don't doubt there are use cases for that, as Randy mentioned;
however, it is not the concern I am looking to address with this
discussion.  If the community wants to add more functionality from a
metadata standpoint as well, we can certainly add that.

As for Joe's examples and concerns about in-band, I look at MiNiFi C++ as a
direct spoke of a NiFi hub, and as such it really can be treated as one
"NiFi" instance.  Additionally, since MiNiFi C++ is a complete rewrite, as
has been previously discussed, letting its requirements vary from those of
NiFi or MiNiFi Java is acceptable, in my opinion.  As such, there is no
value in having separate provenance for MiNiFi C++ and NiFi, since it is one
cradle-to-grave path (that happens to use both).

As for bandwidth concerns, this is exactly one of the issues that worries
me: calling back later to the MiNiFi C++ enabled device merely to sort and
retrieve provenance (which would be a heavy operation as currently
constructed) is not realistic.  One of the biggest selling points of NiFi is
its full data provenance ability, and my goal is merely to extend it through
the full "flow".  I personally don't see this as an attribute as currently
represented in the flowfiles, since that would not be an efficient structure
to handle or maintain through MiNiFi C++ pathing.  Instead, this requires
the provenance tree related to that flowfile to be sent (which should be
small-ish in a MiNiFi C++ instance).
My design was for it to be a separate data point on the flowfile package,
using a simple, extremely lightweight, and easy-to-manipulate structure.
Truthfully, it doesn't even have to be resident all through the MiNiFi C++
flow if a viable repo replaces LevelDB; my preference is to add it in at
the S2S processor.  The important thing is that it can be sent with the
flowfile through S2S and then added to the main NiFi provenance repo so as
to provide a continuous chain.  This would be easy to toggle through a
single checkbox added to a MiNiFi C++ S2S variant, so that if provenance
isn't important to you, you could choose not to integrate it.
Since, in this model, MiNiFi C++ plus provenance only integrates with NiFi
hubs, there is no reason to be concerned with outside compatibility for
this specific S2S processor mechanism.
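
To make the shape of this a little more concrete, here is a rough sketch of
what I mean by a separate, lightweight data point on the flowfile package.
Every name in it (ProvenanceEvent, FlowFilePackage, packageForS2S,
HubProvenanceRepo, the include_provenance flag) is made up for illustration
and does not exist in MiNiFi C++ or NiFi today; it only shows the provenance
chain riding alongside the attributes rather than inside them, the single
toggle at the S2S processor, and the hub appending the received chain to its
own repo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// All types below are hypothetical; they are not MiNiFi C++ or NiFi APIs.

// One compact provenance event recorded on the MiNiFi C++ side.
struct ProvenanceEvent {
  std::string event_type;      // e.g. "CREATE", "SEND"
  std::string component_id;    // component that produced the event
  std::uint64_t timestamp_ms;  // event time in milliseconds
};

// The flowfile package as it would travel over S2S. Attributes and content
// are untouched; the provenance chain is a separate field next to them.
struct FlowFilePackage {
  std::string uuid;
  std::vector<std::pair<std::string, std::string>> attributes;
  std::string content;
  std::vector<ProvenanceEvent> provenance;  // stays empty when toggled off
};

// Assemble the package at the S2S processor. The include_provenance flag
// stands in for the single checkbox on the S2S variant.
FlowFilePackage packageForS2S(const std::string& uuid,
                              const std::string& content,
                              const std::vector<ProvenanceEvent>& local_events,
                              bool include_provenance) {
  assert(!uuid.empty());  // a flowfile without an id cannot be chained
  FlowFilePackage pkg;
  pkg.uuid = uuid;
  pkg.content = content;
  if (include_provenance) {
    pkg.provenance = local_events;
  }
  return pkg;
}

// Stand-in for the main NiFi provenance repo on the hub side.
struct HubProvenanceRepo {
  std::vector<std::pair<std::string, ProvenanceEvent>> records;

  // Append the events that arrived with the flowfile, then the hub's own
  // receive event, so the chain is continuous. Returns the record count.
  std::size_t ingest(const FlowFilePackage& pkg, std::uint64_t now_ms) {
    for (const auto& ev : pkg.provenance) {
      records.emplace_back(pkg.uuid, ev);
    }
    records.emplace_back(
        pkg.uuid, ProvenanceEvent{"RECEIVE", "hub-s2s-input", now_ms});
    return records.size();
  }
};
```

With the flag off, the hub still records its own receive event, so a site
that doesn't care about device-side provenance behaves the same as it does
now.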

I see the ability to allow for "in-band" communication at the S2S-S2S point
as a requirement for some use cases.



--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/MiNiFi-C-Data-Provenance-and-Related-Issues-tp14024p14045.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
