Re: MiNiFi C++ Data Provenance and Related Issues

Randy Gelhausen Tue, 29 Nov 2016 00:16:02 -0800

I have no comments as to LevelDB's fit for the requirements Daniel
described. However, the current build process & binary artifact produced
are problematic in two ways:


1. OSX has proven particularly stubborn about putting the leveldb headers
in a place cmake can find them
2. Running the minifi-cpp binary requires RHEL environments to pre-install
leveldb-devel (and pre-req epel-release), stumbling blocks for enterprise
deployment.

+1 for shipping provenance inside FlowFiles. As a consumer of FlowFiles
generated by nifi-minifi-cpp agents, I need the context of every FlowFile's
provenance to route it properly. The existing solutions of
provenance-query-per-flowfile or separately exporting provenance and
joining via UUID downstream are painful.
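
To make the "provenance inside FlowFiles" idea concrete, here is a minimal sketch of a FlowFile that carries its own lineage, so that a consumer can route on it without a per-FlowFile provenance query or a downstream join. Every name in it (FlowFile, ProvenanceEvent, EventType, record, fork_child, originated_at) is hypothetical and chosen only for illustration; none of this is the actual nifi-minifi-cpp API, and serialization over S2S is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the nifi-minifi-cpp classes.
enum class EventType { Create, Modify, Fork, Send };

struct ProvenanceEvent {
    EventType type;
    std::string component;       // name of the processor that emitted the event
    std::uint64_t timestamp_ms;  // event time in milliseconds
};

struct FlowFile {
    std::string uuid;
    std::vector<ProvenanceEvent> lineage;  // travels with the FlowFile
};

// Append an event to the lineage carried by the FlowFile itself.
void record(FlowFile& ff, EventType type, const std::string& component,
            std::uint64_t timestamp_ms) {
    assert(!component.empty());  // an event must name its source component
    ff.lineage.push_back({type, component, timestamp_ms});
}

// Create a child that inherits a copy of the parent's lineage, then note the fork.
FlowFile fork_child(const FlowFile& parent, const std::string& child_uuid,
                    const std::string& component, std::uint64_t timestamp_ms) {
    FlowFile child{child_uuid, parent.lineage};
    record(child, EventType::Fork, component, timestamp_ms);
    return child;
}

// Consumer-side check: was this FlowFile first created by the given component?
bool originated_at(const FlowFile& ff, const std::string& component) {
    return !ff.lineage.empty() &&
           ff.lineage.front().type == EventType::Create &&
           ff.lineage.front().component == component;
}
```

With something along these lines, a receiving flow could call originated_at() on each incoming FlowFile and route on the result directly. The cost is that every child copies its parent's history, which is the payload growth the quoted message below calls negligible.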

The idea arises that minifi-cpp might not need to mimic its Java cousins'
separate repository stores, particularly if S2S could be optimized to avoid
re-transmission of static or slowly changing flow metadata.

On 2016-11-28 09:25 (-0500), Daniel Cave <d...@ssglimited.com> wrote:
> This is a break off from the discussion on the MiNiFi C++ 0.1.0 Release
> thread.  I assume a hub and spoke NiFi/MiNiFi C++ architecture.
>
> As discussed on that thread, I am concerned about the existing choice for
> data provenance tracking and the implications it leads to as well as the
> current data provenance requirements for MiNiFi C++.  MiNiFi C++ must be
> highly efficient and carry a minimal footprint in order to be able to
> function at background and embedded levels.  As such, performance and space
> are priorities as are the ability to communicate to the NiFi hub the needed
> information (i.e. there isn't space for a large unindexed data provenance
> archive locally nor the processing ability to handle it).
>
> The data provenance registry must be:  1) Fault tolerant, 2) able to be
> easily purged, 3) fast to write, 4) easily accessed in session, 5) easily
> accessed post session.  The current choice (LevelDB) meets #3, but not the
> other 4 requirements.  LevelDB is prone to corruption in cases of
> application failure during a write (fails #1).  LevelDB has no indexing, and
> if keys are by UUID then there is no way to efficiently sort by date or by
> parent/child (fails #2, #4, #5).  The choice for a provenance store should
> answer as many of these as possible.  For permanent stores, the choices
> would be super lightweight databases or something fault resistant like LMDB.
> I don't have any preference, just that it functionally addresses as many
> criteria as possible and absolutely satisfies #1.
>
> A solution to #4 and #5 could be that the entire provenance tree inside
> MiNiFi C++ rides with the flowfile and transfers to NiFi (including through
> descendants).  This I see as something of a requirement as well, as it is
> the only efficient way to provide cradle to grave provenance through the
> entire MiNiFi/NiFi system without the need for heavy post processing to
> reconstruct the tree.  While this adds slightly to the package being sent
> between MiNiFi and NiFi, it's negligible compared to querying for it
> afterwards, especially where MiNiFi is embedded or on an IoT device.
>
> Any thoughts?
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/MiNiFi-C-Data-Provenance-and-Related-Issues-tp14024.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
