Hi Tony,

Not exactly; the primary target audience for this architecture view would be 
business executives and system architects.  I'm targeting folks who are 
looking for an enterprise view, or who are seeking to understand and come up 
to speed on how the Core Framework works.

And the docs that I have reviewed to date don't seem to fit that particular 
target audience.

Also, what's the process for proposing a design idea for the Core Framework?  
In reviewing some of the source code, I didn't see any software packages that 
support metrics.

I'd like to propose an addition or enhancement to the Core to support volume 
management and trend analysis by databasing attributes and content, so that 
they are query-able and available for display. This information would then be 
used for statistical roll-ups, metrics, trend analysis, etc.

Ideally, this would work by receiving copies of local provenance events and 
capturing running totals from them.  This component would be like local 
provenance in that it would retain the data for a configurable period of 
time, based on the amount of disk space allocated to that process.  In 
addition, these roll-ups could be sent somewhere else for even longer 
retention.
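
To make that concrete, here is a rough sketch of the kind of roll-up store I 
have in mind. All names here are hypothetical, invented for illustration; this 
is not the actual NiFi provenance API, just the shape of the idea: running 
totals keyed by day and destination, with buckets evicted once they age past a 
configurable retention window.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, minimal shape of a provenance-style event; the real NiFi
// event model is much richer. For illustration only.
record FlowEvent(Instant timestamp, String destination, long bytes) {}

// Keeps running totals (file count, byte count) per day per destination,
// evicting buckets older than a configurable retention window.
class RollupStore {
    record Key(LocalDate day, String destination) {}
    record Totals(LongAdder files, LongAdder bytes) {}

    private final Map<Key, Totals> buckets = new ConcurrentHashMap<>();
    private final int retentionDays;

    RollupStore(int retentionDays) { this.retentionDays = retentionDays; }

    // Called with a copy of each local provenance-style event as it occurs.
    void accept(FlowEvent e) {
        LocalDate day = e.timestamp().atZone(ZoneOffset.UTC).toLocalDate();
        Totals t = buckets.computeIfAbsent(new Key(day, e.destination()),
                k -> new Totals(new LongAdder(), new LongAdder()));
        t.files().increment();
        t.bytes().add(e.bytes());
    }

    // Drop buckets that have aged out of the retention window; in practice
    // this would run periodically, or be driven by allocated disk space.
    void evict(LocalDate today) {
        buckets.keySet()
               .removeIf(k -> k.day().isBefore(today.minusDays(retentionDays)));
    }

    long filesOn(LocalDate day, String dest) {
        Totals t = buckets.get(new Key(day, dest));
        return t == null ? 0 : t.files().sum();
    }

    long bytesOn(LocalDate day, String dest) {
        Totals t = buckets.get(new Key(day, dest));
        return t == null ? 0 : t.bytes().sum();
    }
}
```

The same eviction hook is where aged-out buckets could be shipped elsewhere 
for longer retention before being dropped.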

The goal is to provide as many hooks as possible so that other 
programs/services can ingest both the local provenance logs and the rolled-up 
summaries.  There's a growing base of people who are comfortable with NiFi 
graphs and local provenance, so I think it makes sense to build on that.

The issue I'm facing is that Provenance is fine for tracking one file if you 
have a starting point, but it is not designed for counting, summarization, 
and correlation of data, and it doesn't support advanced queries.

Here are some of the most immediate and pressing use cases for this design:

1. How much traffic came in yesterday (or last week)?
2. Provide statistical counts on items of interest within a flow for a given 
flow/date range.
3. When was the last file sent to "System X"?
4. Did anything get sent to "System Y"?
5. How much data was marked with a certain tag?
6. How much data was scanned?
7. How much data was detected?
8. How much of a particular type of data was received, in bytes?
9. How much data was processed, by file count?
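
To illustrate why databasing the attributes matters: once events are stored 
and query-able, use cases like 3, 4, and 5 reduce to simple filters and 
aggregates. A rough sketch over an in-memory event log (again, all names 
hypothetical, not NiFi APIs):

```java
import java.time.Instant;
import java.util.List;
import java.util.Optional;

// Hypothetical, minimal event shape for illustration; real provenance
// events carry many more attributes (tags, component ids, content claims).
record SentEvent(Instant timestamp, String destination, String tag, long bytes) {}

class FlowQueries {
    // Use case 3: when was the last file sent to a given system?
    static Optional<Instant> lastSentTo(List<SentEvent> log, String dest) {
        return log.stream()
                  .filter(e -> e.destination().equals(dest))
                  .map(SentEvent::timestamp)
                  .max(Instant::compareTo);
    }

    // Use case 4: did anything get sent to a given system?
    static boolean anythingSentTo(List<SentEvent> log, String dest) {
        return log.stream().anyMatch(e -> e.destination().equals(dest));
    }

    // Use case 5: how much data (in bytes) was marked with a given tag?
    static long bytesWithTag(List<SentEvent> log, String tag) {
        return log.stream()
                  .filter(e -> e.tag().equals(tag))
                  .mapToLong(SentEvent::bytes)
                  .sum();
    }
}
```

In the real thing these would be queries against the databased attributes 
rather than scans of an in-memory list, but the shape of each question is the 
same.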

Another thought:

This might also be a good place to hook in streaming services, where you can 
deal with the raw events and then summarize/aggregate as things go by.
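
For example, a toy tumbling-window counter (hypothetical, just to illustrate 
the idea) that summarizes raw events as they go by, so the summaries survive 
even after the raw events themselves are discarded:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

// Assigns each raw event to a fixed-size time window as it arrives and
// keeps only per-window counts, not the events themselves.
class WindowCounter {
    private final long windowMillis;
    private final Map<Instant, Long> counts = new TreeMap<>();

    WindowCounter(Duration window) { this.windowMillis = window.toMillis(); }

    void onEvent(Instant ts) {
        long bucket = (ts.toEpochMilli() / windowMillis) * windowMillis;
        counts.merge(Instant.ofEpochMilli(bucket), 1L, Long::sum);
    }

    // Window start -> event count, ordered by time.
    Map<Instant, Long> summary() { return counts; }
}
```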

I'm completely new to this process, so I don't know whether basic concept 
proposals of this sort should come in the form of an architecture diagram or 
simply plain English.

Thanks, 

Teresa Jackson
Onyx Consulting Services, LLC
Chief Engineer

________________________________________
From: Tony Kurc <[email protected]>
Sent: Monday, January 12, 2015 10:28 PM
To: [email protected]
Subject: Re: Core Components

Teresa,
Glad you're interested in contributing. I suggest reading some of the
guides apache has [1] on what to expect when getting involved, which should
answer some of the questions about vetting and the board (which I inferred
to mean the PPMC)

Were you planning on doing this for developer documentation to get
developers up to speed more quickly [2]? Thus far the documentation has
been developed with asciidoc [3]; I certainly had some degree of
expectation that the developer guide would follow this path also. Were you
expecting to build images from the UML or other tool to include in a guide?
Or were you thinking it may be useful to have UML outside the context of a
developer documentation guide?

[1] http://www.apache.org/foundation/getinvolved.html
[2] https://issues.apache.org/jira/browse/NIFI-152
[3]
http://apache-nifi-incubating-developer-list.39713.n7.nabble.com/User-Guide-td46.html

Tony

On Mon, Jan 12, 2015 at 8:05 PM, Teresa Jackson <[email protected]>
wrote:

> Hello everyone,
>
> I'm reviewing the Apache-NiFi source code and would like to put together
> some architecture diagrams of the framework's core components. What's the
> required format for submission (UML, DODAF 2.0, et al.)? Also, what's the
> vetting process? And are there tools/approaches/processes that the board
> prefer be used?
>
> Thanks,
>
> Teresa Jackson
> Onyx Consulting Services, LLC
> Chief Engineer
>
