Author: buildbot
Date: Wed Jan 21 15:33:42 2015
New Revision: 937118
Log:
Staging update by buildbot for nifi
Modified:
websites/staging/nifi/trunk/content/ (props changed)
websites/staging/nifi/trunk/content/docs/nifi-docs/user-guide.html
Propchange: websites/staging/nifi/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed Jan 21 15:33:42 2015
@@ -1 +1 @@
-1653540
+1653560
Modified: websites/staging/nifi/trunk/content/docs/nifi-docs/user-guide.html
==============================================================================
--- websites/staging/nifi/trunk/content/docs/nifi-docs/user-guide.html
(original)
+++ websites/staging/nifi/trunk/content/docs/nifi-docs/user-guide.html Wed Jan
21 15:33:42 2015
@@ -22,7 +22,7 @@ limitations under the License.
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="Asciidoctor 1.5.2">
<meta name="author" content="Apache NiFi Team">
-<title>NiFi User Guide</title>
+<title>Apache NiFi User Guide</title>
<link rel="stylesheet"
href="https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400">
<style>
/* Asciidoctor default stylesheet | MIT License | http://asciidoctor.org */
@@ -426,20 +426,10 @@ body.book #toc,body.book #preamble,body.
.show-for-print{display:inherit!important}}
</style>
<link rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.2.0/css/font-awesome.min.css">
-<script>
- (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
- (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new
Date();a=s.createElement(o),
-
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
- })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
- ga('create', 'UA-57264262-1', 'auto');
- ga('send', 'pageview');
-
-</script>
</head>
<body class="article">
<div id="header">
-<h1>NiFi User Guide</h1>
+<h1>Apache NiFi User Guide</h1>
<div class="details">
<span id="author" class="author">Apache NiFi Team</span><br>
<span id="email" class="email"><a
href="mailto:[email protected]">[email protected]</a></span><br>
@@ -447,6 +437,7 @@ body.book #toc,body.book #preamble,body.
<div id="toc" class="toc">
<div id="toctitle">Table of Contents</div>
<ul class="sectlevel1">
+<li><a href="#introduction">Introduction</a></li>
<li><a href="#terminology">Terminology</a></li>
<li><a href="#User_Interface">NiFi User Interface</a></li>
<li><a href="#building-a-dataflow">Building a DataFlow</a>
@@ -456,6 +447,7 @@ body.book #toc,body.book #preamble,body.
<li><a href="#additional-help">Additional Help</a></li>
<li><a href="#connecting-components">Connecting Components</a></li>
<li><a href="#processor-validation">Processor Validation</a></li>
+<li><a href="#example-dataflow">Example Dataflow</a></li>
</ul>
</li>
<li><a href="#command-and-control-of-dataflow">Command and Control of
DataFlow</a>
@@ -466,13 +458,14 @@ body.book #toc,body.book #preamble,body.
<li><a href="#Remote_Group_Transmission">Remote Process Group
Transmission</a></li>
</ul>
</li>
+<li><a href="#navigating">Navigating within a DataFlow</a></li>
<li><a href="#monitoring">Monitoring of DataFlow</a>
<ul class="sectlevel2">
<li><a href="#processor_anatomy">Anatomy of a Processor</a></li>
<li><a href="#process_group_anatomy">Anatomy of a Process Group</a></li>
<li><a href="#remote_group_anatomy">Anatomy of a Remote Process Group</a></li>
<li><a href="#Summary_Page">Summary Page</a></li>
-<li><a href="#Stats_History">Historical Statics of a Component</a></li>
+<li><a href="#Stats_History">Historical Statistics of a Component</a></li>
</ul>
</li>
<li><a href="#templates">Templates</a>
@@ -485,17 +478,30 @@ body.book #toc,body.book #preamble,body.
<li><a href="#data-provenance">Data Provenance</a>
<ul class="sectlevel2">
<li><a href="#searching-for-events">Searching for Events</a></li>
-<li><a href="#details-of-an-event">Details of an Event</a></li>
-<li><a href="#viewing-flowfile-content">Viewing FlowFile Content</a></li>
+<li><a href="#event_details">Details of an Event</a></li>
<li><a href="#replaying-a-flowfile">Replaying a FlowFile</a></li>
<li><a href="#viewing-flowfile-lineage">Viewing FlowFile Lineage</a></li>
</ul>
</li>
+<li><a href="#other_management_features">Other Management Features</a></li>
</ul>
</div>
</div>
<div id="content">
<div class="sect1">
+<h2 id="introduction"><a class="anchor"
href="#introduction"></a>Introduction</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Apache NiFi (Incubating) is a dataflow system based on the concepts of
flow-based programming. It supports
+powerful and scalable directed graphs of data routing, transformation, and
system mediation logic. NiFi has
+a web-based user interface for design, control, feedback, and monitoring of
dataflows. It is highly configurable
+along several dimensions of quality of service, such as loss-tolerant versus
guaranteed delivery, low latency versus
+high throughput, and priority-based queuing. NiFi provides fine-grained data
provenance for all data received, forked, joined,
+cloned, modified, sent, and ultimately dropped upon reaching its configured
end-state.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
<h2 id="terminology"><a class="anchor" href="#terminology"></a>Terminology</h2>
<div class="sectionbody">
<div class="paragraph">
@@ -504,8 +510,8 @@ body.book #toc,body.book #preamble,body.
<div class="paragraph">
<p><strong>FlowFile</strong>: The FlowFile represents a single piece of data
in NiFi. A FlowFile is made up of two components:
FlowFile Attributes and FlowFile Content.
- Content is the data that is represented by the FlowFile. Attributes are
key-value pairs that provide information or
- context about the data.
+ Content is the data that is represented by the FlowFile. Attributes are
characteristics that provide information or
+ context about the data; they are made up of key-value pairs.
All FlowFiles have the following Standard Attributes:</p>
</div>
<div class="ulist">
@@ -558,16 +564,16 @@ body.book #toc,body.book #preamble,body.
</div>
<div class="paragraph">
<p><strong>Bulletin</strong>: The NiFi User Interface provides a significant
amount of monitoring and feedback about the current status of the application.
- In addition to rolling statistics and the current status that are
provided for each component, components are able to report Bulletins.
- Whenever a component reports a Bulletin, an icon is displayed on that
component (or on the Status bar near the top of the page, for System-Level
Bulletins).
- Using the mouse to hover over that icon will provide a tool-tip that
shows the time and severity (Debug, Info, Warning, Error) of the bulletin,
+ In addition to rolling statistics and the current status provided for
each component, components are able to report Bulletins.
+ Whenever a component reports a Bulletin, a bulletin icon is displayed
on that component. System-level bulletins are displayed on the Status bar near
the top of the page.
+ Using the mouse to hover over that icon will provide a tool-tip that
shows the time and severity (Debug, Info, Warning, Error) of the Bulletin,
as well as the message of the Bulletin.
- Bulletins from all components can also be viewed and filtered in the
Bulletins Page, available in the Management Toolbar.</p>
+ Bulletins from all components can also be viewed and filtered in the
Bulletin Board Page, available in the Management Toolbar.</p>
</div>
<div class="paragraph">
<p><strong>Template</strong>: Oftentimes, a dataflow is composed of many
sub-flows that could be reused. NiFi allows DataFlow Managers to select a part
of the dataflow
- (or the entire dataflow) and create a Template. This Template is given
a name can then be dragged onto the canvas just like the other components.
- As a result, several components be combined together to make a larger
building block from which to create a dataflow.
+ (or the entire dataflow) and create a Template. This Template is given
a name and can then be dragged onto the canvas just like the other components.
+ As a result, several components may be combined together to make a
larger building block from which to create a dataflow.
These templates can also be exported as XML and imported into another
NiFi instance, allowing these building blocks to be shared.</p>
</div>
</div>
@@ -577,15 +583,15 @@ body.book #toc,body.book #preamble,body.
<div class="sectionbody">
<div class="paragraph">
<p>The NiFi User Interface (UI) provides mechanisms for creating automated
dataflows, as well as visualizing,
-editing, monitoring, and administering those dataflows. The UI can be broken
down into several different segments,
-each responsible for different functionality of the application. We will begin
by looking at screenshots of the
-application and labeling the different segments of the UI. We will provide a
brief explanation of the purpose of each segment.
-Then, in the following sections of this document, we will discuss each of
those segments in greater detail.</p>
+editing, monitoring, and administering those dataflows. The UI can be broken
down into several segments,
+each responsible for different functionality of the application. This section
provides screenshots of the
+application and highlights the different segments of the UI. Each segment is
discussed in further detail later
+in the document.</p>
</div>
<div class="paragraph">
-<p>When the application is started, by default, the user is able to navigate
to the User Interface by going to
-<code>http://<hostname>:8080/nifi</code> in a web browser. There are no
permissions configured, by default, so anyone is
-able to view and modify the dataflow. For information on securing the system,
see Systems Administrator guide.</p>
+<p>When the application is started, the user is able to navigate to the User
Interface by going to the default address of
+<code>http://&lt;hostname&gt;:8080/nifi</code> in a web browser. There are no
permissions configured by default, so anyone is
+able to view and modify the dataflow. For information on securing the system,
see the Systems Administrator guide.</p>
</div>
<div class="paragraph">
<p>When a DataFlow Manager navigates to the UI for the first time, a blank
canvas is provided on which a dataflow can be built:</p>
@@ -601,13 +607,13 @@ To the left is the Components Toolbar. T
</div>
<div class="paragraph">
<p>Next to the Components Toolbar is the Actions Toolbar. This toolbar
consists of buttons to manipulate the existing
-components on the graph. Following the Actions Toolbar is the Search Toolbar.
This toolbar consists of a single
+components on the graph. To the right of the Actions Toolbar is the Search
Toolbar. This toolbar consists of a single
Search field that allows users to easily find components on the graph. Users
are able to search by component name,
-type, identifier, and configuration properties.</p>
+type, identifier, configuration properties, and their values.</p>
</div>
<div class="paragraph">
<p>The Management Toolbar sits to the right-hand side of the screen. This
toolbar consists of buttons that are
-of use to DataFlow Managers to manage the flow as well as administrators who
may use this section to manage user access
+used by DataFlow Managers to manage the flow as well as by administrators who
manage user access
and configure system properties, such as how many system resources should be
provided to the application.</p>
</div>
<div class="imageblock">
@@ -633,7 +639,7 @@ is a link that will take you back up to
each state (Stopped, Running, Invalid, Disabled), how many Remote Process
Groups exist on the graph in each state
(Transmitting, Not Transmitting), the number of threads that are currently
active in the flow, the amount of data that currently
exists in the flow, and the timestamp at which all of this information was
last refreshed. If there are any System-Level bulletins,
-these are shown in the Status bar as well. Additionally, if the instance of
NiFi is clustered, the Status bar shows many nodes
+these are shown in the Status bar as well. Additionally, if the instance of
NiFi is clustered, the Status bar shows how many nodes
are in the cluster and how many are currently connected.</p>
</div>
<div class="imageblock">
@@ -647,15 +653,14 @@ are in the cluster and how many are curr
<h2 id="building-a-dataflow"><a class="anchor"
href="#building-a-dataflow"></a>Building a DataFlow</h2>
<div class="sectionbody">
<div class="paragraph">
-<p>A DataFlow Manager (DFM) is able to build an automated dataflow using the
NiFi User Interface (UI). This is accomplished
-by dragging components from the toolbar to the canvas, configuring the
components to meet specific needs, and connecting
+<p>A DataFlow Manager (DFM) is able to build an automated dataflow using the
NiFi User Interface (UI). Simply drag components from the toolbar to the
canvas, configure the components to meet specific needs, and connect
the components together.</p>
</div>
<div class="sect2">
<h3 id="adding-components-to-the-canvas"><a class="anchor"
href="#adding-components-to-the-canvas"></a>Adding Components to the Canvas</h3>
<div class="paragraph">
-<p>In the User Interface section above, we outlined the different segments of
the UI and pointed out a Components Toolbar.
-Here, we will look at each of the Components in that toolbar:</p>
+<p>The User Interface section above outlined the different segments of the
UI and pointed out a Components Toolbar.
+This section looks at each of the Components in that toolbar:</p>
</div>
<div class="imageblock">
<div class="content">
@@ -692,19 +697,22 @@ Processors that allow us to ingest data
location that it was dropped.</p>
</div>
<div class="paragraph">
+<p><strong>Note</strong>: For any component added to the graph, it is possible
to select it with the mouse and move it anywhere on the graph. Also, it is
possible to select multiple items at once by either holding down the Shift key
and selecting each item or by holding down the Shift key and dragging a
selection box around the desired components.</p>
+</div>
+<div class="paragraph">
<p><span class="image"><img src="./images/iconInputPort.png" alt="Input Port"
width="32"></span>
<strong>Input Port</strong>: Input Ports provide a mechanism for transferring
data into a Process Group. When an Input Port is dragged
onto the canvas, the DFM is prompted to name the Port. All Ports within a
Process Group must have unique names.</p>
</div>
<div class="paragraph">
-<p>All components exist only within a Process Group. When a user navigates to
the NiFi page, the user is placed in the
-Root Progress Group. If the Input Port is dragged onto the Root Progress
Group, the Input Port provides a mechanism
+<p>All components exist only within a Process Group. When a user initially
navigates to the NiFi page, the user is placed in the
+Root Process Group. If the Input Port is dragged onto the Root Process Group,
the Input Port provides a mechanism
to receive data from remote instances of NiFi. In this case, the Input Port
can be configured to restrict access to
appropriate users.</p>
</div>
<div class="paragraph">
<p><span class="image"><img src="./images/iconOutputPort.png" alt="Output
Port" width="32"></span>
-<strong>Output Port</strong>: Output Ports provide a mechanism for
transferring data from a Process Group back to destination outside
+<strong>Output Port</strong>: Output Ports provide a mechanism for
transferring data from a Process Group to destinations outside
of the Process Group. When an Output Port is dragged onto the canvas, the DFM
is prompted to name the Port. All Ports
within a Process Group must have unique names.</p>
</div>
@@ -715,9 +723,9 @@ that data is removed from the queues of
</div>
<div class="paragraph">
<p><span class="image"><img src="./images/iconProcessGroup.png" alt="Process
Group" width="32"></span>
-<strong>Process Group</strong>: Process Groups can be used logically group a
set of components so that the dataflow is easier to understand
+<strong>Process Group</strong>: Process Groups can be used to logically group
a set of components so that the dataflow is easier to understand
and maintain. When a Process Group is dragged onto the canvas, the DFM is
prompted to name the Process Group. All Process
-Groups within the same parent group must have unique names.</p>
+Groups within the same parent group must have unique names. The Process Group
will then be nested within that parent group.</p>
</div>
<div class="paragraph">
<p><span class="image"><img src="./images/iconRemoteProcessGroup.png"
alt="Remote Process Group" width="32"></span>
@@ -727,8 +735,7 @@ is prompted for the URL of the remote Ni
is the URL of the remote instance’s NiFi Cluster Manager (NCM). When
data is transferred to a clustered instance of NiFi
via an RPG, the RPG will first connect to the remote instance’s NCM
to determine which nodes are in the cluster and
how busy each node is. This information is then used to load balance the data
that is pushed to each node. The remote NCM is
-then interrogated periodically to ensure that any nodes that are dropped from
the cluster and no longer sent to, any new nodes
-will be added to the list of nodes, and to recalculate the load balancing
based on each node’s load.</p>
+then interrogated periodically to determine information about any nodes that
are dropped from or added to the cluster and to recalculate the load balancing
based on each node’s load.</p>
</div>
<div class="paragraph">
<p><span class="image"><img src="./images/iconFunnel.png" alt="Funnel"
width="32"></span>
@@ -751,8 +758,8 @@ dragged onto the canvas, the DFM is prov
</div>
</div>
<div class="paragraph">
-<p>Clicking the drop-down box shows all available Templates. Any Template that
was created with a description will show an
-icon indicating that there is more information. Hovering over the icon with
the mouse will show this description:</p>
+<p>Clicking the drop-down box shows all available Templates. Any Template that
was created with a description will show a question mark
+icon, indicating that there is more information. Hovering over the icon with
the mouse will show this description:</p>
</div>
<div class="imageblock">
<div class="content">
@@ -793,10 +800,10 @@ the Processor again.</p>
</div>
<div class="paragraph">
<p>This tab contains several different configuration items. First, it allows
the DFM to change the name of the Processor.
-The name of a Processor by default is the same as the Processor type. Next to
the Processor Name is a control for
-determining whether or not the Processor is Enabled. When a Processor is added
to the graph, it is enabled. If the
-Processor is disabled, it cannot be started. This is used to indicate that
even when a group of Processors are started,
-such as when a DFM starts an entire Process Group, this Processor should be
excluded.</p>
+The name of a Processor by default is the same as the Processor type. Next to
the Processor Name is a checkbox, indicating
+ whether the Processor is Enabled. When a Processor is added to the graph, it
is enabled. If the
+Processor is disabled, it cannot be started. The disabled state is used to
indicate that when a group of Processors is started,
+such as when a DFM starts an entire Process Group, this (disabled) Processor
should be excluded.</p>
</div>
<div class="paragraph">
<p>Below the Name configuration, the Processor’s unique identifier is
displayed along with the Processor’s type. These
@@ -808,8 +815,8 @@ piece of data (a FlowFile), an event may
data may be processable at a later time. When this occurs, the Processor may
choose to Penalize the FlowFile. This will
prevent the FlowFile from being Processed for some period of time. For
example, if the Processor is to push the data
to a remote service, but the remote service already has a file with the same
name as the filename that the Processor
-is specifying, the Processor may penalize the FlowFile. The ‘Penalty
duration’ allows the DFM to specify what
-how long the FlowFile should be penalized. The default value is 30 seconds.</p>
+is specifying, the Processor may penalize the FlowFile. The ‘Penalty
duration’ allows the DFM to specify how long the
+FlowFile should be penalized. The default value is 30 seconds.</p>
</div>
<div class="paragraph">
<p>Similarly, the Processor may determine that some situation exists such that
the Processor can no longer make any progress,
@@ -821,16 +828,17 @@ the ‘Yield duration.’ The de
<div class="paragraph">
<p>The last configurable option on the left-hand side of the Settings tab is
the Bulletin level. Whenever the Processor writes
to its log, the Processor also will generate a Bulletin. This setting
indicates the lowest level of Bulletin that should be
-shown in the User Interface. By default, the Bulletin level is set to WARN.</p>
+shown in the User Interface. By default, the Bulletin level is set to WARN,
which means it will display all warning and error-level
+bulletins.</p>
</div>
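<div class="paragraph">
<p>The effect of the Bulletin level can be pictured as a simple severity threshold. The following is a minimal sketch in Python, not NiFi code: the <code>Severity</code> enum and the <code>visible_bulletins</code> function are hypothetical names, and the ordering Debug, Info, Warning, Error is taken from the severities listed in the Terminology section.</p>
</div>

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordering of the bulletin severities named in this guide."""
    DEBUG = 0
    INFO = 1
    WARNING = 2
    ERROR = 3


def visible_bulletins(bulletins, bulletin_level=Severity.WARNING):
    """Keep only bulletins at or above the configured Bulletin level.

    `bulletins` is a list of (severity, message) tuples. This is an
    illustration of the threshold idea, not how NiFi stores bulletins.
    """
    return [b for b in bulletins if b[0] >= bulletin_level]


# Illustrative messages only; these are not real NiFi log lines.
bulletins = [
    (Severity.DEBUG, "polling input directory"),
    (Severity.INFO, "transferred 3 FlowFiles"),
    (Severity.WARNING, "remote service responded slowly"),
    (Severity.ERROR, "failed to write to remote service"),
]

# With the assumed default threshold of WARNING, only the last two remain.
shown = visible_bulletins(bulletins)
```

<div class="paragraph">
<p>Lowering the threshold in this sketch to <code>Severity.DEBUG</code> would keep all four entries, which mirrors choosing a more verbose Bulletin level in the Settings tab.</p>
</div>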
<div class="paragraph">
-<p>The right-hand side of the dialogue provides an ‘Auto-terminate
relationships’ section. Each of the Relationships that is
+<p>The right-hand side of the Settings tab contains an ‘Auto-terminate
relationships’ section. Each of the Relationships that is
defined by the Processor is listed here, along with its description. In order
for a Processor to be considered valid and
able to run, each Relationship defined by the Processor must be either
connected to a downstream component or auto-terminated.
If a Relationship is auto-terminated, any FlowFile that is routed to that
Relationship will be removed from the flow and
its processing considered complete. Any Relationship that is already connected
to a downstream component cannot be auto-terminated.
The Relationship must first be removed from any Connection that uses it.
Additionally, for any Relationship that is selected to be
-auto-terminated, the auto-termination status will be cleared if the
Relationship is added to a Connection.</p>
+auto-terminated, the auto-termination status will be cleared (turned off) if
the Relationship is added to a Connection.</p>
</div>
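<div class="paragraph">
<p>The validity rule described above (every Relationship must be either connected or auto-terminated) can be sketched as a set comparison. This is an illustrative Python sketch under stated assumptions, not NiFi's implementation: the function names and the relationship names <code>success</code> and <code>failure</code> are hypothetical.</p>
</div>

```python
def unhandled_relationships(defined, connected, auto_terminated):
    """Return the Relationships that are neither connected nor auto-terminated.

    All three arguments are iterables of relationship names (strings).
    Hypothetical helper for illustration only.
    """
    return set(defined) - set(connected) - set(auto_terminated)


def relationships_valid(defined, connected, auto_terminated):
    """A Processor passes this particular check only when nothing is unhandled."""
    return not unhandled_relationships(defined, connected, auto_terminated)


# Hypothetical Processor with two Relationships.
defined = ["success", "failure"]

# 'success' feeds a downstream component; 'failure' is handled nowhere.
missing = unhandled_relationships(defined, connected=["success"], auto_terminated=[])

# Auto-terminating 'failure' satisfies the rule.
ok = relationships_valid(defined, connected=["success"], auto_terminated=["failure"])
```

<div class="paragraph">
<p>The sketch covers only the Relationship rule; a Processor may still be invalid for other reasons, such as its property values.</p>
</div>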
</div>
<div class="sect3">
@@ -853,18 +861,18 @@ auto-terminated, the auto-termination st
at which the Processor is run is defined by the ‘Run schedule’
option (see below).</p>
</li>
<li>
-<p><strong>Event driven</strong>: When this mode is selected, the Processor
will be triggered to run by FlowFiles entering the Connections
-that have this Processor as their destination. This mode is not supported by
all Processors. When this mode is
+<p><strong>Event driven</strong>: When this mode is selected, the Processor
will be triggered to run by an event, and that event occurs when FlowFiles
enter Connections
+feeding this Processor. This mode is currently considered experimental and is
not supported by all Processors. When this mode is
selected, the ‘Run schedule’ option is not configurable, as the
Processor is not triggered to run periodically but
-rather is triggered to run as the result of an event. Additionally, this is
the only mode for which the ‘Concurrent tasks’
+ as the result of an event. Additionally, this is the only mode for which
the ‘Concurrent tasks’
option can be set to 0. In this case, the number of threads is limited only by
the size of the Event-Driven Thread Pool that
the administrator has configured.</p>
</li>
<li>
-<p><strong>CRON driven</strong>: When using the CRON driven scheduling mode,
the Processor is scheduled to run periodically, similarly to the
-Timer driven scheduling mode. However, the CRON driven mode provides
significantly more flexibility at the expensive of
-increasing the complexity of the configuration. This value is made up of 6
fields, each separated by a space. These
-fields represent the following fields:</p>
+<p><strong>CRON driven</strong>: When using the CRON driven scheduling mode,
the Processor is scheduled to run periodically, similar to the
+Timer driven scheduling mode. However, the CRON driven mode provides
significantly more flexibility at the expense of
+increasing the complexity of the configuration. This value is made up of six
fields, each separated by a space. These
+fields include:</p>
<div class="olist arabic">
<ol class="arabic">
<li>
@@ -918,7 +926,7 @@ how much of the system’s resources
most Processors. There are, however, some types of Processors that can only be
scheduled with a single Concurrent task.</p>
</div>
<div class="paragraph">
-<p>The “Run schedule” dictates how often this Processor should be
scheduled to run. The valid values for this field depend on the selected
+<p>The “Run schedule” dictates how often the Processor should be
scheduled to run. The valid values for this field depend on the selected
Scheduling Strategy (see above). If using the Event driven Scheduling
Strategy, this field is not available. When using the Timer driven
Scheduling Strategy, this value is a time duration specified by a number
followed by a time unit. For example, <code>1 second</code> or <code>5
mins</code>.
The default value of <code>0 sec</code> means that the Processor should run as
often as possible as long as it has data to process. This is true
@@ -929,7 +937,7 @@ applicable for the CRON driven Schedulin
<p>The right-hand side of the tab contains a slider for choosing the
‘Run duration.’ This controls how long the Processor should be
scheduled
to run each time that it is triggered. The left-hand side of the slider
is marked ‘Lower latency’ while the right-hand side
is marked ‘Higher throughput.’ When a Processor finishes running,
it must update the repository in order to transfer the FlowFiles to
-the next Connection. Updating this repository is expensive, so the more work
that can be done at once before updating the repository
+the next Connection. Updating the repository is expensive, so the more work
that can be done at once before updating the repository,
the more work the Processor can handle (Higher throughput). However, this
means that the next Processor cannot start processing
those FlowFiles until the previous Processor updates this repository. As a
result, the latency will be longer (the time required to process
the FlowFile from beginning to end will be longer). The slider therefore
provides a spectrum from which the DFM can choose to favor
@@ -951,7 +959,7 @@ must define which Properties make sense
<p>This Processor, by default, has only a single property: ‘Routing
Strategy.’ The default value is ‘Route on Property name.’
Next to
the name of this property is a small question-mark symbol (
<span class="image"><img src="./images/iconInfo.png" alt="Question
Mark"></span>
-). This help symbol is seen in other places throughout the application, as
well, and indicates that more information is available.
+). This help symbol is seen in other places throughout the User Interface, and
it indicates that more information is available.
Hovering over this symbol with the mouse will provide additional details about
the property and the default value, as well as
historical values that have been set for the Property.</p>
</div>
@@ -966,24 +974,27 @@ the user is either provided a drop-down
</div>
<div class="paragraph">
<p>In the top-right corner of the tab is a button for adding a New Property.
Clicking this button will provide the DFM with a dialog to
-enter the name and value of a new property. Not all Processors allow
User-Defined properties. In this case, the Processor would become
-invalid when the properties are applied. RouteOnAttribute, for example, does
allow User-Defined properties. In fact, this Processor
-will not be valid until the user has added a property.</p>
+enter the name and value of a new property. Not all Processors allow
User-Defined properties. In processors that do not allow them,
+the Processor becomes invalid when User-Defined properties are applied.
RouteOnAttribute, however, does allow User-Defined properties.
+In fact, this Processor will not be valid until the user has added a
property.</p>
</div>
<div class="paragraph">
<p><span class="image"><img src="./images/edit-property-textarea.png"
alt="Edit Property with Text Area"></span></p>
</div>
<div class="paragraph">
-<p>Not that after a User-Defined property has been added, an icon will appear
on the right-hand side of that row (
+<p>Note that after a User-Defined property has been added, an icon will appear
on the right-hand side of that row (
<span class="image"><img src="./images/iconDelete.png" alt="Delete
Icon"></span>
). Clicking this button will remove the User-Defined property from the
Processor.</p>
</div>
+<div class="paragraph">
+<p>Some processors also have an Advanced User Interface (UI) built into them.
For example, the UpdateAttribute processor has an Advanced UI. To access the
Advanced UI, click the <code>Advanced</code> button that appears at the bottom
of the Configure Processor window. Only processors that have an Advanced UI
will have this button.</p>
+</div>
</div>
<div class="sect3">
<h4 id="comments-tab"><a class="anchor" href="#comments-tab"></a>Comments
Tab</h4>
<div class="paragraph">
-<p>The last tab in the Processor configuration dialog is the Comments tab.
This tab simply provides an area for users to provide
-whatever comments are appropriate for this component:</p>
+<p>The last tab in the Processor configuration dialog is the Comments tab.
This tab simply provides an area for users to include
+whatever comments are appropriate for this component. Use of the Comments tab
is optional:</p>
</div>
<div class="imageblock">
<div class="content">
@@ -995,32 +1006,35 @@ whatever comments are appropriate for th
<div class="sect2">
<h3 id="additional-help"><a class="anchor"
href="#additional-help"></a>Additional Help</h3>
<div class="paragraph">
-<p>Each Processor has the ability to provide additional documentation about
its usage. This documentation can be found by right-clicking
-on the Processor and then selecting the ‘Usage’ item from the
context menu. Alternatively, clicking the ‘Help’ link in the
top-right
-corner of the application will provide a Help page with all of the Processors
that are available. Clicking on the Processor in the list
-will then show its usage.</p>
+<p>The user may access additional documentation about each Processor’s
usage by right-clicking
+on the Processor and then selecting ‘Usage’ from the context menu.
Alternatively, clicking the ‘Help’ link in the top-right
+corner of the User Interface will provide a Help page with all of the
documentation, including usage documentation
+for all the Processors that are available. Clicking on the desired Processor
in the list will display its usage documentation.</p>
</div>
</div>
<div class="sect2">
<h3 id="connecting-components"><a class="anchor"
href="#connecting-components"></a>Connecting Components</h3>
<div class="paragraph">
-<p>After the appropriate Processors have been added to the graph and
configured to meet your needs, they will have to be connected
+<p>Once processors and other components have been added to the graph and
configured, the next step is to connect them
to one another so that NiFi knows what to do with each FlowFile after it has
been processed. This is accomplished by creating a
-Connection between two components. When the mouse hovers over a component, a
new Connection icon (
+Connection between two components. When the user hovers the mouse over the
center of a component, a new Connection icon (
<span class="image"><img src="./images/addConnect.png" alt="Connection
Bubble"></span>
-) will appear in the middle of the component:</p>
+) appears:</p>
</div>
<div class="paragraph">
<p><span class="image"><img src="./images/processor-connection-bubble.png"
alt="Processor with Connection Bubble"></span></p>
</div>
<div class="paragraph">
-<p>This Connection bubble can then be dragged from this component to another
component, which will provide to the user a
-‘Create Connection’ dialog. This dialog consists of two tabs:
‘Details’ and ‘Settings’.</p>
+<p>The user drags the Connection bubble from one component to another until
the second component is highlighted. When the user
+releases the mouse, a ‘Create Connection’ dialog appears. This
dialog consists of two tabs: ‘Details’ and ‘Settings’.
They are
+discussed in detail below. Note that it is possible to draw a connection so
that it loops back on the same processor. This can be
+useful if the DFM wants the processor to try to re-process FlowFiles if they
go down a failure Relationship. To create this type of looping
+connection, simply drag the connection bubble away and then back to the same
processor until it is highlighted. Then release the mouse and the same
<em>Create Connection</em> dialog appears.</p>
</div>
<div class="sect3">
<h4 id="details-tab"><a class="anchor" href="#details-tab"></a>Details Tab</h4>
<div class="paragraph">
-<p>The Details Tab provides information about the source and destination
components, including the component name, the
+<p>The Details Tab of the <em>Create Connection</em> dialog provides
information about the source and destination components, including the
component name, the
component type, and the Process Group in which the component lives:</p>
</div>
<div class="imageblock">
@@ -1040,7 +1054,7 @@ automatically be ‘cloned’, a
<div class="sect3">
<h4 id="settings"><a class="anchor" href="#settings"></a>Settings</h4>
<div class="paragraph">
-<p>The Settings Tab provides the ability to configure the Connection’s
name, FlowFile expiration, back pressure thresholds, and
+<p>The Settings Tab provides the ability to configure the Connection’s
name, FlowFile expiration, Back Pressure thresholds, and
Prioritization:</p>
</div>
<div class="paragraph">
@@ -1051,30 +1065,52 @@ Prioritization:</p>
that are active for the Connection.</p>
</div>
<div class="paragraph">
-<p>File expiration is a concept by which data that cannot be processed in a
timely fashion can be automatically destroyed.
+<p>File expiration is a concept by which data that cannot be processed in a
timely fashion can be automatically removed from the flow.
This is useful, for example, when the volume of data is expected to exceed the
volume that can be sent to a remote site.
In this case, the expiration can be used in conjunction with Prioritizers to
ensure that the highest priority data is
-processed first and then anything that cannot be processed within one hour,
for example, can be dropped. The default
+processed first and then anything that cannot be processed within a certain
time period (one hour, for example) can be dropped. The default
value of <code>0 sec</code> indicates that the data will never expire.</p>
</div>
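<div class="paragraph">
<p>The expiration rule described above can be sketched in a few lines of Python. This is an illustration only, not NiFi code: the <code>QueuedFlowFile</code> class, its <code>age_seconds</code> field, and the <code>drop_expired</code> function are hypothetical names, and how NiFi actually measures a FlowFile's age is not stated in this section.</p>
</div>

```python
from dataclasses import dataclass


@dataclass
class QueuedFlowFile:
    name: str
    age_seconds: float  # hypothetical: age of the FlowFile when the queue is checked


def drop_expired(queue, expiration_seconds):
    """Return the FlowFiles that remain after expiration is applied.

    An expiration of 0 mirrors the default value of `0 sec`:
    the data never expires.
    """
    if expiration_seconds == 0:
        return list(queue)
    return [f for f in queue if f.age_seconds <= expiration_seconds]


queue = [QueuedFlowFile("a", 30), QueuedFlowFile("b", 4000)]

# With a one-hour expiration, "b" (4000 seconds old) is dropped.
print([f.name for f in drop_expired(queue, 3600)])  # ['a']

# With the default of 0, nothing is dropped.
print([f.name for f in drop_expired(queue, 0)])  # ['a', 'b']
```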
<div class="paragraph">
-<p>NiFi provides two different configuration elements for back pressure. These
thresholds indicate how much data should be
+<p>NiFi provides two configuration elements for Back Pressure. These
thresholds indicate how much data should be
allowed to exist in the queue before the component that is the source of the
Connection is no longer scheduled to run.
This allows the system to avoid being overrun with data. The first option
provided is the “Back pressure object threshold.”
This is the number of FlowFiles that can be in the queue before back pressure
is applied. The second configuration option
is the “Back pressure data size threshold.”
-This specifies the maximum amount of data that should be queued up before
+This specifies the maximum amount of data (in size) that should be queued up
before
applying back pressure. This value is configured by entering a number followed
by a data size (<code>B</code> for bytes, <code>KB</code> for
kilobytes, <code>MB</code> for megabytes, <code>GB</code> for gigabytes, or
<code>TB</code> for terabytes).</p>
</div>
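<div class="paragraph">
<p>As a rough sketch of how the two thresholds interact, the Python below parses a data size such as <code>10 KB</code> and reports whether back pressure would apply. The function names are hypothetical, and two details are assumptions rather than documented behavior: units are treated as binary multiples (1 KB = 1024 bytes), and a threshold counts as reached when the queue is at or above it.</p>
</div>

```python
import re

# Assumption: binary multiples. The guide only names the units.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_data_size(text):
    """Convert a string such as '10 KB' into a number of bytes."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*", text, re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"not a data size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def back_pressure_applied(queued_count, queued_bytes, object_threshold, size_threshold):
    """Return True if either threshold has been reached (hypothetical helper)."""
    return (
        queued_count >= object_threshold
        or queued_bytes >= parse_data_size(size_threshold)
    )


print(parse_data_size("10 KB"))  # 10240
print(back_pressure_applied(5, 0, 5, "1 MB"))  # True: object threshold reached
print(back_pressure_applied(1, 100, 5, "1 MB"))  # False: both below threshold
```

<div class="paragraph">
<p>Either threshold alone is enough to stop the source component from being scheduled, which is why the check uses a logical "or".</p>
</div>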
<div class="paragraph">
-<p>The right-hand side of the tab provides the ability to prioritize the data
in queue so that higher priority data is
+<p>The right-hand side of the tab provides the ability to prioritize the data
in the queue so that higher priority data is
processed first. Prioritizers can be dragged from the top (‘Available
prioritizers’) to the bottom (‘Selected prioritizers’).
Multiple prioritizers can be selected. The prioritizer that is at the top of
the ‘Selected prioritizers’ list is the highest
priority. If two FlowFiles have the same value according to this prioritizer,
the second prioritizer will determine which
FlowFile to process first, and so on. If a prioritizer is no longer desired,
it can then be dragged from the ‘Selected
prioritizers’ list to the ‘Available prioritizers’ list.</p>
</div>
+<div class="paragraph">
+<p>The following prioritizers are available:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>FirstInFirstOutPrioritizer</strong>: Given two FlowFiles, the one
that reached the connection first will be processed first.</p>
+</li>
+<li>
+<p><strong>NewestFlowFileFirstPrioritizer</strong>: Given two FlowFiles, the
one that is newest in the dataflow will be processed first.</p>
+</li>
+<li>
+<p><strong>OldestFlowFileFirstPrioritizer</strong>: Given two FlowFiles, the
one that is oldest in the dataflow will be processed first. This is the default
scheme that is used if no prioritizers are selected.</p>
+</li>
+<li>
+<p><strong>PriorityAttributePrioritizer</strong>: Given two FlowFiles that
both have a "priority" attribute, the one that has the highest priority value
will be processed first. Note that an UpdateAttribute processor should be
used to add the "priority" attribute to the FlowFiles before they reach a
connection that has this prioritizer set. Values for the "priority" attribute
may be alphanumeric, where "a" is a higher priority than "z", and "1" is a
higher priority than "9", for example.</p>
+</li>
+</ul>
+</div>
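<div class="paragraph">
<p>The way several selected prioritizers combine (the first decides, and later ones only break ties) can be modeled as a chain of comparison functions. The Python sketch below is a simplified model, not NiFi's implementation: the <code>FlowFile</code> class and its fields are hypothetical, every FlowFile is assumed to carry a "priority" attribute, and that attribute is compared as a plain string so that "1" sorts ahead of "9" and "a" ahead of "z".</p>
</div>

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class FlowFile:
    name: str
    queued_at: float        # hypothetical: when it reached the connection
    entered_flow_at: float  # hypothetical: when it entered the dataflow
    priority: str           # hypothetical: value of the "priority" attribute


def _cmp(x, y):
    """Three-way comparison: negative, zero, or positive."""
    return (x > y) - (x < y)


def first_in_first_out(a, b):
    return _cmp(a.queued_at, b.queued_at)


def oldest_first(a, b):
    return _cmp(a.entered_flow_at, b.entered_flow_at)


def newest_first(a, b):
    return -oldest_first(a, b)


def priority_attribute(a, b):
    # Assumption: plain string comparison, lower value means higher priority.
    return _cmp(a.priority, b.priority)


def chain(*prioritizers):
    """Build a sort key in which later prioritizers only break ties."""
    def compare(a, b):
        for prioritizer in prioritizers:
            result = prioritizer(a, b)
            if result != 0:
                return result
        return 0
    return cmp_to_key(compare)


queue = [
    FlowFile("x", queued_at=1, entered_flow_at=1, priority="2"),
    FlowFile("y", queued_at=3, entered_flow_at=2, priority="1"),
    FlowFile("z", queued_at=2, entered_flow_at=3, priority="1"),
]

# "y" and "z" tie on priority, so first-in-first-out decides between them.
ordered = sorted(queue, key=chain(priority_attribute, first_in_first_out))
print([f.name for f in ordered])  # ['z', 'y', 'x']
```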
+<div class="paragraph">
+<p><strong>Note</strong>: After a connection has been drawn between two
components, the connection’s configuration may be changed, and the
connection may be moved to a new destination; however, the processors on either
side of the connection must be stopped before a configuration or destination
change may be made.</p>
+</div>
</div>
</div>
<div class="sect2">
@@ -1091,8 +1127,8 @@ will show a yellow Warning indicator wit
</div>
<div class="paragraph">
<p>In this case, hovering over the indicator icon with the mouse will provide
a tooltip showing all of the validation
-failures for the Processor. Once all of the validation errors have been
addressed, the status indicator will change
-to a Stop icon, indicating that the Processor is valid and ready to be start
but currently is not running:</p>
+errors for the Processor. Once all of the validation errors have been
addressed, the status indicator will change
+to a Stop icon, indicating that the Processor is valid and ready to be started
but currently is not running:</p>
</div>
<div class="imageblock">
<div class="content">
@@ -1100,6 +1136,58 @@ to a Stop icon, indicating that the Proc
</div>
</div>
</div>
+<div class="sect2">
+<h3 id="example-dataflow"><a class="anchor"
href="#example-dataflow"></a>Example Dataflow</h3>
+<div class="paragraph">
+<p>This section has described the steps required to build a dataflow. Now it
is time to put it all together. The following example dataflow
+consists of just two processors: GenerateFlowFile and LogAttribute. These
processors are normally used for testing, but they can also be used
+to build a quick flow for demonstration purposes and see NiFi in action.</p>
+</div>
+<div class="paragraph">
+<p>After you drag the GenerateFlowFile and LogAttribute processors to the
graph and connect them (using the guidelines provided above), configure them as
follows:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>GenerateFlowFile</p>
+<div class="ulist">
+<ul>
+<li>
+<p>On the Scheduling tab, set Run schedule to: 5 sec. Note that the
GenerateFlowFile processor can create many FlowFiles very quickly, so setting
the Run schedule is important to keep this flow from overwhelming
the system NiFi is running on.</p>
+</li>
+<li>
+<p>On the Properties tab, set File Size to: 10 KB</p>
+</li>
+</ul>
+</div>
+</li>
+<li>
+<p>LogAttribute</p>
+<div class="ulist">
+<ul>
+<li>
+<p>On the Settings tab, under Auto-terminate relationships, select the
checkbox next to Success. This will terminate FlowFiles after this processor
has successfully processed them.</p>
+</li>
+<li>
+<p>Also on the Settings tab, set the Bulletin level to Info. This way, when
the dataflow is running, this processor will display the bulletin icon (see <a
href="#processor_anatomy">Anatomy of a Processor</a>), and the user may hover
over it with the mouse to see the attributes that the processor is logging.</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The dataflow should look like the following:</p>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="./images/simple-flow.png" alt="Simple Flow" width="900">
+</div>
+</div>
+<div class="paragraph">
+<p>Now see the following section on how to start and stop the dataflow. When
the dataflow is running, be sure to note the statistical information that is
displayed on the face of each processor (see <a
href="#processor_anatomy">Anatomy of a Processor</a>).</p>
+</div>
+</div>
</div>
</div>
<div class="sect1">
@@ -1108,7 +1196,7 @@ to a Stop icon, indicating that the Proc
<div class="paragraph">
<p>When a component is added to the NiFi canvas, it is in the Stopped state.
In order to cause the component to
be triggered, the component must be started. Once started, the component can
be stopped at any time. From a
-Stopped state, the component can then be configured, started, or disabled.</p>
+Stopped state, the component can be configured, started, or disabled.</p>
</div>
<div class="sect2">
<h3 id="starting-a-component"><a class="anchor"
href="#starting-a-component"></a>Starting a Component</h3>
@@ -1121,7 +1209,7 @@ Stopped state, the component can then be
<p>The component’s configuration must be valid.</p>
</li>
<li>
-<p>All defined Relationships for component must be connected to another
component or auto-terminated.</p>
+<p>All defined Relationships for the component must be connected to another
component or auto-terminated.</p>
</li>
<li>
<p>The component must be stopped.</p>
@@ -1155,7 +1243,7 @@ be started, with the exception of those
<h3 id="stopping-a-component"><a class="anchor"
href="#stopping-a-component"></a>Stopping a Component</h3>
<div class="paragraph">
<p>A component can be stopped any time that it is running. A component is
stopped by right-clicking on the component
-and clicking Stop from the context menu, or by clicking the Stop icon (
+and clicking Stop from the context menu, or by selecting the component and
clicking the Stop icon (
<span class="image"><img src="./images/iconStop.png" alt="Stop"></span>
) in the Actions Toolbar.</p>
</div>
@@ -1164,7 +1252,7 @@ and clicking Stop from the context menu,
will be stopped.</p>
</div>
<div class="paragraph">
-<p>Once stopped, the status indicator of a Processor will change to the Stop
symbol (
+<p>Once stopped, the status indicator of a component will change to the Stop
symbol (
<span class="image"><img src="./images/iconStop.png" alt="Stop"></span>
).</p>
</div>
@@ -1177,11 +1265,10 @@ for more information).</p>
<div class="sect2">
<h3 id="enabling-disabling-a-component"><a class="anchor"
href="#enabling-disabling-a-component"></a>Enabling/Disabling a Component</h3>
<div class="paragraph">
-<p>When a component is enabled, it is able to be started. Components may be
disabled when part of a
-dataflow is still being assembled, for example, and as a result should not be
started. Typically,
-if a component is not intended to be run, the component is disabled, rather
than being left in the
-Stopped state. This helps to distinguish between components that are
intentionally not running and
-those components that may have been stopped temporarily (for instance, to
change the component’s
+<p>When a component is enabled, it is able to be started. Users may choose to
disable components when they are part of a
+dataflow that is still being assembled, for example. Typically, if a component
is not intended to be run, the component
+is disabled, rather than being left in the Stopped state. This helps to
distinguish between components that are
+intentionally not running and those that may have been stopped temporarily
(for instance, to change the component’s
configuration) and inadvertently were never restarted.</p>
</div>
<div class="paragraph">
@@ -1251,8 +1338,8 @@ be shown.</p>
<div class="paragraph">
<p><strong>Note</strong>: If a Port that is expected to be shown is not shown
in this dialog, ensure that the instance has proper
permissions and that the Remote Process Group’s flow is current. This
can be checked by closing the Port
-Configuration Dialog and looking at the bottom-right corner of the Remote
Process Group. The data at which
-the flow was last refresh is shown. If the flow appears to be outdated, it can
be updated by right-clicking
+Configuration Dialog and looking at the bottom-right corner of the Remote
Process Group. The date at which
+the flow was last refreshed is shown. If the flow appears to be outdated, it
can be updated by right-clicking
on the Remote Process Group and selecting “Refresh flow.” (See <a
href="#remote_group_anatomy">Anatomy of a Remote Process Group</a> for more
information).</p>
</div>
<div class="paragraph">
@@ -1278,6 +1365,15 @@ or not compression should be used when t
</div>
</div>
<div class="sect1">
+<h2 id="navigating"><a class="anchor" href="#navigating"></a>Navigating within
a DataFlow</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>NiFi provides various mechanisms for getting around a dataflow. The <a
href="#User_Interface">NiFi User Interface</a> section discussed various ways
to navigate around
+the NiFi graph; however, once a flow exists on the graph, there are additional
ways to get from one component to another. The <a href="#User_Interface">NiFi User
Interface</a> section showed that when multiple Process Groups exist in a
flow, breadcrumbs appear under the toolbar, providing a way to navigate between
them. In addition, to enter a Process Group that is currently visible on the
graph, simply double-click it, thereby "drilling down" into it. Connections
also provide a way to jump from one location to another within the flow.
Right-click on a connection and select "Go to source" or "Go to destination" in
order to jump to one end of the connection or another. This can be very useful
in large, complex dataflows, where the connection lines may be long and span
large areas of the graph. Finally, all components provide the ability to jump
forward or backward within the flow. Right-click any component (e.g., a
processor, process group, or port) and select either "Upstream connections"
or "Downstream connections". A dialog window will open, showing the
available upstream or downstream connections that the user may jump to. This
can be especially useful when trying to follow a dataflow in a backward
direction. It is typically easy to follow the path of a dataflow from start to
finish, drilling down into nested process groups; however, it can be more
difficult to follow the dataflow in the other direction.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
<h2 id="monitoring"><a class="anchor" href="#monitoring"></a>Monitoring of
DataFlow</h2>
<div class="sectionbody">
<div class="paragraph">
@@ -1358,7 +1454,7 @@ this value represents the number of task
</li>
<li>
<p><strong>5-Minute Statistics</strong>: The Processor shows several different
statistics in tabular form. Each of these
-statistics represent the amount of work that has been performed in the past
five minutes. If the NiFi
+statistics represents the amount of work that has been performed in the past
five minutes. If the NiFi
instance is clustered, these values indicate how much work has been done by
all of the Nodes combined
in the past five minutes. These metrics are:</p>
<div class="ulist">
@@ -1378,9 +1474,9 @@ and write data. In this example, we see
MB of the FlowFile content and has written 4.7 MB as well. This is what we
would expect,
since this Processor simply copies the contents of a FlowFile to disk. Note,
however, that this is
not the same as the amount of data that it pulled from its input queues. This
is because some of
-the files that it pulled from the input queues already exists in the output
directory, and the
+the files that it pulled from the input queues already exist in the output
directory, and the
Processor is configured to route FlowFiles to failure when this occurs.
Therefore, for those files
-which already existed in the output directory, no data was read nor written to
disk.</p>
+which already existed in the output directory, data was neither read nor
written to disk.</p>
</li>
<li>
<p><strong>Out</strong>: The amount of data that the Processor has transferred
to its outbound Connections. This does
@@ -1445,9 +1541,9 @@ to provide information about the Process
Group and clicking the “Configure” menu option. In this example,
the Comments are set to “Example Process Group.”</p>
</li>
<li>
-<p><strong>Statistics</strong>: Process Groups provide statics about the
amount of data that has been processed by the Process Group in
+<p><strong>Statistics</strong>: Process Groups provide statistics about the
amount of data that has been processed by the Process Group in
the past 5 minutes as well as the amount of data currently enqueued within the
Process Group. The following elements
-comprise the “Statics” portion of a Process Group:</p>
+comprise the “Statistics” portion of a Process Group:</p>
<div class="ulist">
<ul>
<li>
@@ -1515,7 +1611,7 @@ within the Process Group. The following
<li>
<p><span class="image"><img src="./images/iconAlert.png" alt="Invalid
Components"></span>
<strong>Invalid Components</strong>: The number of Processors, Input Ports,
and Output Ports that are enabled but are currently
- not in a valid state. This may be due to misconfigured properties or not
have all of the Relationships connected.</p>
+ not in a valid state. This may be due to misconfigured properties or
missing Relationships.</p>
</li>
<li>
<p><span class="image"><img src="./images/iconDisable.png" alt="Disabled
Components"></span>
@@ -1568,7 +1664,7 @@ of the remote instance will be shown her
</li>
<li>
<p><strong>Remote Instance URL</strong>: This is the URL of the remote
instance that the Remote Process Group points to.
-This URL is entered when the Remote Process Group is added to the canvas and
cannot be changed.</p>
+This URL is entered when the Remote Process Group is added to the canvas and
it cannot be changed.</p>
</li>
<li>
<p><strong>Secure Indicator</strong>: This icon indicates whether or not
communications with the remote NiFi instance are
@@ -1579,14 +1675,14 @@ This URL is entered when the Remote Proc
<span class="image"><img src="./images/iconNotSecure.png" alt="Not
Secure"></span>
). If the communications are secure, this instance of NiFi will not be
able to communicate with the
remote instance until an administrator for the remote instance grants
access. Whenever the Remote Process
- Group is added to the canvas, this will automatically initiate a
request to have a user created on the
+ Group is added to the canvas, this will automatically initiate a
request to have a user for this instance of NiFi created on the
remote instance. This instance will be unable to communicate with the
remote instance until an administrator
on the remote instance adds the user to the system and adds the
“NiFi” role to the user.
In the event that communications are not secure, the Remote Process
Group is able to receive data from anyone,
- and the data is not encrypted as it is transferred between instances of
NiFi.</p>
+ and the data is not encrypted while it is transferred between instances
of NiFi.</p>
</li>
<li>
-<p><strong>Input Ports</strong>: This section shows three different pieces of
information:</p>
+<p><strong>Input Ports</strong>: This section shows three pieces of
information:</p>
<div class="ulist">
<ul>
<li>
@@ -1611,7 +1707,7 @@ This URL is entered when the Remote Proc
</div>
</li>
<li>
-<p><strong>Output Ports</strong>: Similar to the “Input Ports”
section above, this element shows three different pieces of information:</p>
+<p><strong>Output Ports</strong>: Similar to the “Input Ports”
section above, this element shows three pieces of information:</p>
<div class="ulist">
<ul>
<li>
@@ -1648,7 +1744,7 @@ instance as a whole.</p>
<li>
<p><strong>Last Refreshed Time</strong>: The information that is pulled from a
remote instance and rendered on the Remote Process Group
in the User Interface is periodically refreshed in the background. This
element indicates the time at which that refresh
-last happened, or if the information has not been refreshed for a significant
amount of time the value will change to
+last happened, or if the information has not been refreshed for a significant
amount of time, the value will change to
indicate <em>Remote flow not current</em>. NiFi can be triggered to initiate a
refresh of this information by right-clicking
on the Remote Process Group and choosing the “Refresh flow” menu
item.</p>
</li>
@@ -1685,7 +1781,8 @@ the different elements within the dialog
<div class="paragraph">
<p>The Summary page consists mostly of a table that provides information about
each of the components on the canvas. Above this
table is a set of five tabs that can be used to view the different types of
components. The information provided in the table
-is the same information that is provided for each component on the canvas. For
more information, see the sections
+is the same information that is provided for each component on the canvas.
Each of the columns in the table may be sorted by
+double-clicking on the heading of the column. For more on the types of
information displayed, see the sections
<a href="#processor_anatomy">Anatomy of a Processor</a>, <a
href="#process_group_anatomy">Anatomy of a Process Group</a>, and <a
href="#remote_group_anatomy">Anatomy of a Remote Process Group</a> above.</p>
</div>
<div class="paragraph">
@@ -1694,9 +1791,11 @@ is the same information that is provided
<div class="ulist">
<ul>
<li>
-<p><strong>Bulletin Indicator</strong>: As in other places throughout the
application, when this icon is present, hovering over the icon will
+<p><strong>Bulletin Indicator</strong>: As in other places throughout the User
Interface, when this icon is present, hovering over the icon will
provide information about the Bulletin that was generated, including the
message, the severity level, the time at which
-the Bulletin was generated, and (in a clustered environment) the node that
generated the Bulletin.</p>
+the Bulletin was generated, and (in a clustered environment) the node that
generated the Bulletin. Like all the columns in the
+Summary table, this column where bulletins are shown may be sorted
+by double-clicking on the heading so that all the currently existing bulletins
are shown at the top of the list.</p>
</li>
<li>
<p><strong>Details</strong>: Clicking the Details icon will provide the user
with the details of the component. This dialog is the same as the
@@ -1709,7 +1808,7 @@ in a new browser tab or window (by click
</li>
<li>
<p><strong>Stats History</strong>: Clicking the Stats History icon will open a
new dialog that shows a historical view of the statistics that
-are rendered for this component. See the section <a
href="#Stats_History">Historical Statics of a Component</a> for more
information.</p>
+are rendered for this component. See the section <a
href="#Stats_History">Historical Statistics of a Component</a> for more
information.</p>
</li>
<li>
<p><strong>Refresh</strong>: The Refresh button allows the user to refresh the
information displayed without closing the dialog and opening it
@@ -1719,7 +1818,7 @@ on the page is not automatically refresh
<li>
<p><strong>Filter</strong>: The Filter element allows users to filter the
contents of the Summary table by typing in all or part of some criteria,
such as a Processor Type or Processor Name. The types of filters available
differ according to the selected tab. For instance,
-if viewing the Processor tab, the user is able to filter by name or by type.
When changing to the Connections tab, the user
+if viewing the Processor tab, the user is able to filter by name or by type.
When viewing the Connections tab, the user
is able to filter by source name, destination name, or Connection name. The
filter is automatically applied when the contents
of the text box are changed. Below the text box is an indicator of how many
entries in the table match the filter and how many
entries exist in the table.</p>
@@ -1731,7 +1830,7 @@ Pop-Out button, next to the Close button
browser tab/window. In the new tab/window, the Pop-Out button and the Go-To
button will no longer be available.</p>
</li>
<li>
-<p><strong>System Diagnostics</strong>: The System Diagnostics tab provides
information about how the system is performing with respect to
+<p><strong>System Diagnostics</strong>: The System Diagnostics window provides
information about how the system is performing with respect to
system resource utilization. While this is intended mostly for administrators,
it is provided in this view because it
does provide a summary of the system. This dialog shows information such as
CPU utilization, how full the disks are,
and Java-specific metrics, such as memory size and utilization, as well as
Garbage Collection information.</p>
@@ -1740,11 +1839,11 @@ and Java-specific metrics, such as memor
</div>
</div>
<div class="sect2">
-<h3 id="Stats_History"><a class="anchor" href="#Stats_History"></a>Historical
Statics of a Component</h3>
+<h3 id="Stats_History"><a class="anchor" href="#Stats_History"></a>Historical
Statistics of a Component</h3>
<div class="paragraph">
<p>While the Summary table and the canvas show numeric statistics pertaining
to the performance of a component over the
past five minutes, it is often useful to have a view of historical statistics
as well. This information is available
-by right-clicking on a component and choosing the “Stats” menu
option or from the Summary page (see <a href="#Summary_Page">Summary Page</a>
+by right-clicking on a component and choosing the “Stats” menu
option or by clicking the Stats History icon on the Summary page (see <a
href="#Summary_Page">Summary Page</a>
for more information).</p>
</div>
<div class="paragraph">
@@ -1786,12 +1885,12 @@ the type of Processor is displayed. For
only on the range of time selected, if any time range is selected. If this
instance of NiFi is clustered, these values
are shown for the cluster as a whole, as well as each individual node. In a
clustered environment, each node is shown
in a different color. This also serves as the graph’s legend, showing
the color of each node that is shown in the graph.
-Hovering over the Cluster or one of the nodes in the legend will also make the
corresponding node bold in the graph.</p>
+Hovering the mouse over the Cluster or one of the nodes in the legend will
also make the corresponding node bold in the graph.</p>
</li>
</ul>
</div>
<div class="paragraph">
-<p>The right-hand side of the dialog provides a drop-down list to choose which
metric to render, as well as two graphs.
+<p>The right-hand side of the dialog provides a drop-down list of the
different types of metrics to render in the graphs below.
The top graph is larger so as to provide an easier-to-read rendering of the
information. In the bottom-right corner of
this graph is a small handle (
<span class="image"><img src="./images/iconResize.png" alt="Resize"></span>
@@ -1800,7 +1899,7 @@ to move the entire dialog.</p>
</div>
<div class="paragraph">
<p>The bottom graph is much shorter and provides the ability to select a time
range. Selecting a time range here will
-cause the top graph to redraw the graph, showing only the time range selected.
Additionally, this will cause the
+cause the top graph to show only the time range selected, but in a more
detailed manner. Additionally, this will cause the
Min/Max/Mean values on the left-hand side to be recalculated. Once a selection
has been created by dragging a
rectangle over the graph, double-clicking on the selected portion will cause
the selection to fully expand in the
vertical direction. I.e., it will select all values in this time range.
Clicking on the bottom graph without dragging
@@ -1813,7 +1912,7 @@ will remove the selection.</p>
<h2 id="templates"><a class="anchor" href="#templates"></a>Templates</h2>
<div class="sectionbody">
<div class="paragraph">
-<p>DataFlow Managers have the ability to build very large and complex
DataFlows using Apache NiFi. This is achieved
+<p>DataFlow Managers have the ability to build very large and complex
DataFlows using NiFi. This is achieved
by using the basic components: Processor, Funnel, Input/Output Port, Process
Group, and Remote Process Group. These
can be thought of as the most basic building blocks for constructing a
DataFlow. At times, though, using these
small building blocks can become tedious if the same logic needs to be
repeated several times.</p>
@@ -1847,11 +1946,11 @@ error message if unable to create the te
<div class="content">
<div class="title">Note</div>
<div class="paragraph">
-<p>It is important to note that if any Processor that is Templated has a
sensitive property, the value of that
+<p>It is important to note that if any Processor that is Templated has a
sensitive property (such as a password), the value of that
sensitive property is not included in the Template. As a result, when dragging
the Template onto the graph, newly
created Processors may not be valid if they are missing values for their
sensitive properties. Additionally, any
-Connection that was selected when making the Template is is not included in
the Template if either the source or the
-destination of the Connection is not included in the Template.</p>
+Connection that was selected when making the Template is not included in the
Template if either the source or the
+destination of the Connection is not also included in the Template.</p>
</div>
</div>
</div>
@@ -1870,8 +1969,8 @@ click the “Add” button. The
being placed wherever the user dropped the Template icon.</p>
</div>
<div class="paragraph">
-<p>This leaves the contents of the newly instantiated Template selected. If
there was a mistake and this Template is no
-longer wanted, pressing the Delete key will remove the Template.</p>
+<p>This leaves the contents of the newly instantiated Template selected. If
there was a mistake, and this Template is no
+longer wanted, it may be deleted.</p>
</div>
</div>
<div class="sect2">
@@ -1900,8 +1999,8 @@ added to the table and the “Browse
<h4 id="Export_Template"><a class="anchor"
href="#Export_Template"></a>Exporting a Template</h4>
<div class="paragraph">
<p>Once a Template has been created, it can be shared with others in the
Template Management page (see <a href="#Manage_Templates">Managing
Templates</a>).
-To export a Template, locate the Template in the table of the Template
Management page. The Filter in the top-right corner
-will help to find the appropriate Template if several are available. Then
click the Export or Download button (
+To export a Template, locate the Template in the table. The Filter in the
top-right corner
+can be used to help find the appropriate Template if several are available.
Then click the Export or Download button (
<span class="image"><img src="./images/iconExport.png" alt="Export"></span>
). This will download the template as an XML file to your computer. This XML
file can then be sent to others and imported
into other instances of NiFi (see <a href="#Import_Template">Importing a
Template</a>).</p>
@@ -1912,7 +2011,7 @@ into other instances of NiFi (see <a hre
<div class="paragraph">
<p>Once it is decided that a Template is no longer needed, it can be easily
removed from the Template Management page
(see <a href="#Manage_Templates">Managing Templates</a>). To delete a
Template, locate it in the table (the Filter in the top-right corner
-may help to find the appropriate Template if several are available) and click
the Delete button (
+may be used to find the appropriate Template if several are available) and
click the Delete button (
<span class="image"><img src="./images/iconDelete.png" alt="Delete"></span>
). This will prompt for confirmation. After confirming the deletion, the
Template will be removed from this table
and will no longer be available to add to the canvas.</p>
@@ -1924,40 +2023,147 @@ and will no longer be available to add t
<div class="sect1">
<h2 id="data-provenance"><a class="anchor" href="#data-provenance"></a>Data
Provenance</h2>
<div class="sectionbody">
+<div class="paragraph">
+<p>While monitoring a dataflow, users often need a way to determine what
happened to a particular data object (FlowFile).
+NiFi’s Data Provenance page provides that information. Because NiFi
records and indexes data provenance details
+as objects flow through the system, users may perform searches, troubleshoot
problems, and evaluate aspects of the dataflow
+such as compliance and optimization in real time. By default, NiFi
updates this information every five minutes, but that
+is configurable.</p>
+</div>
+<div class="paragraph">
+<p>To access the Data Provenance page, click the Data Provenance button in the
Management Toolbar (see <a href="#User_Interface">NiFi User Interface</a>)
+( <span class="image"><img src="./images/iconProvenance.png" alt="Data
Provenance" width="28"></span>
+). Clicking this button opens a dialog window that allows the user to see the
most recent Data Provenance information available,
+search the information for specific items, and filter the search results. It
is also possible to open additional dialog windows to see event details,
+replay data at any point within the dataflow, and see a graphical
representation of the data’s lineage, or path through the flow.
+(These features are described in depth below.)</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/provenance-annotated.png"
alt="Provenance Table"></span></p>
+</div>
+<div class="paragraph">
+<p>Each point in a dataflow where a FlowFile is processed in some way is
considered a "processing event". Various types of processing
+events occur, depending on the dataflow design. For example, when data is
brought into the flow, a RECEIVE event occurs, and when
+data is sent out of the flow, a SEND event occurs. Other types of processing
events may occur, such as when the data is cloned (CLONE event), routed (ROUTE
event), modified (CONTENT_MODIFIED or ATTRIBUTES_MODIFIED event),
+split (FORK event), combined with other data objects (JOIN event), and
ultimately removed from the flow (DROP event).</p>
+</div>
<div class="sect2">
<h3 id="searching-for-events"><a class="anchor"
href="#searching-for-events"></a>Searching for Events</h3>
-
+<div class="paragraph">
+<p>One of the most common tasks performed in the Data Provenance page is a
search for a given FlowFile to determine what happened to it. To do this,
+click the <code>Search</code> button in the upper-right corner of the Data
Provenance page. This opens a dialog window with parameters that the user can
+define for the search. The parameters include the processing event of
interest, distinguishing characteristics about the FlowFile or the component
that produced the event, the timeframe within which to search, and the size of
the FlowFile.</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/search-events.png" alt="Search
Events" width="400"></span></p>
+</div>
+<div class="paragraph">
+<p>For example, to determine if a particular FlowFile was received, search for
an Event Type of "RECEIVE" and include an
+identifier for the FlowFile, such as its uuid or filename. The asterisk (*)
may be used as a wildcard for any number of characters.
+So, to determine whether a FlowFile with "ABC" anywhere in its filename was
received at any time on Jan. 6, 2015, the search shown in the following
+image could be performed:</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/search-receive-event-abc.png"
alt="Search for RECEIVE Event" width="400"></span></p>
</div>
-<div class="sect2">
-<h3 id="details-of-an-event"><a class="anchor"
href="#details-of-an-event"></a>Details of an Event</h3>
-
</div>
<div class="sect2">
-<h3 id="viewing-flowfile-content"><a class="anchor"
href="#viewing-flowfile-content"></a>Viewing FlowFile Content</h3>
-
+<h3 id="event_details"><a class="anchor" href="#event_details"></a>Details of
an Event</h3>
+<div class="paragraph">
+<p>In the far-left column of the Data Provenance page, there is a View Details
icon for each event ( <span class="image"><img
src="./images/iconViewDetails.png" alt="View Details" width="32"></span> ).
+Clicking this button opens a dialog window with three tabs: Details,
Attributes, and Content.</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/event-details.png" alt="Event
Details" width="700"></span></p>
+</div>
+<div class="paragraph">
+<p>The Details tab shows various details about the event, such as when it
occurred, what type of event it was, and the component that produced the event.
+The information that is displayed will vary according to the event type. This
tab also shows information about the FlowFile that was processed. In
+addition to the FlowFile’s UUID, which is displayed on the left side of
the Details tab, the UUIDs of any parent or child FlowFiles that are related
+to that FlowFile are displayed on the right side of the Details tab.</p>
+</div>
+<div class="paragraph">
+<p>The Attributes tab shows the attributes that exist on the FlowFile as of
that point in the flow. In order to see only the attributes that were modified
as
+a result of the processing event, the user may select the checkbox next to
"Only show modified" in the upper-right corner of the Attributes tab.</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/event-attributes.png" alt="Event
Attributes" width="700"></span></p>
+</div>
</div>
<div class="sect2">
<h3 id="replaying-a-flowfile"><a class="anchor"
href="#replaying-a-flowfile"></a>Replaying a FlowFile</h3>
-
+<div class="paragraph">
+<p>A Dataflow Manager may need to inspect a FlowFile’s content at some
point in the dataflow to ensure that it is being processed as expected. And if
it
+is not being processed properly, the DFM may need to make adjustments to the
dataflow and replay the FlowFile. The Content tab of the View Details
dialog window is where the DFM can do these things. The Content tab shows
information about the FlowFile’s content, such as its location in the
Content Repository
+and its size. In addition, it is here that the user may click the
<code>Download</code> button in order to download a copy of the
FlowFile’s content as it existed
+at this point in the flow. The user may also click the <code>Submit</code>
button to replay the FlowFile at this point in the flow. Upon clicking
<code>Submit</code>,
+the FlowFile is sent to the connection feeding the component that produced
this processing event.</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/event-content.png" alt="Event
Content" width="700"></span></p>
+</div>
</div>
<div class="sect2">
<h3 id="viewing-flowfile-lineage"><a class="anchor"
href="#viewing-flowfile-lineage"></a>Viewing FlowFile Lineage</h3>
+<div class="paragraph">
+<p>It is often useful to see a graphical representation of the lineage or path
a FlowFile took within the dataflow. To see a FlowFile’s lineage, click
on the "Show Lineage" icon ( <span class="image"><img
src="./images/iconLineage.png" alt="Show Lineage" width="28"></span> ) in the
far-right column
+of the Data Provenance table. This opens a graph displaying the FlowFile (
<span class="image"><img src="./images/lineage-flowfile.png" alt="FlowFile"
width="32"></span> ) and the various processing events that have occurred. The
selected event will be highlighted in yellow. It is possible to right-click on
any event to see that event’s details (see <a
href="#event_details">Details of an Event</a>).
+To see how the lineage evolved over time, click the slider at the bottom-left
of the window and move it to the left to see the state of the lineage at
earlier stages in the dataflow.</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/lineage-graph-annotated.png"
alt="Lineage Graph" width="900"></span></p>
+</div>
<div class="sect3">
<h4 id="find-parents"><a class="anchor" href="#find-parents"></a>Find
Parents</h4>
-
+<div class="paragraph">
+<p>Sometimes, a user may need to track down the original FlowFile that another
FlowFile was spawned from. For example, when a FORK or CLONE event occurs, NiFi
keeps
+track of the parent FlowFile that produced other FlowFiles, and it is possible
to find that parent FlowFile in the Lineage. Right-click on the event in the
+lineage graph and select "Find parents" from the context menu.</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/find-parents.png" alt="Find Parents"
width="250"></span></p>
+</div>
+<div class="paragraph">
+<p>Once "Find parents" is selected, the graph is re-drawn to show the parent
FlowFile and its lineage as well as the child and its lineage.</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/parent-found.png" alt="Parent Found"
width="250"></span></p>
+</div>
</div>
<div class="sect3">
<h4 id="expanding-an-event"><a class="anchor"
href="#expanding-an-event"></a>Expanding an Event</h4>
-
+<div class="paragraph">
+<p>In the same way that it is useful to find a parent FlowFile, the user may
also want to determine what children were spawned from a given FlowFile. To do
this, right-click on the event in the lineage graph and select "Expand" from
the context menu.</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/expand-event.png" alt="Expand Event"
width="250"></span></p>
+</div>
+<div class="paragraph">
+<p>Once "Expand" is selected, the graph is re-drawn to show the children and
their lineage.</p>
+</div>
+<div class="paragraph">
+<p><span class="image"><img src="./images/expanded-events.png" alt="Expanded
Events" width="300"></span></p>
+</div>
</div>
</div>
</div>
</div>
+<div class="sect1">
+<h2 id="other_management_features"><a class="anchor"
href="#other_management_features"></a>Other Management Features</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>In addition to the Summary Page, Data Provenance Page, Template Management
Page, and Bulletin Board Page, there are other tools in the Management Toolbar
(see <a href="#User_Interface">NiFi User Interface</a>) that are useful to the
Dataflow Manager. The Flow Configuration History, which is available by
clicking on the clock icon ( <span class="image"><img
src="./images/iconFlowHistory.png" alt="Flow History" width="28"></span> ) in
the Management Toolbar, shows all the changes that have been made to the
dataflow graph. The history can aid in troubleshooting if a recent change to
the dataflow has caused a problem and needs to be fixed. While NiFi does not
have an "undo" feature, the Dataflow Manager can make new changes to the
dataflow that will fix the problem.</p>
+</div>
+<div class="paragraph">
+<p>Two other tools in the Management Toolbar are used primarily by
Administrators. These are the Flow Settings page ( <span class="image"><img
src="./images/iconSettings.png" alt="Flow Settings" width="28"></span> ) and
the Users page ( <span class="image"><img src="./images/iconUsers.png"
alt="Users" width="28"></span> ). The Flow Settings page provides the ability
to change the name of the NiFi instance, add comments describing the NiFi
instance, set the maximum number of threads that are available to the
application, and create a back-up copy of the dataflow(s) currently on the
graph. The Users page is used to manage user access, which is described in the
Admin Guide.</p>
+</div>
+</div>
+</div>
</div>
<div id="footer">
<div id="footer-text">
-Last updated 2014-12-31 12:06:24 EST
+Last updated 2015-01-21 10:20:49 EST
</div>
</div>
</body>
-</html>
+</html>
\ No newline at end of file