Author: rgoers
Date: Thu Aug 23 21:01:04 2012
New Revision: 1376696
URL: http://svn.apache.org/viewvc?rev=1376696&view=rev
Log:
Keep user guide in synch
Modified:
flume/site/trunk/content/sphinx/FlumeUserGuide.rst
Modified: flume/site/trunk/content/sphinx/FlumeUserGuide.rst
URL:
http://svn.apache.org/viewvc/flume/site/trunk/content/sphinx/FlumeUserGuide.rst?rev=1376696&r1=1376695&r2=1376696&view=diff
==============================================================================
--- flume/site/trunk/content/sphinx/FlumeUserGuide.rst (original)
+++ flume/site/trunk/content/sphinx/FlumeUserGuide.rst Thu Aug 23 21:01:04 2012
@@ -30,7 +30,7 @@ different sources to a centralized data
Apache Flume is a top level project at the Apache Software Foundation.
There are currently two release code lines available, versions 0.9.x and 1.x.
-This documentation applies to the 1.x codeline.
+This documentation applies to the 1.x codeline.
Please click here for
`the Flume 0.9.x User Guide
<http://archive.cloudera.com/cdh/3/flume/UserGuide/>`_.
@@ -155,7 +155,7 @@ A simple example
Here, we give an example configuration file, describing a single-node Flume
deployment. This configuration lets a user generate events and subsequently
logs them to the console.
.. code-block:: properties
-
+
# example.conf: A single-node Flume configuration
# Name the components on this agent
@@ -175,7 +175,7 @@ Here, we give an example configuration f
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
-
+
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
@@ -643,7 +643,7 @@ interceptors.*
of indicating to the application writing the log file that it needs to
retain the log or that the event hasn't been sent, for some reason. If
this doesn't make sense, you need only know this: Your application can
- never guarantee data has been received when using a unidirectional
+ never guarantee data has been received when using a unidirectional
asynchronous interface such as ExecSource! As an extension of this
warning - and to be completely clear - there is absolutely zero guarantee
of event delivery when using this source. You have been warned.
@@ -894,6 +894,33 @@ Example for agent named **agent_foo**:
agent_foo.channels = memoryChannel-1
agent_foo.sources.legacysource-1.type = your.namespace.YourClass
agent_foo.sources.legacysource-1.channels = memoryChannel-1
+
+Scribe Source
+~~~~~~~~~~~~~
+
+Scribe is another type of ingest system. To adopt an existing Scribe ingest
+system, Flume should use the ScribeSource, which is based on Thrift and uses a
+compatible transfer protocol. To deploy Scribe itself, please follow the guide
+from Facebook.
+Required properties are in **bold**.
+
+============== =========== ==============================================
+Property Name  Default     Description
+============== =========== ==============================================
+**type**       --          The component type name, needs to be ``org.apache.flume.source.scribe.ScribeSource``
+port           1499        Port on which the source listens for Scribe connections
+workerThreads  5           Number of Thrift handler threads
+============== =========== ==============================================
+
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+ agent_foo.sources = scribesource-1
+ agent_foo.channels = memoryChannel-1
+  agent_foo.sources.scribesource-1.type = org.apache.flume.source.scribe.ScribeSource
+ agent_foo.sources.scribesource-1.port = 1463
+ agent_foo.sources.scribesource-1.workerThreads = 5
+ agent_foo.sources.scribesource-1.channels = memoryChannel-1
Flume Sinks
-----------
@@ -1100,15 +1127,15 @@ File Roll Sink
Stores events on the local filesystem.
Required properties are in **bold**.
-================= ======= ======================================================================================================================
-Property Name     Default Description
-================= ======= ======================================================================================================================
-**channel**       --
-**type**          --      The component type name, needs to be ``FILE_ROLL``.
-sink.directory    --
-sink.rollInterval 30      Roll the file every 30 seconds. Specifying 0 will disable rolling and cause all events to be written to a single file.
-sink.serializer   TEXT    Other possible options include AVRO_EVENT or the FQCN of an implementation of EventSerializer.Builder interface.
-================= ======= ======================================================================================================================
+=================== ======= ======================================================================================================================
+Property Name       Default Description
+=================== ======= ======================================================================================================================
+**channel**         --
+**type**            --      The component type name, needs to be ``FILE_ROLL``.
+**sink.directory**  --      The directory where files will be stored
+sink.rollInterval   30      Roll the file every 30 seconds. Specifying 0 will disable rolling and cause all events to be written to a single file.
+sink.serializer     TEXT    Other possible options include AVRO_EVENT or the FQCN of an implementation of EventSerializer.Builder interface.
+=================== ======= ======================================================================================================================
Example for agent named **agent_foo**:
@@ -1204,17 +1231,19 @@ This sink is still experimental.
The type is the FQCN: org.apache.flume.sink.hbase.AsyncHBaseSink.
Required properties are in **bold**.
-================ ============================================================ =============================================================================
+================ ============================================================ ====================================================================================
Property Name    Default                                                      Description
-================ ============================================================ =============================================================================
+================ ============================================================ ====================================================================================
**channel**      --
**type**         --                                                           The component type name, needs to be ``org.apache.flume.sink.hbase.AsyncHBaseSink``
**table**        --                                                           The name of the table in Hbase to write to.
**columnFamily** --                                                           The column family in Hbase to write to.
batchSize        100                                                          Number of events to be written per txn.
+timeout          --                                                           The length of time (in milliseconds) the sink waits for acks from hbase for all events in a transaction. If no timeout is specified, the sink will wait forever.
serializer       org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
serializer.*     --                                                           Properties to be passed to the serializer.
-================ ============================================================ =============================================================================
+================ ============================================================ ====================================================================================
Example for agent named **agent_foo**:
@@ -1361,8 +1390,8 @@ keep-alive 3
write-timeout        3                                Amount of time (in sec) to wait for a write operation
==================== ================================ ========================================================
-.. note:: By default the File Channel uses paths for checkpoint and data
- directories that are within the user home as specified above.
+.. note:: By default the File Channel uses paths for checkpoint and data
+ directories that are within the user home as specified above.
As a result if you have more than one File Channel instances
active within the agent, only one will be able to lock the
directories and cause the other channel initialization to fail.
@@ -1649,10 +1678,21 @@ can preserve an existing timestamp if it
================ ======= ========================================================================
Property Name    Default Description
================ ======= ========================================================================
-type             --      The component type name, has to be ``TIMESTAMP``
+**type**         --      The component type name, has to be ``TIMESTAMP``
preserveExisting false   If the timestamp already exists, should it be preserved - true or false
================ ======= ========================================================================
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+ agent_foo.sources = source1
+ agent_foo.channels = channel1
+ agent_foo.sources.source1.channels = channel1
+ agent_foo.sources.source1.type = SEQ
+ agent_foo.sources.source1.interceptors = inter1
+ agent_foo.sources.source1.interceptors.inter1.type = timestamp
+
Host Interceptor
~~~~~~~~~~~~~~~~
@@ -1662,14 +1702,64 @@ with key ``host`` or a configured key wh
================ ======= ========================================================================
Property Name    Default Description
================ ======= ========================================================================
-type             --      The component type name, has to be ``HOST``
+**type**         --      The component type name, has to be ``HOST``
preserveExisting false   If the host header already exists, should it be preserved - true or false
useIP            true    Use the IP Address if true, else use hostname.
hostHeader       host    The header key to be used.
================ ======= ========================================================================
-In the example above, the key used in the event headers is "hostname"
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+ agent_foo.sources = source_foo
+ agent_foo.channels = channel-1
+ agent_foo.sources.source_foo.interceptors = host_int
+ agent_foo.sources.source_foo.interceptors.host_int.type = host
+ agent_foo.sources.source_foo.interceptors.host_int.hostHeader = hostname
+
+Static Interceptor
+~~~~~~~~~~~~~~~~~~
+
+The static interceptor allows a user to append a static header with a static
+value to all events.
+
+The current implementation does not allow specifying multiple headers at one
+time. Instead, a user may chain multiple static interceptors, each defining
+one static header.
+
+================ ======= ========================================================================
+Property Name    Default Description
+================ ======= ========================================================================
+**type**         --      The component type name, has to be ``STATIC``
+preserveExisting true    If configured header already exists, should it be preserved - true or false
+key              key     Name of header that should be created
+value            value   Static value that should be created
+================ ======= ========================================================================
+
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+ agent_foo.sources = source1
+ agent_foo.channels = channel1
+ agent_foo.sources.source1.channels = channel1
+ agent_foo.sources.source1.type = SEQ
+ agent_foo.sources.source1.interceptors = inter1
+ agent_foo.sources.source1.interceptors.inter1.type = static
+ agent_foo.sources.source1.interceptors.inter1.key = datacenter
+ agent_foo.sources.source1.interceptors.inter1.value = NEW_YORK
+
+Regex Filtering Interceptor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This interceptor filters events selectively by interpreting the event body as
+text and matching the text against a configured regular expression. The
+supplied regular expression can be used to include events or exclude events.
+
+================ ======= ========================================================================
+Property Name    Default Description
+================ ======= ========================================================================
+**type**         --      The component type name, has to be ``REGEX_FILTER``
+regex            ".*"    Regular expression for matching against events
+excludeRegex     false   If true, regex determines events to exclude; otherwise regex determines events to include.
+================ ======= ========================================================================
+
Flume Properties
----------------
@@ -1685,7 +1775,6 @@ flume.called.from.service -- If t
-Dflume.called.from.service is enough)
========================= =======
====================================================================
-
Property: flume.called.from.service
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1744,7 +1833,111 @@ configuring the HDFS sink Kerberos-relat
Monitoring
==========
-TBD
+Monitoring in Flume is still a work in progress and can change often.
+Several Flume components report metrics to the JMX platform MBean server. These
+metrics can be queried using JConsole.
+
+Ganglia Reporting
+-----------------
+Flume can also report these metrics to Ganglia 3 or Ganglia 3.1 metanodes. To
+report metrics to Ganglia, the Flume agent must be started with the following
+parameters passed as system properties prefixed by ``flume.monitoring.``; they
+can also be specified in flume-env.sh:
+
+======================= ======= =====================================================================================
+Property Name           Default Description
+======================= ======= =====================================================================================
+**type**                --      The component type name, has to be ``GANGLIA``
+**hosts**               --      Comma separated list of ``hostname:port``
+pollInterval            60      Time, in seconds, between consecutive reporting to the ganglia server
+isGanglia3              false   Ganglia server version is 3. By default, Flume sends in ganglia 3.1 format
+======================= ======= =====================================================================================
+
+We can start Flume with Ganglia support as follows::
+
+  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=GANGLIA -Dflume.monitoring.hosts=com.example:1234,com.example2:5455
+
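The same properties can instead be placed in flume-env.sh, as noted above. A minimal sketch, assuming the stock flume-env.sh contributes to the agent JVM options via ``JAVA_OPTS`` (the host:port values are the illustrative ones from the command line above):

```shell
# flume-env.sh fragment: append monitoring properties to the agent JVM options.
# The GANGLIA type and example hosts mirror the command-line invocation above.
export JAVA_OPTS="$JAVA_OPTS -Dflume.monitoring.type=GANGLIA"
export JAVA_OPTS="$JAVA_OPTS -Dflume.monitoring.hosts=com.example:1234,com.example2:5455"
```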
+Any custom flume components should use Java MBean ObjectNames which begin
+with ``org.apache.flume`` for Flume to report the metrics to Ganglia. This can
+be done by adding the ObjectName as follows (the name can be anything, provided
+it starts with ``org.apache.flume``):
+
+.. code-block:: java
+
+  ObjectName objName = new ObjectName("org.apache.flume." + myClassName + ":type=" + name);
+
+ ManagementFactory.getPlatformMBeanServer().registerMBean(this, objName);
+
+JSON Reporting
+--------------
+Flume can also report metrics in a JSON format. To enable reporting in JSON
+format, Flume hosts a Web server on a configurable port. Flume reports metrics
+in the following JSON format:
+
+.. code-block:: java
+
+ {
+   "typeName1.componentName1" : {"metric1" : "metricValue1", "metric2" : "metricValue2"},
+   "typeName2.componentName2" : {"metric3" : "metricValue3", "metric4" : "metricValue4"}
+ }
+
+Here is an example:
+
+.. code-block:: java
+
+ {
+ "CHANNEL.fileChannel":{"EventPutSuccessCount":"468085",
+ "Type":"CHANNEL",
+ "StopTime":"0",
+ "EventPutAttemptCount":"468086",
+ "ChannelSize":"233428",
+ "StartTime":"1344882233070",
+ "EventTakeSuccessCount":"458200",
+ "ChannelCapacity":"600000",
+ "EventTakeAttemptCount":"458288"},
+ "CHANNEL.memChannel":{"EventPutSuccessCount":"22948908",
+ "Type":"CHANNEL",
+ "StopTime":"0",
+ "EventPutAttemptCount":"22948908",
+ "ChannelSize":"5",
+ "StartTime":"1344882209413",
+ "EventTakeSuccessCount":"22948900",
+ "ChannelCapacity":"100",
+ "EventTakeAttemptCount":"22948908"}
+ }
+
+======================= ======= =====================================================================================
+Property Name           Default Description
+======================= ======= =====================================================================================
+**type**                --      The component type name, has to be ``HTTP``
+port                    41414   The port to start the server on.
+======================= ======= =====================================================================================
+
+We can start Flume with JSON reporting support as follows::
+
+  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=HTTP -Dflume.monitoring.port=34545
+
+Metrics will then be available at the **http://<hostname>:<port>/metrics** web page.
+Custom components can report metrics as mentioned in the Ganglia section above.
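Because the endpoint serves plain JSON, the metrics are easy to consume from any language. As an illustration only (the helper function and sample values below are ours, not part of Flume), here is a Python sketch that computes how full each channel is from a response in the format shown above:

```python
import json

# A response in the JSON metrics format shown above (values are illustrative).
raw = """
{
  "CHANNEL.fileChannel": {"Type": "CHANNEL",
                          "ChannelSize": "233428",
                          "ChannelCapacity": "600000"},
  "CHANNEL.memChannel": {"Type": "CHANNEL",
                         "ChannelSize": "5",
                         "ChannelCapacity": "100"}
}
"""

def channel_fill_ratios(metrics_json):
    """Return {component name: ChannelSize / ChannelCapacity} for CHANNEL components."""
    metrics = json.loads(metrics_json)
    return {
        name: float(m["ChannelSize"]) / float(m["ChannelCapacity"])
        for name, m in metrics.items()
        if m.get("Type") == "CHANNEL"
    }

for name, ratio in channel_fill_ratios(raw).items():
    print(f"{name}: {ratio:.1%} full")
```

Note that metric values arrive as JSON strings, so they must be converted to numbers before doing arithmetic on them.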
+
+Custom Reporting
+----------------
+It is possible to report metrics to other systems by writing servers that do
+the reporting. Any reporting class has to implement the interface
+``org.apache.flume.instrumentation.MonitorService``. Such a class can be used
+the same way the GangliaServer is used for reporting, and it can poll the
+platform MBean server for metrics. For example, an HTTP monitoring service
+called ``HTTPReporting`` could be used as follows::
+
+  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=com.example.reporting.HTTPReporting -Dflume.monitoring.node=com.example:332
+
+======================= ======= ========================================
+Property Name           Default Description
+======================= ======= ========================================
+**type**                --      The component type name, has to be the FQCN of the reporting class
+======================= ======= ========================================
+
+
Troubleshooting
===============
@@ -1791,37 +1984,40 @@ TBD
Component Summary
=================
-================================ ================== ====================================================================
-Component Interface              Type               Implementation Class
-================================ ================== ====================================================================
-org.apache.flume.Channel         MEMORY             org.apache.flume.channel.MemoryChannel
-org.apache.flume.Channel         JDBC               org.apache.flume.channel.jdbc.JdbcChannel
-org.apache.flume.Channel         --                 org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
-org.apache.flume.Channel         FILE               org.apache.flume.channel.file.FileChannel
-org.apache.flume.Channel         --                 org.apache.flume.channel.PseudoTxnMemoryChannel
-org.apache.flume.Channel         --                 org.example.MyChannel
-org.apache.flume.Source          AVRO
-org.apache.flume.Source          NETCAT
-org.apache.flume.Source          SEQ
-org.apache.flume.Source          EXEC
-org.apache.flume.Source          SYSLOGTCP
-org.apache.flume.Source          SYSLOGUDP
-org.apache.flume.Source          --                 org.apache.flume.source.avroLegacy.AvroLegacySource
-org.apache.flume.Source          --                 org.apache.flume.source.thriftLegacy.ThriftLegacySource
-org.apache.flume.Source          --                 org.example.MySource
-org.apache.flume.Sink            NULL               org.apache.flume.sink.NullSink
-org.apache.flume.Sink            LOGGER             org.apache.flume.sink.LoggerSink
-org.apache.flume.Sink            AVRO               org.apache.flume.sink.AvroSink
-org.apache.flume.Sink            HDFS               org.apache.flume.sink.hdfs.HDFSEventSink
-org.apache.flume.Sink            --                 org.apache.flume.sink.hbase.HBaseSink
-org.apache.flume.Sink            --                 org.apache.flume.sink.hbase.AsyncHBaseSink
-org.apache.flume.Sink            FILE_ROLL          org.apache.flume.sink.RollingFileSink
-org.apache.flume.Sink            IRC                org.apache.flume.sink.irc.IRCSink
-org.apache.flume.Sink            --                 org.example.MySink
-org.apache.flume.ChannelSelector REPLICATING        org.apache.flume.channel.ReplicatingChannelSelector
-org.apache.flume.ChannelSelector MULTIPLEXING       org.apache.flume.channel.MultiplexingChannelSelector
-org.apache.flume.ChannelSelector --                 org.example.MyChannelSelector
-org.apache.flume.SinkProcessor   DEFAULT            org.apache.flume.sink.DefaultSinkProcessor
-org.apache.flume.SinkProcessor   FAILOVER           org.apache.flume.sink.FailoverSinkProcessor
-org.apache.flume.SinkProcessor   LOAD_BALANCE       org.apache.flume.sink.LoadBalancingSinkProcessor
-================================ ================== ====================================================================
+======================================== ================== ====================================================================
+Component Interface                      Type               Implementation Class
+======================================== ================== ====================================================================
+org.apache.flume.Channel                 MEMORY             org.apache.flume.channel.MemoryChannel
+org.apache.flume.Channel                 JDBC               org.apache.flume.channel.jdbc.JdbcChannel
+org.apache.flume.Channel                 --                 org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
+org.apache.flume.Channel                 FILE               org.apache.flume.channel.file.FileChannel
+org.apache.flume.Channel                 --                 org.apache.flume.channel.PseudoTxnMemoryChannel
+org.apache.flume.Channel                 --                 org.example.MyChannel
+org.apache.flume.Source                  AVRO               org.apache.flume.source.AvroSource
+org.apache.flume.Source                  NETCAT             org.apache.flume.source.NetcatSource
+org.apache.flume.Source                  SEQ                org.apache.flume.source.SequenceGeneratorSource
+org.apache.flume.Source                  EXEC               org.apache.flume.source.ExecSource
+org.apache.flume.Source                  SYSLOGTCP          org.apache.flume.source.SyslogTcpSource
+org.apache.flume.Source                  SYSLOGUDP          org.apache.flume.source.SyslogUDPSource
+org.apache.flume.Source                  --                 org.apache.flume.source.avroLegacy.AvroLegacySource
+org.apache.flume.Source                  --                 org.apache.flume.source.thriftLegacy.ThriftLegacySource
+org.apache.flume.Source                  --                 org.example.MySource
+org.apache.flume.Sink                    NULL               org.apache.flume.sink.NullSink
+org.apache.flume.Sink                    LOGGER             org.apache.flume.sink.LoggerSink
+org.apache.flume.Sink                    AVRO               org.apache.flume.sink.AvroSink
+org.apache.flume.Sink                    HDFS               org.apache.flume.sink.hdfs.HDFSEventSink
+org.apache.flume.Sink                    --                 org.apache.flume.sink.hbase.HBaseSink
+org.apache.flume.Sink                    --                 org.apache.flume.sink.hbase.AsyncHBaseSink
+org.apache.flume.Sink                    FILE_ROLL          org.apache.flume.sink.RollingFileSink
+org.apache.flume.Sink                    IRC                org.apache.flume.sink.irc.IRCSink
+org.apache.flume.Sink                    --                 org.example.MySink
+org.apache.flume.ChannelSelector         REPLICATING        org.apache.flume.channel.ReplicatingChannelSelector
+org.apache.flume.ChannelSelector         MULTIPLEXING       org.apache.flume.channel.MultiplexingChannelSelector
+org.apache.flume.ChannelSelector         --                 org.example.MyChannelSelector
+org.apache.flume.SinkProcessor           DEFAULT            org.apache.flume.sink.DefaultSinkProcessor
+org.apache.flume.SinkProcessor           FAILOVER           org.apache.flume.sink.FailoverSinkProcessor
+org.apache.flume.SinkProcessor           LOAD_BALANCE       org.apache.flume.sink.LoadBalancingSinkProcessor
+org.apache.flume.interceptor.Interceptor TIMESTAMP          org.apache.flume.interceptor.TimestampInterceptor$Builder
+org.apache.flume.interceptor.Interceptor HOST               org.apache.flume.interceptor.HostInterceptor$Builder
+org.apache.flume.interceptor.Interceptor STATIC             org.apache.flume.interceptor.StaticInterceptor$Builder
+======================================== ================== ====================================================================