Author: rgoers
Date: Thu Aug 23 21:01:04 2012
New Revision: 1376696

URL: http://svn.apache.org/viewvc?rev=1376696&view=rev
Log:
Keep user guide in synch

Modified:
    flume/site/trunk/content/sphinx/FlumeUserGuide.rst

Modified: flume/site/trunk/content/sphinx/FlumeUserGuide.rst
URL: 
http://svn.apache.org/viewvc/flume/site/trunk/content/sphinx/FlumeUserGuide.rst?rev=1376696&r1=1376695&r2=1376696&view=diff
==============================================================================
--- flume/site/trunk/content/sphinx/FlumeUserGuide.rst (original)
+++ flume/site/trunk/content/sphinx/FlumeUserGuide.rst Thu Aug 23 21:01:04 2012
@@ -30,7 +30,7 @@ different sources to a centralized data 
 
 Apache Flume is a top level project at the Apache Software Foundation.
 There are currently two release code lines available, versions 0.9.x and 1.x.
-This documentation applies to the 1.x codeline.  
+This documentation applies to the 1.x codeline.
 Please click here for
 `the Flume 0.9.x User Guide 
<http://archive.cloudera.com/cdh/3/flume/UserGuide/>`_.
 
@@ -155,7 +155,7 @@ A simple example
 Here, we give an example configuration file, describing a single-node Flume 
deployment. This configuration lets a user generate events and subsequently 
logs them to the console.
 
 .. code-block:: properties
-   
+
   # example.conf: A single-node Flume configuration
 
   # Name the components on this agent
@@ -175,7 +175,7 @@ Here, we give an example configuration f
   agent1.channels.channel1.type = memory
   agent1.channels.channel1.capacity = 1000
   agent1.channels.channel1.transactionCapacity = 100
- 
+
   # Bind the source and sink to the channel
   agent1.sources.source1.channels = channel1
   agent1.sinks.sink1.channel = channel1
@@ -643,7 +643,7 @@ interceptors.*
              of indicating to the application writing the log file that it needs to
              retain the log or that the event hasn't been sent, for some reason. If
              this doesn't make sense, you need only know this: Your application can
-             never guarantee data has been received when using a unidirectional 
+             never guarantee data has been received when using a unidirectional
              asynchronous interface such as ExecSource! As an extension of this
              warning - and to be completely clear - there is absolutely zero guarantee
              of event delivery when using this source. You have been warned.
@@ -894,6 +894,33 @@ Example for agent named **agent_foo**:
   agent_foo.channels = memoryChannel-1
   agent_foo.sources.legacysource-1.type = your.namespace.YourClass
   agent_foo.sources.legacysource-1.channels = memoryChannel-1
+  
+Scribe Source
+~~~~~~~~~~~~~
+
+Scribe is another type of ingest system. To integrate with an existing Scribe
+ingest system, Flume should use ScribeSource, which is based on Thrift and uses
+a compatible transfer protocol. To deploy Scribe, follow the guide from Facebook.
+Required properties are in **bold**.
+
+==============  ===========  ==============================================
+Property Name   Default      Description
+==============  ===========  ==============================================
+**type**        --           The component type name, needs to be ``org.apache.flume.source.scribe.ScribeSource``
+port            1499         Port to which Scribe should connect
+workerThreads   5            Number of handler threads in Thrift
+==============  ===========  ==============================================
+
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+  agent_foo.sources = scribesource-1
+  agent_foo.channels = memoryChannel-1
+  agent_foo.sources.scribesource-1.type = org.apache.flume.source.scribe.ScribeSource
+  agent_foo.sources.scribesource-1.port = 1463
+  agent_foo.sources.scribesource-1.workerThreads = 5
+  agent_foo.sources.scribesource-1.channels = memoryChannel-1
 
 Flume Sinks
 -----------
@@ -1100,15 +1127,15 @@ File Roll Sink
 Stores events on the local filesystem.
 Required properties are in **bold**.
 
-=================  =======  ======================================================================================================================
-Property Name      Default  Description
-=================  =======  ======================================================================================================================
-**channel**        --
-**type**           --       The component type name, needs to be ``FILE_ROLL``.
-sink.directory     --
-sink.rollInterval  30       Roll the file every 30 seconds. Specifying 0 will disable rolling and cause all events to be written to a single file.
-sink.serializer    TEXT     Other possible options include AVRO_EVENT or the FQCN of an implementation of EventSerializer.Builder interface.
-=================  =======  ======================================================================================================================
+===================  =======  ======================================================================================================================
+Property Name        Default  Description
+===================  =======  ======================================================================================================================
+**channel**          --
+**type**             --       The component type name, needs to be ``FILE_ROLL``.
+**sink.directory**   --       The directory where files will be stored
+sink.rollInterval    30       Roll the file every 30 seconds. Specifying 0 will disable rolling and cause all events to be written to a single file.
+sink.serializer      TEXT     Other possible options include AVRO_EVENT or the FQCN of an implementation of EventSerializer.Builder interface.
+===================  =======  ======================================================================================================================
 
 Example for agent named **agent_foo**:
 
@@ -1204,17 +1231,19 @@ This sink is still experimental.
 The type is the FQCN: org.apache.flume.sink.hbase.AsyncHBaseSink.
 Required properties are in **bold**.
 
-================  ============================================================  =============================================================================
+================  ============================================================  ====================================================================================
 Property Name     Default                                                       Description
-================  ============================================================  =============================================================================
+================  ============================================================  ====================================================================================
 **channel**       --
 **type**          --                                                            The component type name, needs to be ``org.apache.flume.sink.hbase.AsyncHBaseSink``
 **table**         --                                                            The name of the table in Hbase to write to.
 **columnFamily**  --                                                            The column family in Hbase to write to.
 batchSize         100                                                           Number of events to be written per txn.
+timeout           --                                                            The length of time (in milliseconds) the sink waits for acks from hbase for
+                                                                                all events in a transaction. If no timeout is specified, the sink will wait forever.
 serializer        org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
 serializer.*      --                                                            Properties to be passed to the serializer.
-================  ============================================================  =============================================================================
+================  ============================================================  ====================================================================================
 
 Example for agent named **agent_foo**:
 
@@ -1361,8 +1390,8 @@ keep-alive            3                 
 write-timeout         3                                 Amount of time (in sec) to wait for a write operation
 ====================  ================================  ========================================================
 
-.. note:: By default the File Channel uses paths for checkpoint and data 
-          directories that are within the user home as specified above. 
+.. note:: By default the File Channel uses paths for checkpoint and data
+          directories that are within the user home as specified above.
           As a result if you have more than one File Channel instances
           active within the agent, only one will be able to lock the
           directories and cause the other channel initialization to fail.
@@ -1649,10 +1678,21 @@ can preserve an existing timestamp if it
 ================  =======  ========================================================================
 Property Name     Default  Description
 ================  =======  ========================================================================
-type              --       The component type name, has to be ``TIMESTAMP``
+**type**          --       The component type name, has to be ``TIMESTAMP``
 preserveExisting  false    If the timestamp already exists, should it be preserved - true or false
 ================  =======  ========================================================================
 
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+  agent_foo.sources = source1
+  agent_foo.channels = channel1
+  agent_foo.sources.source1.channels =  channel1
+  agent_foo.sources.source1.type = SEQ
+  agent_foo.sources.source1.interceptors = inter1
+  agent_foo.sources.source1.interceptors.inter1.type = timestamp
+
 Host Interceptor
 ~~~~~~~~~~~~~~~~
 
@@ -1662,14 +1702,64 @@ with key ``host`` or a configured key wh
 ================  =======  ========================================================================
 Property Name     Default  Description
 ================  =======  ========================================================================
-type              --       The component type name, has to be ``HOST``
+**type**          --       The component type name, has to be ``HOST``
 preserveExisting  false    If the host header already exists, should it be preserved - true or false
 useIP             true     Use the IP Address if true, else use hostname.
 hostHeader        host     The header key to be used.
 ================  =======  ========================================================================
 
-In the example above, the key used in the event headers is "hostname"
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+  agent_foo.sources = source_foo
+  agent_foo.channels = channel-1
+  agent_foo.sources.source_foo.interceptors = host_int
+  agent_foo.sources.source_foo.interceptors.host_int.type = host
+  agent_foo.sources.source_foo.interceptors.host_int.hostHeader = hostname
+
+Static Interceptor
+~~~~~~~~~~~~~~~~~~
+
+The Static interceptor lets the user append a static header with a static value to all events.
+
+The current implementation does not allow specifying multiple headers at one time. Instead, the user can chain
+multiple static interceptors, each defining one static header.
+
+================  =======  ========================================================================
+Property Name     Default  Description
+================  =======  ========================================================================
+**type**          --       The component type name, has to be ``STATIC``
+preserveExisting  true     If configured header already exists, should it be preserved - true or false
+key               key      Name of header that should be created
+value             value    Static value that should be created
+================  =======  ========================================================================
+
+Example for agent named **agent_foo**:
+
+.. code-block:: properties
+
+  agent_foo.sources = source1
+  agent_foo.channels = channel1
+  agent_foo.sources.source1.channels =  channel1
+  agent_foo.sources.source1.type = SEQ
+  agent_foo.sources.source1.interceptors = inter1
+  agent_foo.sources.source1.interceptors.inter1.type = static
+  agent_foo.sources.source1.interceptors.inter1.key = datacenter
+  agent_foo.sources.source1.interceptors.inter1.value = NEW_YORK
+
+Regex Filtering Interceptor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This interceptor filters events selectively by interpreting the event body as text and matching the text against a configured regular expression. The supplied regular expression can be used to include events or exclude events.
 
+================  =======  ========================================================================
+Property Name     Default  Description
+================  =======  ========================================================================
+**type**          --       The component type name has to be ``REGEX_FILTER``
+regex             ".*"     Regular expression for matching against events
+excludeRegex      false    If true, regex determines events to exclude, otherwise regex determines events to include.
+================  =======  ========================================================================
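
The include/exclude decision described for the Regex Filtering Interceptor can be sketched in plain Java. This is an illustration only, not Flume's implementation: the class ``RegexFilterSketch`` and its ``filter`` method are hypothetical names, decoding the body as UTF-8 is an assumption, and so is the use of a partial match (``find()``) rather than a full-string match.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class RegexFilterSketch {

    /**
     * Returns the event bodies that survive the filter.
     * excludeRegex == false: keep only bodies the regex matches.
     * excludeRegex == true:  keep only bodies the regex does not match.
     */
    static List<byte[]> filter(List<byte[]> bodies, String regex, boolean excludeRegex) {
        Pattern pattern = Pattern.compile(regex);
        List<byte[]> kept = new ArrayList<>();
        for (byte[] body : bodies) {
            // Assumption: the body is interpreted as UTF-8 text.
            String text = new String(body, StandardCharsets.UTF_8);
            // Assumption: a partial match anywhere in the text counts as a match.
            boolean matched = pattern.matcher(text).find();
            if (matched != excludeRegex) {
                kept.add(body);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<byte[]> bodies = Arrays.asList(
            "INFO start".getBytes(StandardCharsets.UTF_8),
            "DEBUG detail".getBytes(StandardCharsets.UTF_8),
            "INFO stop".getBytes(StandardCharsets.UTF_8));
        // Drop events whose body starts with DEBUG, keep the rest.
        for (byte[] body : filter(bodies, "^DEBUG", true)) {
            System.out.println(new String(body, StandardCharsets.UTF_8));
        }
    }
}
```

With the default values from the table (``regex`` of ``.*`` and ``excludeRegex`` of ``false``), this sketch keeps every event, since ``.*`` matches any text.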
 
 Flume Properties
 ----------------
@@ -1685,7 +1775,6 @@ flume.called.from.service  --       If t
                                     -Dflume.called.from.service is enough)
 =========================  =======  ====================================================================
 
-
 Property: flume.called.from.service
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -1744,7 +1833,111 @@ configuring the HDFS sink Kerberos-relat
 Monitoring
 ==========
 
-TBD
+Monitoring in Flume is still a work in progress. Changes can happen very often.
+Several Flume components report metrics to the JMX platform MBean server. These
+metrics can be queried using JConsole.
+
+Ganglia Reporting
+-----------------
+Flume can also report these metrics to
+Ganglia 3 or Ganglia 3.1 metanodes. To report metrics to Ganglia, a Flume agent
+must be started with this support. Start the Flume agent by passing
+the following parameters as system properties prefixed with ``flume.monitoring.``;
+they can also be specified in flume-env.sh:
+
+=======================  =======  =====================================================================================
+Property Name            Default  Description
+=======================  =======  =====================================================================================
+**type**                 --       The component type name, has to be ``GANGLIA``
+**hosts**                --       Comma separated list of ``hostname:port``
+pollInterval             60       Time, in seconds, between consecutive reports to the Ganglia server
+isGanglia3               false    Ganglia server version is 3. By default, Flume sends in Ganglia 3.1 format
+=======================  =======  =====================================================================================
+
+We can start Flume with Ganglia support as follows::
+
+  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=GANGLIA -Dflume.monitoring.hosts=com.example:1234,com.example2:5455
+
+Any custom Flume components should use Java MBean ObjectNames which begin
+with ``org.apache.flume`` for Flume to report the metrics to Ganglia. This can
+be done by adding the ObjectName as follows (the name can be anything provided it
+starts with ``org.apache.flume``):
+
+.. code-block:: java
+
+  ObjectName objName = new ObjectName("org.apache.flume." + myClassName + ":type=" + name);
+
+  ManagementFactory.getPlatformMBeanServer().registerMBean(this, objName);
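
As a hedged illustration of the registration above, the following self-contained Java sketch registers a standard MBean under a name beginning with ``org.apache.flume`` and reads one attribute back. The class ``FlumeMBeanSketch``, the ``EventCounter`` metric holder, and the ``org.apache.flume.custom`` domain are hypothetical names chosen for this example; only the JDK's ``javax.management`` API is used, and nothing here is taken from Flume's own code.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class FlumeMBeanSketch {

    /** Management interface; a standard MBean class X exposes interface XMBean. */
    public interface EventCounterMBean {
        long getEventCount();
    }

    /** Hypothetical metric holder for a custom component. */
    public static final class EventCounter implements EventCounterMBean {
        private long eventCount;

        public synchronized void increment() {
            eventCount++;
        }

        @Override
        public synchronized long getEventCount() {
            return eventCount;
        }
    }

    /** Registers the counter under a name whose domain starts with org.apache.flume. */
    static ObjectName register(EventCounter counter, String componentName) throws Exception {
        // Assumption: "custom" stands in for the component's class name.
        ObjectName objName = new ObjectName("org.apache.flume.custom:type=" + componentName);
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(counter, objName);
        return objName;
    }

    public static void main(String[] args) throws Exception {
        EventCounter counter = new EventCounter();
        ObjectName objName = register(counter, "myCounter");
        counter.increment();
        // The getter getEventCount() is exposed as the attribute "EventCount".
        Object value = ManagementFactory.getPlatformMBeanServer()
            .getAttribute(objName, "EventCount");
        System.out.println(objName + " EventCount=" + value);
    }
}
```

Once registered this way, the attribute can also be inspected with any JMX client attached to the same JVM.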
+
+JSON Reporting
+--------------
+Flume can also report metrics in a JSON format. To enable reporting in JSON format, Flume hosts
+a Web server on a configurable port. Flume reports metrics in the following JSON format:
+
+.. code-block:: java
+
+  {
+  "typeName1.componentName1" : {"metric1" : "metricValue1", "metric2" : "metricValue2"},
+  "typeName2.componentName2" : {"metric3" : "metricValue3", "metric4" : "metricValue4"}
+  }
+
+Here is an example:
+
+.. code-block:: java
+
+  {
+  "CHANNEL.fileChannel":{"EventPutSuccessCount":"468085",
+                        "Type":"CHANNEL",
+                        "StopTime":"0",
+                        "EventPutAttemptCount":"468086",
+                        "ChannelSize":"233428",
+                        "StartTime":"1344882233070",
+                        "EventTakeSuccessCount":"458200",
+                        "ChannelCapacity":"600000",
+                        "EventTakeAttemptCount":"458288"},
+  "CHANNEL.memChannel":{"EventPutSuccessCount":"22948908",
+                     "Type":"CHANNEL",
+                     "StopTime":"0",
+                     "EventPutAttemptCount":"22948908",
+                     "ChannelSize":"5",
+                     "StartTime":"1344882209413",
+                     "EventTakeSuccessCount":"22948900",
+                     "ChannelCapacity":"100",
+                     "EventTakeAttemptCount":"22948908"}
+  }
+
+=======================  =======  =====================================================================================
+Property Name            Default  Description
+=======================  =======  =====================================================================================
+**type**                 --       The component type name, has to be ``HTTP``
+port                     41414    The port to start the server on.
+=======================  =======  =====================================================================================
+
+We can start Flume with JSON reporting support as follows::
+
+  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=HTTP -Dflume.monitoring.port=34545
+
+Metrics will then be available at the **http://<hostname>:<port>/metrics** webpage.
+Custom components can report metrics as mentioned in the Ganglia section above.
+
+Custom Reporting
+----------------
+It is possible to report metrics to other systems by writing servers that do
+the reporting. Any reporting class has to implement the interface
+``org.apache.flume.instrumentation.MonitorService``. Such a class can be used
+the same way the GangliaServer is used for reporting. It can poll the platform
+MBean server for metrics. For example, an HTTP
+monitoring service called ``HTTPReporting`` can be used as follows::
+
+  $ bin/flume-ng agent --conf-file example.conf --name agent1 -Dflume.monitoring.type=com.example.reporting.HTTPReporting -Dflume.monitoring.node=com.example:332
+
+=======================  =======  ========================================
+Property Name            Default  Description
+=======================  =======  ========================================
+**type**                 --       The component type name, has to be FQCN
+=======================  =======  ========================================
+
+
 
 Troubleshooting
 ===============
@@ -1791,37 +1984,40 @@ TBD
 Component Summary
 =================
 
-================================  ==================  ====================================================================
-Component Interface               Type                Implementation Class
-================================  ==================  ====================================================================
-org.apache.flume.Channel          MEMORY              org.apache.flume.channel.MemoryChannel
-org.apache.flume.Channel          JDBC                org.apache.flume.channel.jdbc.JdbcChannel
-org.apache.flume.Channel          --                  org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
-org.apache.flume.Channel          FILE                org.apache.flume.channel.file.FileChannel
-org.apache.flume.Channel          --                  org.apache.flume.channel.PseudoTxnMemoryChannel
-org.apache.flume.Channel          --                  org.example.MyChannel
-org.apache.flume.Source           AVRO
-org.apache.flume.Source           NETCAT
-org.apache.flume.Source           SEQ
-org.apache.flume.Source           EXEC
-org.apache.flume.Source           SYSLOGTCP
-org.apache.flume.Source           SYSLOGUDP
-org.apache.flume.Source           --                  org.apache.flume.source.avroLegacy.AvroLegacySource
-org.apache.flume.Source           --                  org.apache.flume.source.thriftLegacy.ThriftLegacySource
-org.apache.flume.Source           --                  org.example.MySource
-org.apache.flume.Sink             NULL                org.apache.flume.sink.NullSink
-org.apache.flume.Sink             LOGGER              org.apache.flume.sink.LoggerSink
-org.apache.flume.Sink             AVRO                org.apache.flume.sink.AvroSink
-org.apache.flume.Sink             HDFS                org.apache.flume.sink.hdfs.HDFSEventSink
-org.apache.flume.Sink             --                  org.apache.flume.sink.hbase.HBaseSink
-org.apache.flume.Sink             --                  org.apache.flume.sink.hbase.AsyncHBaseSink
-org.apache.flume.Sink             FILE_ROLL           org.apache.flume.sink.RollingFileSink
-org.apache.flume.Sink             IRC                 org.apache.flume.sink.irc.IRCSink
-org.apache.flume.Sink             --                  org.example.MySink
-org.apache.flume.ChannelSelector  REPLICATING         org.apache.flume.channel.ReplicatingChannelSelector
-org.apache.flume.ChannelSelector  MULTIPLEXING        org.apache.flume.channel.MultiplexingChannelSelector
-org.apache.flume.ChannelSelector  --                  org.example.MyChannelSelector
-org.apache.flume.SinkProcessor    DEFAULT             org.apache.flume.sink.DefaultSinkProcessor
-org.apache.flume.SinkProcessor    FAILOVER            org.apache.flume.sink.FailoverSinkProcessor
-org.apache.flume.SinkProcessor    LOAD_BALANCE        org.apache.flume.sink.LoadBalancingSinkProcessor
-================================  ==================  ====================================================================
+========================================  ==================  ====================================================================
+Component Interface                       Type                Implementation Class
+========================================  ==================  ====================================================================
+org.apache.flume.Channel                  MEMORY              org.apache.flume.channel.MemoryChannel
+org.apache.flume.Channel                  JDBC                org.apache.flume.channel.jdbc.JdbcChannel
+org.apache.flume.Channel                  --                  org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
+org.apache.flume.Channel                  FILE                org.apache.flume.channel.file.FileChannel
+org.apache.flume.Channel                  --                  org.apache.flume.channel.PseudoTxnMemoryChannel
+org.apache.flume.Channel                  --                  org.example.MyChannel
+org.apache.flume.Source                   AVRO                org.apache.flume.source.AvroSource
+org.apache.flume.Source                   NETCAT              org.apache.flume.source.NetcatSource
+org.apache.flume.Source                   SEQ                 org.apache.flume.source.SequenceGeneratorSource
+org.apache.flume.Source                   EXEC                org.apache.flume.source.ExecSource
+org.apache.flume.Source                   SYSLOGTCP           org.apache.flume.source.SyslogTcpSource
+org.apache.flume.Source                   SYSLOGUDP           org.apache.flume.source.SyslogUDPSource
+org.apache.flume.Source                   --                  org.apache.flume.source.avroLegacy.AvroLegacySource
+org.apache.flume.Source                   --                  org.apache.flume.source.thriftLegacy.ThriftLegacySource
+org.apache.flume.Source                   --                  org.example.MySource
+org.apache.flume.Sink                     NULL                org.apache.flume.sink.NullSink
+org.apache.flume.Sink                     LOGGER              org.apache.flume.sink.LoggerSink
+org.apache.flume.Sink                     AVRO                org.apache.flume.sink.AvroSink
+org.apache.flume.Sink                     HDFS                org.apache.flume.sink.hdfs.HDFSEventSink
+org.apache.flume.Sink                     --                  org.apache.flume.sink.hbase.HBaseSink
+org.apache.flume.Sink                     --                  org.apache.flume.sink.hbase.AsyncHBaseSink
+org.apache.flume.Sink                     FILE_ROLL           org.apache.flume.sink.RollingFileSink
+org.apache.flume.Sink                     IRC                 org.apache.flume.sink.irc.IRCSink
+org.apache.flume.Sink                     --                  org.example.MySink
+org.apache.flume.ChannelSelector          REPLICATING         org.apache.flume.channel.ReplicatingChannelSelector
+org.apache.flume.ChannelSelector          MULTIPLEXING        org.apache.flume.channel.MultiplexingChannelSelector
+org.apache.flume.ChannelSelector          --                  org.example.MyChannelSelector
+org.apache.flume.SinkProcessor            DEFAULT             org.apache.flume.sink.DefaultSinkProcessor
+org.apache.flume.SinkProcessor            FAILOVER            org.apache.flume.sink.FailoverSinkProcessor
+org.apache.flume.SinkProcessor            LOAD_BALANCE        org.apache.flume.sink.LoadBalancingSinkProcessor
+org.apache.flume.interceptor.Interceptor  TIMESTAMP           org.apache.flume.interceptor.TimestampInterceptor$Builder
+org.apache.flume.interceptor.Interceptor  HOST                org.apache.flume.interceptor.HostInterceptor$Builder
+org.apache.flume.interceptor.Interceptor  STATIC              org.apache.flume.interceptor.StaticInterceptor$Builder
+========================================  ==================  ====================================================================

