Repository: samza
Updated Branches:
  refs/heads/master b101ff91a -> 6b3189436


http://git-wip-us.apache.org/repos/asf/samza/blob/6b318943/docs/learn/documentation/versioned/jobs/samza-configurations.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/jobs/samza-configurations.md 
b/docs/learn/documentation/versioned/jobs/samza-configurations.md
new file mode 100644
index 0000000..2a38134
--- /dev/null
+++ b/docs/learn/documentation/versioned/jobs/samza-configurations.md
@@ -0,0 +1,335 @@
+---
+layout: page
+title: Samza Configurations
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+The following tables list the complete set of properties that can be included 
in a Samza job configuration file.<br>
+
+* [1. Application Configurations](#application-configurations)
+  + [1.1 Advanced Application 
Configurations](#advanced-application-configurations)
+* [2. Checkpointing](#checkpointing)
+  + [2.1 Advanced Checkpointing 
Configurations](#advanced-checkpointing-configuration)
+* [3. Systems & Streams](#systems-streams)
+  + [3.1 Advanced System & Stream 
Configuration](#advanced-system-stream-configurations)
+  + [3.2 Kafka](#kafka)
+  + [3.3 HDFS](#hdfs)
+  + [3.4 EventHubs](#eventhubs)
+  + [3.5 Kinesis](#kinesis)
+  + [3.6 ElasticSearch](#elasticsearch)
+* [4. State Storage](#state-storage)
+  + [4.1 Advanced Storage Configurations](#advanced-storage-configurations)
+* [5. Deployment](#deployment)
+  + [5.1 YARN Cluster Deployment](#yarn-cluster-deployment)
+      - [5.1.1 Advanced Cluster 
Configurations](#advanced-cluster-configurations)
+  + [5.2 Standalone Deployment](#standalone-deployment)
+      - [5.2.1 Advanced Standalone 
Configurations](#advanced-standalone-configurations)
+* [6. Metrics](#metrics)
+
+### <a name="application-configurations"></a> [1. Application 
Configurations](#application-configurations)
+These are the basic properties for setting up a Samza application.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|app.name| |__Required:__ The name of your application.|
+|app.id|1|If you run several instances of your application at the same time, 
you need to give each instance a different app.id. This is important, since 
otherwise the applications will overwrite each others' checkpoints, and perhaps 
interfere with each other in other ways.|
+|app.class| |This is __required if running on YARN__. The application to run. 
The value is a fully-qualified Java classname, which must implement 
StreamApplication. A StreamApplication describes a series of transformations on the streams.|
+|job.factory.class| |This is __required if running on YARN__. The job factory 
to use for running this job. <br> The value is a fully-qualified Java 
classname, which must implement StreamJobFactory.<br> Samza ships with three 
implementations:<br><br>`org.apache.samza.job.yarn.YarnJobFactory`<br>Runs your 
job on a YARN grid. See below for YARN-specific 
configuration.<br><br>`org.apache.samza.job.local.ThreadJobFactory`<br>__For 
dev deployments only.__ Runs your job on your local machine using 
threads.<br><br>`org.apache.samza.job.local.ProcessJobFactory`<br>__For dev 
deployments only.__ Runs your job on your local machine as a subprocess. An 
optional command builder property can also be specified (see task.command.class 
for details).|
+|job.name| |__Required:__ The name of your job. This name appears on the Samza 
dashboard, and it is used to tell apart this job's checkpoints from other jobs' 
checkpoints.|
+|job.id|1|If you run several instances of your job at the same time, you need 
to give each execution a different job.id. This is important, since otherwise 
the jobs will overwrite each others' checkpoints, and perhaps interfere with 
each other in other ways.|
+|job.default.system| |__Required:__ The system-name to use for creating input 
or output streams for which the system is not explicitly configured. This 
property will also be used as default for `job.coordinator.system`, 
`task.checkpoint.system` and `job.changelog.system` if none are defined.|
+|task.class| |Used for legacy purposes; replace with `app.class` in new jobs. 
The fully-qualified name of the Java class which processes incoming messages 
from input streams. The class must implement 
[StreamTask](../api/javadocs/org/apache/samza/task/StreamTask.html) or 
[AsyncStreamTask](../api/javadocs/org/apache/samza/task/AsyncStreamTask.html), 
and may optionally implement 
[InitableTask](../api/javadocs/org/apache/samza/task/InitableTask.html), 
[ClosableTask](../api/javadocs/org/apache/samza/task/ClosableTask.html) and/or 
[WindowableTask](../api/javadocs/org/apache/samza/task/WindowableTask.html). 
The class will be instantiated several times, once for every input stream 
partition.|
+|job.host-affinity.enabled|false|This property indicates whether host-affinity 
is enabled or not. Host-affinity refers to the ability of Samza to request and 
allocate a container on the same host every time the job is deployed. When 
host-affinity is enabled, Samza makes a "best-effort" to honor the 
host-affinity constraint. The property 
`cluster-manager.container.request.timeout.ms` determines how long to wait 
before de-prioritizing the host-affinity constraint and assigning the container 
to any available resource.|
+|task.window.ms|-1|If task.class implements 
[WindowableTask](../api/javadocs/org/apache/samza/task/WindowableTask.html), it 
can receive a windowing callback in regular intervals. This property specifies 
the time between window() calls, in milliseconds. If the number is negative 
(the default), window() is never called. A `window()` call will never  occur 
concurrently with the processing of a message. If a message is being processed 
when a window() call is due, the invocation of window happens after processing 
the message. This property is set automatically when using join or window 
operators in a High Level API StreamApplication. Note: task.window.ms should be 
set to be much larger than average process or window call duration to avoid 
starving regular processing.|
+|task.log4j.system| |Specify the system name for the StreamAppender. If this 
property is not specified in the config, an exception will be thrown. (See 
[Stream Log4j Appender](logging.html#stream-log4j-appender)) Example: 
task.log4j.system=kafka|
+|serializers.registry.<br>**_serde-name_**.class| |Use this property to 
register a serializer/deserializer, which defines a way of encoding data as an 
array of bytes (used for messages in streams, and for data in persistent 
storage). You can give a serde any serde-name you want, and reference that name 
in properties like systems.\*.samza.key.serde, systems.\*.samza.msg.serde, 
streams.\*.samza.key.serde, streams.\*.samza.msg.serde, stores.\*.key.serde and 
stores.\*.msg.serde. The value of this property is the fully-qualified name of 
a Java class that implements SerdeFactory. Samza ships with the following serde 
implementations, which can be used with their predefined serde name without 
adding them to the registry 
explicitly:<br><br>`org.apache.samza.serializers.ByteSerdeFactory`<br>A no-op 
serde which passes through the undecoded byte array. Its predefined serde-name 
is 
`byte`.<br><br>`org.apache.samza.serializers.ByteBufferSerdeFactory`<br>Encodes 
`java.nio.ByteBuffer` objects. Its 
 predefined serde-name is 
`bytebuffer`.<br><br>`org.apache.samza.serializers.IntegerSerdeFactory`<br>Encodes
 `java.lang.Integer` objects as binary (4 bytes fixed-length big-endian 
encoding). Its predefined serde-name is 
`integer`.<br><br>`org.apache.samza.serializers.StringSerdeFactory`<br>Encodes 
`java.lang.String` objects as UTF-8. Its predefined serde-name is 
`string`.<br><br>`org.apache.samza.serializers.JsonSerdeFactory`<br>Encodes 
nested structures of `java.util.Map`, `java.util.List` etc. as JSON. Note: This 
Serde enforces a dash-separated property naming convention, while JsonSerdeV2 
doesn't. This serde is primarily meant for Samza's internal usage, and is 
publicly available for backwards compatibility. Its predefined serde-name is 
`json`.<br><br>`org.apache.samza.serializers.JsonSerdeV2Factory`<br>Encodes 
nested structures of `java.util.Map`, `java.util.List` etc. as JSON. Note: This 
Serde uses Jackson's default (camelCase) property naming convention. This serde 
should be preferred over JsonSerde, especially in High Level API, unless the dasherized 
naming convention is required (e.g., for backwards 
compatibility).<br><br>`org.apache.samza.serializers.LongSerdeFactory`<br>Encodes
 `java.lang.Long` as binary (8 bytes fixed-length big-endian encoding). Its 
predefined serde-name is 
`long`.<br><br>`org.apache.samza.serializers.DoubleSerdeFactory`<br>Encodes 
`java.lang.Double` as binary (8 bytes double-precision float point). Its 
predefined serde-name is 
`double`.<br><br>`org.apache.samza.serializers.UUIDSerdeFactory`<br>Encodes 
`java.util.UUID` 
objects.<br><br>`org.apache.samza.serializers.SerializableSerdeFactory`<br>Encodes
 `java.io.Serializable` objects. Its predefined serde-name is 
`serializable`.<br><br>`org.apache.samza.serializers.MetricsSnapshotSerdeFactory`<br>Encodes
 `org.apache.samza.metrics.reporter.MetricsSnapshot` objects (which are used 
for reporting metrics) as 
JSON.<br><br>`org.apache.samza.serializers.KafkaSerdeFactory`<br>Adapter which 
allows existing `kafka.serializer.Encoder` and `kafka.serializer.Decoder` 
implementations to be used as Samza serdes. Set 
`serializers.registry.serde-name.encoder` and  
`serializers.registry.serde-name.decoder` to the appropriate class names.|
+
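+As a quick reference, a minimal configuration for a high-level API application deployed on YARN might look like the following sketch. The application class, the job name, and the "kafka" system are placeholders and assume a matching system definition (see [Systems & Streams](#systems-streams)):
+
+```properties
+# Hypothetical minimal job configuration; class and system names are placeholders.
+app.name=page-view-counter
+app.id=1
+app.class=com.example.samza.PageViewCounterApp
+job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
+job.name=page-view-counter
+job.default.system=kafka
+
+# Register a serde-name that streams and stores can reference.
+serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeV2Factory
+```
+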
+#### <a name="advanced-application-configurations"></a> [1.1 Advanced 
Application Configurations](#advanced-application-configurations)
+
+|Name|Default|Description|
+|--- |--- |--- |
+|job.changelog.system|inherited from job.default.system|This property is 
required if you would like to override the system defined in 
`job.default.system` for the changelog. The changelog will be used with the 
stream specified in `stores.store-name.changelog` config. You can override this 
system by specifying both the system and the stream in 
`stores.store-name.changelog`.|
+|job.coordinator.system|inherited from job.default.system|This property is 
required if you would like to override the system defined in 
`job.default.system` for coordination. The **_system-name_** to use for 
creating and maintaining the Coordinator Stream.|
+|job.config.rewriter.<br>**_rewriter-name_**.class|(none)|You can optionally 
define configuration rewriters, which have the opportunity to dynamically 
modify the job configuration before the job is started. For example, this can 
be useful for pulling configuration from an external configuration management 
system, or for determining the set of input streams dynamically at runtime. The 
value of this property is a fully-qualified Java classname which must implement 
[ConfigRewriter](../api/javadocs/org/apache/samza/config/ConfigRewriter.html). 
Samza ships with these rewriters by 
default:<br><br>`org.apache.samza.config.RegExTopicGenerator`<br>When consuming 
from Kafka, this allows you to consume all Kafka topics that match some regular 
expression (rather than having to list each topic explicitly). This rewriter 
has additional 
configuration.<br><br>`org.apache.samza.config.EnvironmentConfigRewriter`<br>This
 rewriter takes environment variables that are prefixed with `SAMZA_` and adds 
them to the configuration, overriding previous values where they exist. The keys 
are lowercased and underscores are converted to dots.|
+|job.config.rewriters|(none)|If you have defined configuration rewriters, you 
need to list them here, in the order in which they should be applied. The value 
of this property is a comma-separated list of **_rewriter-name_** tokens.|
+|job.config.rewriter.<br>**_rewriter-name_**.system|(none)|Set this property 
to the `system-name` of the Kafka system from which you want to consume all 
matching topics.|
+|job.config.rewriter.<br>**_rewriter-name_**.regex|(none)|A regular expression 
specifying which topics you want to consume within the Kafka system 
`job.config.rewriter.*.system`. Any topics matched by this regular expression 
will be consumed in addition to any topics you specify in your application.|
+|job.config.rewriter.<br>**_rewriter-name_**.config.*| |Any properties 
specified within this namespace are applied to the configuration of streams 
that match the regex in `job.config.rewriter.*.regex`. For example, you can set 
`job.config.rewriter.*.config.samza.msg.serde` to configure the deserializer 
for messages in the matching streams, which is equivalent to setting 
`systems.*.streams.*.samza.msg.serde` for each topic that matches the regex.|
+|job.container.thread.<br>pool.size|0|If configured, the container thread pool 
will be used to run synchronous operations of each task [in 
parallel](../container/event-loop.html). The operations include 
StreamTask.process(), WindowableTask.window(), and internally Task.commit(). If 
not configured and the default value of 0 is used, all task operations will run 
in a single thread.|
+|job.coordinator.<br>monitor-partition-change.<br>frequency.ms|300000|The 
frequency at which the input streams' partition count change should be 
detected. When the input partition count change is detected, Samza will 
automatically restart a stateless job or fail a stateful job. A longer time 
interval is recommended for jobs w/ large number of input system stream 
partitions, since gathering partition count may incur measurable overhead to 
the job. You can completely disable partition count monitoring by setting this 
value to 0 or a negative integer, which will also disable auto-restart/failing 
behavior of a Samza job on partition count changes.|
+|job.coordinator.segment.<br>bytes|26214400|   If you are using a Kafka system 
for coordinator stream, this is the segment size to be used for the coordinator 
topic's log segments. Keeping this number small is useful because it increases 
the frequency that Kafka will garbage collect old messages.|
+|job.coordinator.replication.<br>factor|3|If you are using a Kafka system for the coordinator stream, this is the number of Kafka nodes to which you want the coordinator topic replicated for durability.|
+|job.systemstreampartition.<br>grouper.factory|`org.apache.samza.`<br>`container.grouper.stream.`<br>`GroupByPartitionFactory`|A
 factory class that is used to determine how input SystemStreamPartitions are 
grouped together for processing in individual StreamTask instances. The factory 
must implement the SystemStreamPartitionGrouperFactory interface. Once this 
configuration is set, it can't be changed, since doing so could violate state 
semantics, and lead to a loss of 
data.<br><br>`org.apache.samza.container.grouper.stream.`<br>`GroupByPartitionFactory`<br>Groups
 input stream partitions according to their partition number. This grouping 
leads to a single StreamTask processing all messages for a single partition 
(e.g. partition 0) across all input streams that have a partition 0. Therefore, 
the default is that you get one StreamTask for all input partitions with the 
same partition number. Using this strategy, if two input streams have a 
partition 0, then messages from both partitions
  will be routed to a single StreamTask. This partitioning strategy is useful 
for joining and aggregating 
streams.<br><br>`org.apache.samza.container.grouper.stream.`<br>`GroupBySystemStreamPartitionFactory`<br>Assigns
 each SystemStreamPartition to its own unique StreamTask. The 
GroupBySystemStreamPartitionFactory is useful in cases where you want increased 
parallelism (more containers), and don't care about co-locating partitions for 
grouping or joins, since it allows for a greater number of StreamTasks to be 
divided up amongst Samza containers.|
+|job.systemstreampartition.<br>matcher.class| |If you want to enable static 
partition assignment, then this is a required configuration. The value of this 
property is a fully-qualified Java class name that implements the interface 
org.apache.samza.system.SystemStreamPartitionMatcher. Samza ships with two 
matcher 
classes:<br><br>`org.apache.samza.system.RangeSystemStreamPartitionMatcher`<br>This
 class uses a comma-separated list of range(s) to determine which partitions match, and are thus statically assigned to the job. For example, "2,3,1-2" statically assigns partitions 1, 2, and 3 for all the specified systems and streams (topics in the case of Kafka) to the job. For config validation, each element in the comma-separated list must conform to one of the following regexes:<br>"`(\\d+)`" or "`(\\d+-\\d+)`"<br>The `JobConfig.SSP_MATCHER_CLASS_RANGE` constant has the canonical name of this class.<br><br>`org.apache.samza.system.RegexSystemStreamPartitionMatcher`<br>This class uses a standard Java-supported regex to determine which partitions match, and are thus statically 
assigned to the Job. For example "[1-2]", statically assigns partition 1 and 2 
for all the specified system and streams (topics in case of Kafka) to the job. 
JobConfig.SSP_MATCHER_CLASS_REGEX constant has the canonical name of this 
class.|
+|job.systemstreampartition.<br>matcher.config.<br>range| |If 
`job.systemstreampartition.matcher.class` is specified, and the value of this 
property is `org.apache.samza.system.RangeSystemStreamPartitionMatcher`, then 
this property is a required configuration. Specify a comma separated list of 
range(s) to determine which partition matches, and thus statically assigned to 
the Job. For example "2,3,11-20", statically assigns partition 2, 3, and 11 to 
20 for all the specified system and streams (topics in case of Kafka) to the 
job. A single configuration value like "19" is valid as well, and statically assigns partition 19. For config validation, each element in the comma-separated list must conform to one of the following regexes:<br>"`(\\d+)`" or 
"`(\\d+-\\d+)`"|
+|job.systemstreampartition.<br>matcher.config.<br>regex| |If 
`job.systemstreampartition.matcher.class` is specified, and the value of this 
property is `org.apache.samza.system.RegexSystemStreamPartitionMatcher`, then 
this property is a required configuration. The value should be a valid Java 
supported regex. For example "[1-2]", statically assigns partition 1 and 2 for 
all the specified systems and streams (topics in the case of Kafka) to the job.|
+|job.systemstreampartition.<br>matcher.config.<br>job.factory.regex| |This 
configuration can be used to specify the Java supported regex to match the 
StreamJobFactory for which the static partition assignment should be enabled. 
This configuration enables the partition assignment feature to be used for 
custom StreamJobFactory(ies) as well.<br>This config defaults to the following 
value: "_org\\\\.apache\\\\.samza\\\\.job\\\\.local(.*ProcessJobFactory &#124; 
.*ThreadJobFactory)_", which enables static partition assignment when 
job.factory.class is set to `org.apache.samza.job.local.ProcessJobFactory` or 
`org.apache.samza.job.local.ThreadJobFactory`.|
+|job.security.manager.<br>factory|(none)|This is the factory class used to 
create the proper SecurityManager to handle security for Samza containers when 
running in a secure environment, such as Yarn with Kerberos enabled. Samza 
ships with one security manager by 
default:<br><br>`org.apache.samza.job.yarn.SamzaYarnSecurityManagerFactory`<br>Supports
 Samza containers to run properly in a Kerberos enabled Yarn cluster. Each 
Samza container, once started, will create a SamzaContainerSecurityManager. 
SamzaContainerSecurityManager runs on its own thread and updates the user's delegation tokens at the interval specified by 
yarn.token.renewal.interval.seconds. See Yarn Security for details.|
+|task.callback.timeout.ms|-1(no timeout)|For an AsyncStreamTask, this defines 
the max allowed time for a processAsync callback to complete. For a StreamTask, 
this is the max allowed time for a process call to complete. When the timeout 
happens, the container is shut down. Default is no timeout.|
+|task.chooser.class|`org.apache.samza.`<br>`system.chooser.`<br>`RoundRobinChooserFactory`|This
 property can be optionally set to override the default [message 
chooser](../container/streams.html#messagechooser), which determines the order 
in which messages from multiple input streams are processed. The value of this 
property is the fully-qualified name of a Java class that implements 
[MessageChooserFactory](../api/javadocs/org/apache/samza/system/chooser/MessageChooserFactory.html).|
+|task.command.class|`org.apache.samza.job.`<br>`ShellCommandBuilder`|The 
fully-qualified name of the Java class which determines the command line and 
environment variables for a [container](../container/samza-container.html). It 
must be a subclass of 
[CommandBuilder](../api/javadocs/org/apache/samza/job/CommandBuilder.html). 
This defaults to task.command.class=`org.apache.samza.job.ShellCommandBuilder`.|
+|task.drop.deserialization.errors|false|This property defines how the system handles deserialization failures. If set to true, the system will skip the errored messages and keep running. If set to false, the system will throw exceptions and fail the container.|
+|task.drop.serialization.errors|false|This property defines how the system handles serialization failures. If set to true, the system will drop the errored messages and keep running. If set to false, the system will throw exceptions and fail the container.|
+|task.drop.producer.errors|false|If true, producer errors will be logged and 
ignored. The only exceptions that will be thrown are those which are likely 
caused by the application itself (e.g. serialization errors). If false, the 
producer will be closed and producer errors will be propagated upward until the 
container ultimately fails. Failing the container is a safety precaution to 
ensure the latest checkpoints only reflect the events that have been completely 
and successfully processed. However, some applications prefer to remain running 
at all costs, even if that means lost messages. Setting this property to true 
will enable applications to recover from producer errors at the expense of one 
or many (in the case of batching producers) dropped messages. If you enable 
this, it is highly recommended that you also configure alerting on the 
'producer-send-failed' metric, since the producer might drop messages 
indefinitely. The logic for this property is specific to each SystemProducer implementation. It will have no effect for SystemProducers that ignore the 
property.|
+|task.ignored.exceptions| |This property specifies which exceptions should be 
ignored if thrown in a task's process or window methods. The exceptions to be 
ignored should be a comma-separated list of fully-qualified class names of the 
exceptions or * to ignore all exceptions.|
+|task.log4j.location.info.enabled|false|Defines whether or not to include 
log4j's LocationInfo data in Log4j StreamAppender messages. LocationInfo 
includes information such as the file, class, and line that wrote a log 
message. This setting is only active if the Log4j stream appender is being 
used. (See [Stream Log4j Appender](../logging.html#stream-log4j-appender))|
+|task.max.idle.ms|10|The maximum time to wait for a task worker to complete 
when there are no new messages to handle before resuming the main loop and 
potentially polling for more messages (see `task.poll.interval.ms`). This timeout 
value prevents the main loop from spinning when there is nothing for it to do. 
Increasing this value will reduce the background load of the thread, but, also 
potentially increase message latency. It should not be set greater than the 
`task.poll.interval.ms`.|
+|task.max.concurrency|1|Max number of outstanding messages being processed per 
task at a time, and it’s applicable to both StreamTask and AsyncStreamTask. 
The values can be:<br><br>`1`<br>Each task processes one message at a time. 
Next message will wait until the current message process completes. This 
ensures strict in-order processing.<br><br>`>1`<br>Multiple outstanding 
messages are allowed to be processed per task at a time. The completion can be 
out of order. This option increases the parallelism within a task, but may 
result in out-of-order processing.|
+|task.name.grouper.factory|`org.apache.samza.`<br>`container.grouper.task.`<br>`GroupByContainerCountFactory`|The
 fully-qualified name of the Java class which determines the factory class 
which will build the TaskNameGrouper. The default configuration value if the 
property is not present is 
task.name.grouper.factory=`org.apache.samza.container.grouper.task.`<br>`GroupByContainerCountFactory`. The user can specify a custom implementation of the TaskNameGrouperFactory where 
custom logic is implemented for grouping the tasks.<br>Note: For non-cluster 
applications (ones using coordination service) one must use 
`org.apache.samza.container.grouper.`<br>`task.GroupByContainerIdsFactory`|
+|task.opts| |Any JVM options to include in the command line when executing 
Samza containers. For example, this can be used to set the JVM heap size, to 
tune the garbage collector, or to enable remote debugging. This cannot be used 
when running with ThreadJobFactory. Anything you put in task.opts gets 
forwarded directly to the commandline as part of the JVM 
invocation.<br>Example: `task.opts=-XX:+HeapDumpOnOutOfMemoryError 
-XX:+UseConcMarkSweepGC`|
+|task.poll.interval.ms|50|Samza's container polls for more messages under two 
conditions. The first condition arises when there are simply no remaining 
buffered messages to process for any input SystemStreamPartition. The second 
condition arises when some input SystemStreamPartitions have empty buffers, but 
some do not. In the latter case, a polling interval is defined to determine how 
often to refresh the empty SystemStreamPartition buffers. By default, this 
interval is 50ms, which means that any empty SystemStreamPartition buffer will 
be refreshed at least every 50ms. A higher value here means that empty 
SystemStreamPartitions will be refreshed less often, which means more latency 
is introduced, but less CPU and network will be used. Decreasing this value 
means that empty SystemStreamPartitions are refreshed more frequently, thereby 
introducing less latency, but increasing CPU and network utilization.|
+|task.shutdown.ms|30000|This property controls how long the Samza container 
will wait for an orderly shutdown of task instances.|
+|job.container.single.<br>thread.mode|false|_(Deprecated)_ If set to true, 
Samza will fall back to the legacy single-threaded event loop. Default is false, 
which enables the [multithreading execution](../container/event-loop.html).|
+
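+For illustration, the `RegExTopicGenerator` rewriter described above could be wired up as in the following sketch; the rewriter name, system name, and topic pattern are hypothetical:
+
+```properties
+# Apply one rewriter, named "regex-input", before the job starts.
+job.config.rewriters=regex-input
+job.config.rewriter.regex-input.class=org.apache.samza.config.RegExTopicGenerator
+# Consume every topic on the "kafka" system that matches the pattern below.
+job.config.rewriter.regex-input.system=kafka
+job.config.rewriter.regex-input.regex=page-view-.*
+# Properties under config.* are applied to the streams matched by the regex.
+job.config.rewriter.regex-input.config.samza.msg.serde=json
+```
+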
+### <a name="checkpointing"></a> [2. Checkpointing](#checkpointing)
+[Checkpointing](../container/checkpointing.html) is not required, but 
recommended for most jobs. If you don't configure checkpointing, and a job or 
container restarts, it does not remember which messages it has already 
processed. Without checkpointing, consumer behavior on startup is determined by 
the ...samza.offset.default setting. Checkpointing allows a job to start up 
where it previously left off.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|task.checkpoint.factory| |To enable 
[checkpointing](../container/checkpointing.html), you must set this property to 
the fully-qualified name of a Java class that implements 
[CheckpointManagerFactory](../api/javadocs/org/apache/samza/checkpoint/CheckpointManagerFactory.html).
 Samza ships with two checkpoint managers by default: 
<br><br>`org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory` 
<br>Writes checkpoints to a dedicated topic on a Kafka cluster. This is the 
recommended option if you are already using Kafka for input or output streams. 
Use the task.checkpoint.system property to configure which Kafka cluster to use 
for 
checkpoints.<br><br>`org.apache.samza.checkpoint.file.FileSystemCheckpointManagerFactory`
 <br>__For dev deployments only.__ Writes checkpoints to files on the local 
filesystem. You can configure the file path with the task.checkpoint.path 
property. This is a simple option if your job always runs on the same machine. 
On a multi-machine cluster, this would require a network filesystem mount.|
+|task.commit.ms|60000|If task.checkpoint.factory is configured, this property 
determines how often a checkpoint is written. The value is the time between 
checkpoints, in milliseconds. The frequency of checkpointing affects failure 
recovery: if a container fails unexpectedly (e.g. due to crash or machine 
failure) and is restarted, it resumes processing at the last checkpoint. Any 
messages processed since the last checkpoint on the failed container are 
processed again. Checkpointing more frequently reduces the number of messages 
that may be processed twice, but also uses more resources.|
+
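+For example, Kafka-backed checkpointing with a 30-second commit interval could be configured as in the sketch below, assuming a system named "kafka" is already defined:
+
+```properties
+# Write checkpoints to a dedicated topic on the "kafka" system (placeholder name).
+task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
+task.checkpoint.system=kafka
+# Checkpoint every 30 seconds instead of the 60-second default.
+task.commit.ms=30000
+```
+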
+##### <a name="advanced-checkpointing-configuration"></a>[2.1 Advanced 
Checkpointing Configurations](#advanced-checkpointing-configuration)
+|Name|Default|Description|
+|--- |--- |--- |
+|task.checkpoint.system|inherited from job.default.system|This property is 
required if you would like to override the system defined in 
`job.default.system` for checkpointing. You must set it to the 
_**system-name**_ of the desired checkpointing system. The stream name (topic 
name) within that system is automatically determined from the job name and ID: 
`__samza_checkpoint_${job.name}_${job.id}` (with underscores in the job name and 
ID replaced by hyphens).|
+|job.checkpoint.validation.enabled|true|This setting controls whether the job should fail (true) or just warn (false) if validation of the checkpoint topic fails.<br>__CAUTION:__ this configuration needs to be used with care. It should only be used as a work-around if the checkpoint topic was created with the wrong number of partitions, its contents have been corrupted, or the `SystemStreamPartitionGrouperFactory` for the job needs to be changed.|
+|task.checkpoint.path| |Required if you are using the filesystem for 
checkpoints. Set this to the path on your local filesystem where checkpoint 
files should be stored.|
+|task.checkpoint.<br>replication.factor|2|If you are using Kafka for 
checkpoints, this is the number of Kafka nodes to which you want the checkpoint 
topic replicated for durability.|
+|task.checkpoint.<br>segment.bytes|26214400|If you are using Kafka for 
checkpoints, this is the segment size to be used for the checkpoint topic's log 
segments. Keeping this number small is useful because it increases the 
frequency that Kafka will garbage collect old checkpoints.|
+
+### <a name="systems-streams"></a>[3. Systems & Streams](#systems-streams)
+Samza consumes from and produces to [Streams](../container/streams.html) and 
has support for a variety of Systems including Kafka, HDFS, Azure EventHubs, 
Kinesis and ElasticSearch.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|task.inputs| |This configuration is only required for legacy task 
applications. A comma-separated list of streams that are consumed by this job. 
Each stream is given in the format system-name.stream-name. For example, if you 
have one input system called my-kafka, and want to consume two Kafka topics 
called PageViewEvent and UserActivityEvent, then you would set 
task.inputs=my-kafka.PageViewEvent, my-kafka.UserActivityEvent.|
+|task.broadcast.inputs| |This property specifies the partitions that all tasks 
should consume. The systemStreamPartitions you put here will be sent to all the 
tasks. <br>Format: system-name.stream-name#partitionId or 
system-name.stream-name#[startingPartitionId-endingPartitionId] <br>Example: 
task.broadcast.inputs=mySystem.broadcastStream#[0-2], 
mySystem.broadcastStream#0|
+|systems.**_system-name_**.samza.factory| |The fully-qualified name of a Java 
class which provides a system. A system can provide input streams which you can 
consume in your Samza job, or output streams to which you can write, or both. 
The requirements on a system are very flexible — it may connect to a message 
broker, or read and write files, or use a database, or anything else. The class 
must implement 
[SystemFactory](../api/javadocs/org/apache/samza/system/SystemFactory.html). 
Alternatively, the user may define the system factory in code using 
SystemDescriptors. Samza ships with the following implementations: 
<br><br>`org.apache.samza.system.kafka.KafkaSystemFactory` 
[(Configs)](#kafka)<br>`org.apache.samza.system.hdfs.HdfsSystemFactory` 
[(Configs)](#hdfs) <br>`org.apache.samza.system.eventhub.EventHubSystemFactory` 
[(Configs)](#eventhubs)<br>`org.apache.samza.system.kinesis.KinesisSystemFactory`
 
[(Configs)](#kinesis)<br>`org.apache.samza.system.elasticsearch.ElasticsearchSystemFactory` [(Configs)](#elasticsearch)|
+|systems.**_system-name_**.default.stream.*| |A set of default properties for 
any stream associated with the system. For example, if 
"systems.kafka-system.default.stream.replication.factor"=2 was configured, then 
every Kafka stream created on the kafka-system will have a replication factor 
of 2 unless the property is explicitly overridden at the stream scope using 
streams properties.|
+|systems.**_system-name_**.default.stream.samza.key.serde| |The 
[serde](../container/serialization.html) which will be used to deserialize the 
key of messages on input streams, and to serialize the key of messages on 
output streams. This property defines the serde for all streams in the 
system. See the stream-scoped property to define the serde for an individual 
stream. If both are defined, the stream-level definition takes precedence. The 
value of this property must be a serde-name that is registered with 
serializers.registry.*.class. If this property is not set, messages are passed 
unmodified between the input stream consumer, the task and the output stream 
producer.|
+|systems.**_system-name_**.default.stream.samza.msg.serde| |The 
[serde](../container/serialization.html) which will be used to deserialize the 
value of messages on input streams, and to serialize the value of messages on 
output streams. This property defines the serde for all streams in the 
system. See the stream-scoped property to define the serde for an individual 
stream. If both are defined, the stream-level definition takes precedence. The 
value of this property must be a serde-name that is registered with 
serializers.registry.*.class. If this property is not set, messages are passed 
unmodified between the input stream consumer, the task and the output stream 
producer.|
+|systems.**_system-name_**.default.stream.samza.offset.default|`upcoming`|If a 
container starts up without a [checkpoint](../container/checkpointing.html),  
this property determines where in the input stream we should start consuming. 
The value must be an 
[OffsetType](../api/javadocs/org/apache/samza/system/SystemStreamMetadata.OffsetType.html),
 one of the following: <br><br>`upcoming` <br>Start processing messages that 
are published after the job starts. Any messages published while the job was 
not running are not processed. <br><br>`oldest` <br>Start processing at the 
oldest available message in the system, and [reprocess](reprocessing.html) the 
entire available message history. <br><br>This property is for all streams 
within a system. To set it for an individual stream, see 
streams.stream-id.samza.offset.default. If both are defined, the stream-level 
definition takes precedence.|
+|streams.**_stream-id_**.samza.system| |The system-name of the system on which 
this stream will be accessed. This property binds the stream to one of the 
systems defined with the property systems.system-name.samza.factory. If this 
property isn't specified, it is inherited from job.default.system.|
+|streams.**_stream-id_**.samza.physical.name| |The physical name of the stream 
on the system on which this stream will be accessed. This is opposed to the 
stream-id which is the logical name that Samza uses to identify the stream. A 
physical name could be a Kafka topic name, an HDFS file URN or any other 
system-specific identifier.|
+|streams.**_stream-id_**.samza.key.serde| |The 
[serde](../container/serialization.html) which will be used to deserialize the 
key of messages on input streams, and to serialize the key of messages on 
output streams. This property defines the serde for an individual stream. See 
the system-scoped property to define the serde for all streams within a system. 
If both are defined, the stream-level definition takes precedence. The value of 
this property must be a serde-name that is registered with 
serializers.registry.*.class. If this property is not set, messages are passed 
unmodified between the input stream consumer, the task and the output stream 
producer.|
+|streams.**_stream-id_**.samza.msg.serde| |The 
[serde](../container/serialization.html) which will be used to deserialize the 
value of messages on input streams, and to serialize the value of messages on 
output streams. This property defines the serde for an individual stream. See 
the system-scoped property to define the serde for all streams within a system. 
If both are defined, the stream-level definition takes precedence. The value of 
this property must be a serde-name that is registered with 
serializers.registry.*.class. If this property is not set, messages are passed 
unmodified between the input stream consumer, the task and the output stream 
producer.|
+|streams.**_stream-id_**.samza.offset.default|`upcoming`|If a container starts 
up without a [checkpoint](../container/checkpointing.html), this property 
determines where in the input stream we should start consuming. The value must 
be an [OffsetType](../api/javadocs/org/apache/samza/system/SystemStreamMetadata.OffsetType.html), 
one of the following: <br><br>`upcoming` <br>Start processing messages that are 
published after the job starts. Any messages published while the job was not 
running are not processed. <br><br>`oldest` <br>Start processing at the oldest 
available message in the system, and [reprocess](reprocessing.html) the entire 
available message history. <br><br>This property is for an individual stream. 
To set it for all streams within a system, see  
systems.system-name.samza.offset.default. If both are defined, the stream-level 
definition takes precedence.|
+
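+As an illustrative sketch, a logical stream bound to a physical Kafka topic with its own serdes and offset behavior could be configured as follows; the stream-id "page-views", the topic name, and the "kafka" system are placeholders:
+
+```properties
+# Bind the logical stream-id "page-views" to a topic on the "kafka" system.
+streams.page-views.samza.system=kafka
+streams.page-views.samza.physical.name=PageViewEvent
+streams.page-views.samza.key.serde=string
+streams.page-views.samza.msg.serde=json
+# Reprocess the full retained history if the job starts without a checkpoint.
+streams.page-views.samza.offset.default=oldest
+```
+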
+##### <a name="advanced-system-stream-configurations"></a>[3.1 Advanced System 
& Stream Configuration](#advanced-system-stream-configurations)
+|Name|Default|Description|
+|--- |--- |--- |
+|streams.**_stream-id_**.*| |Any properties of the stream. These are typically 
system-specific and can be used by the system for stream creation or 
validation. Note that the other properties are prefixed with `samza.`, which 
distinguishes them as Samza properties that are not system-specific.|
+|streams.**_stream-id_**.<br>samza.delete.committed.messages|false|If set to 
true, committed messages of this stream can be deleted. Committed messages of 
this stream will be deleted if 
`systems.system-name.samza.delete.committed.messages` is also set to true.|
+|streams.**_stream-id_**.<br>samza.reset.offset|false|If set to true, when a 
Samza container starts up, it ignores any [checkpointed 
offset](../container/checkpointing.html) for this particular input stream. Its 
behavior is thus determined by the `samza.offset.default` setting. Note that 
the reset takes effect every time a container is started, which may be every 
time you restart your job, or more frequently if a container fails and is 
restarted by the framework.|
+|streams.**_stream-id_**.<br>samza.priority|-1|If one or more streams have a 
priority set (any positive integer), they will be processed with [higher 
priority](../container/streams.html#prioritizing-input-streams) than the other 
streams. You can set several streams to the same priority, or define multiple 
priority levels by assigning a higher number to the higher-priority streams. If 
a higher-priority stream has any messages available, they will always be 
processed first; messages from lower-priority streams are only processed when 
there are no new messages on higher-priority inputs.|
+|streams.**_stream-id_**.<br>samza.bootstrap|false|If set to true, this stream 
will be processed as a [bootstrap 
stream](../container/streams.html#bootstrapping). This means that every time a 
Samza container starts up, this stream will be fully consumed before messages 
from any other stream are processed.|
+|task.consumer.batch.size|1|If set to a positive integer, the task will try to 
consume batches with the given number of messages from each input stream, 
rather than consuming round-robin from all the input streams on each individual 
message. Setting this property can improve performance in some cases.|
+
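+As a sketch of the bootstrap and priority settings above, the following hypothetical configuration fully consumes a snapshot stream on startup and prefers one input over another when both have buffered messages (all stream-ids are placeholders):
+
+```properties
+# Fully consume the snapshot stream before processing any other input.
+streams.profile-snapshot.samza.bootstrap=true
+# Prefer click events over page views whenever both have messages available.
+streams.click-events.samza.priority=2
+streams.page-views.samza.priority=1
+```
+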
+#### <a name="kafka"></a>[3.2 Kafka](#kafka)
+Configs for consuming and producing to [Apache 
Kafka](https://kafka.apache.org/). This section applies if you have set 
systems.*.samza.factory = `org.apache.samza.system.kafka.KafkaSystemFactory`. 
Samples can be found [here](../../../../startup/hello-samza/versioned).
+
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.consumer.zookeeper.connect| |The hostname and port 
of one or more Zookeeper nodes where information about the Kafka cluster can be 
found. This is given as a comma-separated list of hostname:port pairs, such as 
zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181. If the cluster 
information is at some sub-path of the Zookeeper namespace, you need to include 
the path at the end of the list of hostnames, for example: 
zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/clusters/my-kafka|
+|systems.**_system-name_**.producer.bootstrap.servers| | A list of network 
endpoints where the Kafka brokers are running. This is given as a 
comma-separated list of hostname:port pairs, for example 
kafka1.example.com:9092,kafka2.example.com:9092,kafka3.example.com:9092. It's 
not necessary to list every single Kafka node in the cluster: Samza uses this 
property in order to discover which topics and partitions are hosted on which 
broker. This property is needed even if you are only consuming from Kafka, and 
not writing to it, because Samza uses it to discover metadata about streams 
being consumed.|
+|systems.**_system-name_**.consumer.auto.offset.reset|`largest`|This setting 
determines what happens if a consumer attempts to read an offset that is 
outside of the current valid range. This could happen if the topic does not 
exist, or if a checkpoint is older than the maximum message history retained by 
the brokers. This property is not to be confused with 
systems.*.samza.offset.default, which determines what happens if there is no 
checkpoint. The following are valid values for auto.offset.reset: 
<br><br>`smallest` <br>Start consuming at the smallest (oldest) offset 
available on the broker (process as much message history as available). 
<br><br>`largest` <br>Start consuming at the largest (newest) offset available 
on the broker (skip any messages published while the job was not running). 
<br><br>anything else <br>Throw an exception and refuse to start up the job.|
+
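+Putting the properties above together, a Kafka system definition might look like the following sketch; the system name "kafka" and the hostnames are placeholders:
+
+```properties
+# Hypothetical Kafka system definition.
+systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
+systems.kafka.consumer.zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181
+systems.kafka.producer.bootstrap.servers=kafka1.example.com:9092,kafka2.example.com:9092
+# Fall back to the oldest retained offset if a checkpointed offset is out of range.
+systems.kafka.consumer.auto.offset.reset=smallest
+```
+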
+##### <a name="advanced-kafka-configurations"></a>[Advanced Kafka 
Configurations](#advanced-kafka-configurations)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.<br>consumer.*| |Any [Kafka consumer 
configuration](http://kafka.apache.org/documentation.html#newconsumerconfigs) 
can be included here. For example, to change the socket timeout, you can set 
`systems.system-name.consumer.socket.timeout.ms`. (There is no need to 
configure `group.id` or `client.id`, as they are automatically configured by 
Samza. Also, there is no need to set `auto.commit.enable` because Samza has its 
own checkpointing mechanism.)|
+|systems.**_system-name_**.<br>producer.*| |Any [Kafka producer 
configuration](http://kafka.apache.org/documentation.html#producerconfigs) can 
be included here. For example, to change the request timeout, you can set 
`systems.system-name.producer.timeout.ms`. (There is no need to configure 
`client.id` as it is automatically configured by Samza.)|
+|systems.**_system-name_**.<br>samza.fetch.threshold|50000|When consuming 
streams from Kafka, a Samza container maintains an in-memory buffer for 
incoming messages in order to increase throughput (the stream task can continue 
processing buffered messages while new messages are fetched from Kafka). This 
parameter determines the number of messages we aim to buffer across all stream 
partitions consumed by a container. For example, if a container consumes 50 
partitions, it will try to buffer 1000 messages per partition by default. When 
the number of buffered messages falls below that threshold, Samza fetches more 
messages from the Kafka broker to replenish the buffer. Increasing this 
parameter can increase a job's processing throughput, but also increases the 
amount of memory used.|
+|systems.**_system-name_**.<br>samza.fetch.threshold.bytes|-1|When consuming 
streams from Kafka, a Samza container maintains an in-memory buffer for 
incoming messages in order to increase throughput (the stream task can continue 
processing buffered messages while new messages are fetched from Kafka). This 
parameter determines the total size of messages we aim to buffer across all 
stream partitions consumed by a container based on bytes. Defines how many 
bytes to use for the buffered prefetch messages for the job as a whole. The bytes 
for a single system/stream/partition are computed based on this. This fetches 
the entire messages, hence this bytes limit is a soft one, and the actual usage 
can be the bytes limit + size of max message in the partition for a given 
stream. If the value of this property is > 0 then this takes precedence over 
systems.system-name.samza.fetch.threshold. For example, if fetchThresholdBytes 
is set to 100000 bytes, and there are 50 SystemStreamPartitions registered, then the per-partition threshold is (100000 / 2) / 50 = 1000 bytes. As this 
is a soft limit, the actual usage can be 1000 bytes + size of max message. As 
soon as a SystemStreamPartition's buffered messages bytes drops below 1000, a 
fetch request will be executed to get more data for it. Increasing this 
parameter will decrease the latency between when a queue is drained of messages 
and when new messages are enqueued, but also leads to an increase in memory 
usage since more messages will be held in memory. The default value is -1, 
which means this is not used.|
+
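+For instance, a byte-based prefetch limit combined with a pass-through Kafka consumer property could be set as in this sketch (the system name and values are illustrative only):
+
+```properties
+# Cap the prefetch buffer by size; this takes precedence over samza.fetch.threshold.
+systems.kafka.samza.fetch.threshold.bytes=100000
+# Pass-through Kafka consumer config: raise the socket timeout.
+systems.kafka.consumer.socket.timeout.ms=60000
+```
+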
+#### <a name="hdfs"></a>[3.3 HDFS](#hdfs)
+Configs for [consuming](../hadoop/consumer.html) and 
[producing](../hadoop/producer.html) to 
[HDFS](https://hortonworks.com/apache/hdfs/). This section applies if you have 
set systems.*.samza.factory = `org.apache.samza.system.hdfs.HdfsSystemFactory`. 
+More about batch processing [here](../hadoop/overview.html).
+
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.producer.hdfs.base.output.dir|/user/USERNAME/SYSTEMNAME|The
 base output directory for HDFS writes. Defaults to the home directory of the 
user who ran the job, followed by the systemName for this HdfsSystemProducer as 
defined in the job.properties file.|
+|systems.**_system-name_**.producer.hdfs.writer.class|`org.apache.samza.system.hdfs.writer.`<br>`BinarySequenceFileHdfsWriter`|Fully-qualified
 class name of the HdfsWriter implementation this HDFS Producer system should 
use|
+|systems.**_system-name_**.stagingDirectory|Inherit from 
`yarn.job.staging.directory` if set|Staging directory for storing partition 
description. By default (if not set by users) the value is inherited from 
"yarn.job.staging.directory" internally. The default value is typically good 
enough unless you want to explicitly use a separate location.|
+
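+For example, an HDFS producer system could be declared as in the following sketch; the system name and output directory are placeholders:
+
+```properties
+# Hypothetical HDFS producer system.
+systems.hdfs.samza.factory=org.apache.samza.system.hdfs.HdfsSystemFactory
+systems.hdfs.producer.hdfs.base.output.dir=/data/samza-output
+systems.hdfs.producer.hdfs.writer.class=org.apache.samza.system.hdfs.writer.BinarySequenceFileHdfsWriter
+```
+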
+##### <a name="advanced-hdfs-configurations"></a>[Advanced HDFS 
Configurations](#advanced-hdfs-configurations)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.<br>consumer.bufferCapacity|10|Capacity of the HDFS consumer buffer - the blocking queue used for storing messages. Larger buffer capacity typically leads to better throughput but consumes more memory.|
+|systems.**_system-name_**.<br>consumer.numMaxRetries|10|The number of retry attempts when there is a failure to fetch messages from HDFS, before the container fails.|
+|systems.**_system-name_**.<br>partitioner.defaultPartitioner.whitelist|.*|Whitelist used by the directory partitioner to select files in an HDFS directory, in Java Pattern style.|
+|systems.**_system-name_**.<br>partitioner.defaultPartitioner.blacklist|(none)|Blacklist used by the directory partitioner to filter out unwanted files in an HDFS directory, in Java Pattern style.|
+|systems.**_system-name_**.<br>partitioner.defaultPartitioner.groupPattern| 
|Group pattern used by directory partitioner for advanced partitioning. The 
advanced partitioning goes beyond the basic assumption that each file is a 
partition. With advanced partitioning you can group files into partitions 
arbitrarily. For example, if you have a set of files as [part-01-a.avro, 
part-01-b.avro, part-02-a.avro, part-02-b.avro, part-03-a.avro], and you want 
to organize the partitions as (part-01-a.avro, part-01-b.avro), 
(part-02-a.avro, part-02-b.avro), (part-03-a.avro), where the numbers in the 
middle act as a "group identifier", you can then set this property to be 
"part-[id]-.*" (note that "[id]" is a reserved term here, i.e. you have to 
literally put it as "[id]"). The partitioner will apply this pattern to all 
file names and extract the "group identifier" ("[id]" in the pattern), then use 
the "group identifier" to group files into partitions. See more details in 
[HdfsSystemConsumer design doc](https://issues.apache.org/jira/secure/attachment/12827670/HDFSSystemConsumer.pdf)|
+|systems.**_system-name_**.<br>consumer.reader|`avro`|Type of the file reader for different event formats (avro, plain, json, etc.). "avro" is the only type supported for now.|
+|systems.**_system-name_**.<br>producer.hdfs.compression.type|(none)|A human-readable label for the compression type to use, such as "gzip", "snappy", etc. This label will be interpreted differently (or ignored) depending on the nature of the HdfsWriter implementation.|
+|systems.**_system-name_**.<br>producer.hdfs.bucketer.class|`org.apache.samza.system.hdfs.`<br>`writer.JobNameDateTimeBucketer`|Fully-qualified class name of the Bucketer implementation that will manage HDFS paths and file names. Used to batch writes by time, or other similar partitioning methods.|
+|systems.**_system-name_**.<br>producer.hdfs.bucketer.date.path.format|yyyy_MM_dd-HH|The date format (using Java's SimpleDateFormat syntax) used by the Bucketer implementation for time-based bucketing of HDFS paths and file names.|
+|systems.**_system-name_**.<br>producer.hdfs.write.batch.size.bytes|268435456|The number of bytes of outgoing messages to write to each HDFS output file before cutting a new file. Defaults to 256MB if not set.|
+|systems.**_system-name_**.<br>producer.hdfs.write.batch.size.records|262144|The number of outgoing messages to write to each HDFS output file before cutting a new file. Defaults to 262144 if not set.|
+
+#### <a name="eventhubs"></a>[3.4 EventHubs](#eventhubs)
+Configs for consuming and producing to [Azure 
EventHubs](https://azure.microsoft.com/en-us/services/event-hubs/). This 
section applies if you have set systems.*.samza.factory = 
`org.apache.samza.system.eventhub.EventHubSystemFactory`. 
Documentation and samples can be found [here](../azure/eventhubs.html).
+
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.stream.list| |List of Samza **_stream-id_**s used for the Eventhub system.|
+|streams.**_stream-id_**.eventhubs.namespace| |Namespace of the associated 
stream-ids. __Required__ to access the Eventhubs entity per stream.|
+|streams.**_stream-id_**.eventhubs.entitypath| |Entity of the associated 
stream-ids. __Required__ to access the Eventhubs entity per stream.|
+|sensitive.streams.**_stream-id_**.eventhubs.sas.keyname| |SAS Keyname of the 
associated stream-ids. __Required__ to access the Eventhubs entity per stream.|
+|sensitive.streams.**_stream-id_**.eventhubs.sas.token| |SAS Token of the 
associated stream-ids. __Required__ to access the Eventhubs entity per stream.|
+
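+Combining the properties above, an EventHubs system with a single input stream might be configured as in this sketch; the system name, stream-id, namespace, entity, and credentials are placeholders:
+
+```properties
+# Hypothetical EventHubs system with one stream-id, "eh-input".
+systems.eventhubs.samza.factory=org.apache.samza.system.eventhub.EventHubSystemFactory
+systems.eventhubs.stream.list=eh-input
+streams.eh-input.samza.system=eventhubs
+streams.eh-input.eventhubs.namespace=my-eventhubs-namespace
+streams.eh-input.eventhubs.entitypath=my-entity
+sensitive.streams.eh-input.eventhubs.sas.keyname=RootManageSharedAccessKey
+sensitive.streams.eh-input.eventhubs.sas.token=MY_SAS_TOKEN
+```
+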
+##### <a name="advanced-eventhubs-configurations"></a>[Advanced EventHubs 
Configurations](#advanced-eventhubs-configurations)
+|Name|Default|Description|
+|--- |--- |--- |
+|streams.**_stream-name_**.<br>eventhubs.numClientThreads|10|Number of threads 
in thread pool that will be used by the EventHubClient. See 
[here](https://docs.microsoft.com/en-us/java/api/com.microsoft.azure.eventhubs._event_hub_client.create?view=azure-java-stable)
 for more details.|
+|systems.**_system-name_**.<br>eventhubs.prefetchCount|999|Number of events that the EventHub client should prefetch from the service. See [here](https://docs.microsoft.com/en-us/java/api/com.microsoft.azure.eventhubs._event_hub_client.create?view=azure-java-stable) for more details.|
+|systems.**_system-name_**.<br>eventhubs.maxEventCountPerPoll|50|Maximum number of events that the EventHub client can return in a receive call. See [here](https://docs.microsoft.com/en-us/java/api/com.microsoft.azure.eventhubs._partition_receive_handler.getmaxeventcount?view=azure-java-stable#com_microsoft_azure_eventhubs__partition_receive_handler_getMaxEventCount__) for more details.|
+|systems.**_system-name_**.<br>eventhubs.runtime.info.timeout|60000|Timeout 
for fetching the runtime metadata from an Eventhub entity on startup in millis.|
+|systems.**_system-name_**.<br>eventhubs.partition.method|`EVENT_HUB_HASHING`|Producer
 only config. Configure the method that the message is partitioned for the 
downstream Eventhub in one of the following ways:<br><br>`ROUND_ROBIN` <br>The 
message key and partition key are ignored and the message will be distributed 
in a round-robin fashion amongst all the partitions in the downstream 
EventHub.<br><br>`EVENT_HUB_HASHING` <br>Employs the hashing mechanism in 
EventHubs to determine, based on the key of the message, which partition the 
message should go to. Using this method still ensures that all the events with the 
same key are sent to the same partition in the event hub. If this option is 
chosen, the partition key used for the hash should be a string. If the 
partition key is not set, the message key is used 
instead.<br><br>`PARTITION_KEY_AS_PARTITION` <br>Use the integer key specified 
by the partition key or key of the message to a specific partition on Eventhub. 
If the integer key is 
greater than the number of partitions in the destination Eventhub, a modulo operation will be performed to determine the resulting partition. For example, if there 
are 6 partitions and the key is 9, the message will end up in partition 3. 
Similarly to `EVENT_HUB_HASHING`, if the partition key is not set the message 
key is used instead.|
+|systems.**_system-name_**.<br>eventhubs.send.key|true|Producer only config. 
If enabled, sends each message key to the EventHub in the properties of the AMQP message. 
If the Samza Eventhub consumer is used, this field is used as the message key 
if the partition key is not present.|
+|streams.**_stream-id_**.<br>eventhubs.consumer.group|`$Default`|Consumer only config. Sets the consumer group from the upstream EventHub that the consumer is part of. Defaults to the `$Default` group that is initially present in all EventHub entities (unless removed).|
+|systems.**_system-name_**.<br>eventhubs.receive.queue.size|100|Consumer only 
config. Per partition capacity of the eventhubs consumer buffer - the blocking 
queue used for storing messages. Larger buffer capacity typically leads to 
better throughput but consumes more memory.|
+
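+As a sketch, a few of the advanced knobs above could be tuned as follows (the system name `eventhubs-system` and stream-id `eventhub-input` are placeholders carried over from the earlier example):
+
+```
+# Producer side: partition by the message/partition key using EventHubs hashing.
+systems.eventhubs-system.eventhubs.partition.method=EVENT_HUB_HASHING
+# Consumer side: read from a non-default consumer group and enlarge the per-partition buffer.
+streams.eventhub-input.eventhubs.consumer.group=my-consumer-group
+systems.eventhubs-system.eventhubs.receive.queue.size=200
+```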
+
+#### <a name="kinesis"></a>[3.5 Kinesis](#kinesis)
+Configs for consuming and producing to [Amazon 
Kinesis](https://aws.amazon.com/kinesis/). This section applies if you have set 
systems.*.samza.factory = `org.apache.samza.system.kinesis.KinesisSystemFactory`
+Documentation and samples found [here](../aws/kinesis.html)
+
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.streams.**_stream-name_**.aws.region| |Region of 
the associated stream-name. __Required__ to access the Kinesis data stream.|
+|systems.**_system-name_**.streams.**_stream-name_**.aws.accessKey| |AccessKey of the associated stream-name. __Required__ to access the Kinesis data stream.|
+|systems.**_system-name_**.streams.**_stream-name_**.aws.secretKey| |SecretKey 
of the associated stream-name. __Required__ to access the Kinesis data stream.|
+
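+As a sketch, the required Kinesis properties for a single input stream might look like the following (the system name `kinesis-system`, stream name `input-stream`, and all credential values are placeholders):
+
+```
+# Region and credentials for the Kinesis data stream (placeholder values).
+systems.kinesis-system.streams.input-stream.aws.region=us-east-1
+systems.kinesis-system.streams.input-stream.aws.accessKey=my-access-key
+systems.kinesis-system.streams.input-stream.aws.secretKey=my-secret-key
+```
+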
+##### <a name="advanced-kinesis-configurations"></a>[Advanced Kinesis 
Configurations](#advanced-kinesis-configurations)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.<br>streams.**_stream-name_**.<br>aws.kcl.*| |[AWS 
Kinesis Client Library 
configuration](https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/software/amazon/kinesis/coordinator/KinesisClientLibConfiguration.java)
 associated with the **_stream-name_**.|
+|systems.**_system-name_**.<br>aws.clientConfig.*| |   [AWS 
ClientConfiguration](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html)
 associated with the **_system-name_**.|
+
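+The two wildcard prefixes above simply pass properties through to the AWS libraries. As an illustrative sketch only (the suffixes `maxRecords` and `proxyHost` are assumed here to match the corresponding KinesisClientLibConfiguration and ClientConfiguration settings, and the values are placeholders):
+
+```
+# Pass-through KCL tuning for the consumer of input-stream (illustrative key name).
+systems.kinesis-system.streams.input-stream.aws.kcl.maxRecords=200
+# Pass-through AWS client setting for the whole system (illustrative key name).
+systems.kinesis-system.aws.clientConfig.proxyHost=my-proxy.example.com
+```
+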
+#### <a name="elasticsearch"></a>[3.6 ElasticSearch](#elasticsearch)
+Configs for producing to 
[ElasticSearch](https://www.elastic.co/products/elasticsearch). This section 
applies if you have set systems.*.samza.factory = 
`org.apache.samza.system..elasticsearch.ElasticsearchSystemFactory`
+
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.client.factory| |__Required:__ The elasticsearch 
client factory used for connecting to the Elasticsearch cluster. Samza ships 
with the following 
implementations:<br><br>`org.apache.samza.system.elasticsearch.client.TransportClientFactory`<br>Creates
 a TransportClient that connects to the cluster remotely without joining it. 
This requires the transport host and port properties to be 
set.<br><br>`org.apache.samza.system.elasticsearch.client.NodeClientFactory`<br>Creates
 a Node client that connects to the cluster by joining it. By default this uses 
zen discovery to find the cluster but other methods can be configured.|
+|systems.**_system-name_**.index.request.factory|`org.apache.samza.system.`<br>`elasticsearch.indexrequest.`<br>`DefaultIndexRequestFactory`|The index request factory that converts the Samza OutgoingMessageEnvelope into the IndexRequest to be sent to Elasticsearch. The default IndexRequestFactory (`org.apache.samza.system.elasticsearch.indexrequest`) behaves as follows:<br><br>`Stream name`<br>The stream name is of the format {index-name}/{type-name}, which maps onto the Elasticsearch index and type.<br><br>`Message id`<br>If the message has a key, this is set as the document id; otherwise Elasticsearch will generate one for each document.<br><br>`Partition id`<br>If the partition key is set, then this is used as the Elasticsearch routing key.<br><br>`Message`<br>The message must be either a byte[], which is passed directly on to Elasticsearch, or a Map, which is passed on to the Elasticsearch client which serialises it into a JSON String. Samza serdes are not currently supported.|
+|systems.**_system-name_**.client.transport.host| |__Required__ for 
TransportClientFactory; The hostname that the TransportClientFactory connects 
to.|
+|systems.**_system-name_**.client.transport.port| |__Required__ for 
TransportClientFactory; The port that the TransportClientFactory connects to.|
+
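+As a sketch, a transport-client based Elasticsearch producer system could be configured as follows (the system name `es-system`, host, and port are placeholders):
+
+```
+# Connect to a remote Elasticsearch cluster via the transport client (placeholder host/port).
+systems.es-system.samza.factory=org.apache.samza.system.elasticsearch.ElasticsearchSystemFactory
+systems.es-system.client.factory=org.apache.samza.system.elasticsearch.client.TransportClientFactory
+systems.es-system.client.transport.host=es.example.com
+systems.es-system.client.transport.port=9300
+```
+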
+##### <a name="advanced-elasticsearch-configurations"></a>[Advanced 
ElasticSearch Configurations](#advanced-elasticsearch-configurations)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.<br>client.elasticsearch.*| |Any [Elasticsearch client settings](https://www.elastic.co/guide/en/elasticsearch/client/java-api/1.7/client.html) can be used here. They will all be passed to both the transport and node clients. Some common settings you may want to provide:<br><br>`systems.system-name.client.elasticsearch.cluster.name` <br>The name of the Elasticsearch cluster the client is connecting to.<br><br>`systems.system-name.client.elasticsearch.client.transport.sniff`<br>If set to true, the transport client will discover and keep up to date all cluster nodes. This is used for load balancing and fail-over on retries.|
+|systems.**_system-name_**.<br>bulk.flush.max.actions|100|The maximum number 
of messages to be buffered before flushing.|
+|systems.**_system-name_**.<br>bulk.flush.max.size.mb|5|The maximum aggregate size, in MB, of buffered messages before flushing.|
+|systems.**_system-name_**.<br>bulk.flush.interval.ms|never|How often buffered 
messages should be flushed.|
+
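+For example, bulk flushing and client settings for the hypothetical `es-system` above might be tuned as follows (values are illustrative):
+
+```
+# Name of the target cluster, passed through to the Elasticsearch client.
+systems.es-system.client.elasticsearch.cluster.name=my-es-cluster
+# Flush whenever 500 messages or 10 MB are buffered, or every 10 seconds.
+systems.es-system.bulk.flush.max.actions=500
+systems.es-system.bulk.flush.max.size.mb=10
+systems.es-system.bulk.flush.interval.ms=10000
+```
+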
+### <a name="state-storage"></a>[4. State Storage](#state-storage)
+These properties define Samza's storage mechanism for efficient [stateful 
stream processing](../container/state-management.html).
+
+|Name|Default|Description|
+|--- |--- |--- |
+|stores.**_store-name_**.factory| |You can give a store any **_store-name_** except `default` (`default` is reserved for defining default store parameters), and use that name to get a reference to the store in your Samza application (call [TaskContext.getStore()](../api/javadocs/org/apache/samza/task/TaskContext.html#getStore(java.lang.String)) in your task's [`init()`](../api/javadocs/org/apache/samza/task/InitableTask.html#init) method for the low-level API, and in your application function's [`init()`](../api/javadocs/org/apache/samza/operators/functions/InitableFunction.html#init) method for the high-level API). The value of this property is the fully-qualified name of a Java class that implements [`StorageEngineFactory`](../api/javadocs/org/apache/samza/storage/StorageEngineFactory.html). Samza currently ships with two storage engine implementations: <br><br>`org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory` <br>An on-disk storage engine with a key-value interface, implemented using [RocksDB](http://rocksdb.org/). It supports fast random-access reads and writes, as well as range queries on keys. RocksDB can be configured with various additional tuning parameters.<br><br>`org.apache.samza.storage.kv.inmemory.InMemoryKeyValueStorageEngineFactory`<br>An in-memory implementation of a key-value store. Uses `java.util.concurrent.ConcurrentSkipListMap` to store the keys in order.|
+|stores.**_store-name_**.key.serde| |If the storage engine expects keys in the 
store to be simple byte arrays, this [serde](../container/serialization.html) 
allows the stream task to access the store using another object type as key. 
The value of this property must be a serde-name that is registered with 
serializers.registry.*.class. If this property is not set, keys are passed 
unmodified to the storage engine (and the changelog stream, if appropriate).|
+|stores.**_store-name_**.msg.serde| |If the storage engine expects values in 
the store to be simple byte arrays, this 
[serde](../container/serialization.html) allows the stream task to access the 
store using another object type as value. The value of this property must be a 
serde-name that is registered with serializers.registry.*.class. If this 
property is not set, values are passed unmodified to the storage engine (and 
the changelog stream, if appropriate).|
+|stores.**_store-name_**.changelog| |Samza stores are local to a container. If 
the container fails, the contents of the store are lost. To prevent loss of 
data, you need to set this property to configure a changelog stream: Samza then 
ensures that writes to the store are replicated to this stream, and the store 
is restored from this stream after a failure. The value of this property is 
given in the form system-name.stream-name. The "system-name" part is optional. 
If it is omitted you must specify the system in job.changelog.system config. 
Any output stream can be used as changelog, but you must ensure that only one 
job ever writes to a given changelog stream (each instance of a job and each 
store needs its own changelog stream).|
+|stores.**_store-name_**.rocksdb.ttl.ms| |__For RocksDB:__ The time-to-live of the store. Please note it's not a strict TTL limit (entries are removed only after compaction). Use caution when opening the same database alternately with and without TTL, as it might corrupt the database. Please make sure to read the [constraints](https://github.com/facebook/rocksdb/wiki/Time-to-Live) before using.|
+
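+Putting these properties together, a sketch of a RocksDB-backed store with a changelog might look like the following (the store name `my-store`, the serde name `string`, and the Kafka system/stream names are placeholders):
+
+```
+# A local RocksDB store with string keys and values, replicated to a Kafka changelog topic.
+stores.my-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
+stores.my-store.key.serde=string
+stores.my-store.msg.serde=string
+stores.my-store.changelog=kafka.my-store-changelog
+# The serde-name "string" must be registered in the serializers registry.
+serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
+```
+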
+##### <a name="advanced-storage-configurations"></a>[4.1 Advanced Storage 
Configurations](#advanced-storage-configurations)
+|Name|Default|Description|
+|--- |--- |--- |
+|job.logged.store.base.dir|_user.dir_ environment property if set, else current working directory of the process|The base directory for changelog stores used by the Samza application. Another way to configure the base directory is by setting the environment variable `LOGGED_STORE_BASE_DIR`. __Note:__ The environment variable takes precedence over `job.logged.store.base.dir`. <br>By opting in, users are responsible for cleaning up the store directories if necessary. Jobs using host affinity should ensure that the stores are persisted across application/container restarts. This means that the location and cleanup of this directory should be separate from the container lifecycle and resource cleanup.|
+|job.non-logged.store.base.dir|_user.dir_ environment property if set, else current working directory of the process|The base directory for non-changelog stores used by the Samza application. <br>In YARN, the default behaviour without this configuration is to create non-changelog store directories in the CWD, which happens to be the YARN container directory. This gets cleaned up periodically as part of the NodeManager's deletion service, which is controlled by the YARN config `yarn.nodemanager.delete.debug-delay-sec`. <br>In non-YARN deployment models, or when using a directory other than the YARN container directory, stores need to be cleaned up periodically.|
+|stores.default.changelog.<br>replication.factor|2|This property defines the 
default number of replicas to use for the change log stream.|
+|stores.**_store-name_**.changelog.<br>replication.factor|stores.default.changelog.<br>replication.factor|This property defines the number of replicas to use for the change log stream.|
+|stores.**_store-name_**.changelog.<br>kafka.topic-level-property| |This property allows you to specify topic-level settings for the changelog topic to be created. For example, you can specify the cleanup policy as `stores.mystore.changelog.cleanup.policy=delete`. Please refer to the [Kafka documentation](http://kafka.apache.org/documentation.html#configuration) for more topic-level configurations.|
+|stores.**_store-name_**.<br>write.batch.size|500|For better write 
performance, the storage engine buffers writes and applies them to the 
underlying store in a batch. If the same key is written multiple times in quick 
succession, this buffer also deduplicates writes to the same key. This property 
is set to the number of key/value pairs that should be kept in this in-memory 
buffer, per task instance. The number cannot be greater than 
`stores.*.object.cache.size`.|
+|stores.**_store-name_**.<br>object.cache.size|1000|Samza maintains an 
additional cache in front of RocksDB for frequently-accessed objects. This 
cache contains deserialized objects (avoiding the deserialization overhead on 
cache hits), in contrast to the RocksDB block cache 
(`stores.*.container.cache.size.bytes`), which caches serialized objects. This 
property determines the number of objects to keep in Samza's cache, per task 
instance. This same cache is also used for write buffering (see 
`stores.*.write.batch.size`). A value of 0 disables all caching and batching.|
+|stores.**_store-name_**.container.<br>cache.size.bytes|104857600|The size of 
RocksDB's block cache in bytes, per container. If there are several task 
instances within one container, each is given a proportional share of this 
cache. Note that this is an off-heap memory allocation, so the container's 
total memory use is the maximum JVM heap size plus the size of this cache.|
+|stores.**_store-name_**.container.<br>write.buffer.size.bytes|33554432|The 
amount of memory (in bytes) that RocksDB uses for buffering writes before they 
are written to disk, per container. If there are several task instances within 
one container, each is given a proportional share of this buffer. This setting 
also determines the size of RocksDB's segment files.|
+|stores.**_store-name_**.<br>rocksdb.compression|`snappy`|This property 
controls whether RocksDB should compress data on disk and in the block cache. 
The following values are valid:<br><br>`snappy`<br>Compress data using the 
[Snappy](https://github.com/google/snappy) codec.<br><br>`bzip2`<br>Compress 
data using the [bzip2](https://en.wikipedia.org/wiki/Bzip2) 
codec.<br><br>`zlib`<br>Compress data using the 
[zlib](https://en.wikipedia.org/wiki/Zlib) codec.<br><br>`lz4`<br>Compress data 
using the [lz4](https://github.com/lz4/lz4) codec.<br><br>`lz4hc`<br>Compress 
data using the [lz4hc](https://github.com/lz4/lz4) (high compression) 
codec.<br><br>`none`<br>Do not compress data.|
+|stores.**_store-name_**.<br>rocksdb.block.size.bytes|4096|If compression is 
enabled, RocksDB groups approximately this many uncompressed bytes into one 
compressed block.|
+|stores.**_store-name_**.<br>rocksdb.compaction.style|`universal`|This 
property controls the compaction style that RocksDB will employ when compacting 
its levels. The following values are valid:<br><br>`universal`<br>Use 
[universal](https://github.com/facebook/rocksdb/wiki/Universal-Compaction) 
compaction.<br><br>`fifo`<br>Use 
[FIFO](https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style) 
compaction. <br><br>`level`<br>Use RocksDB's standard [leveled 
compaction](https://github.com/facebook/rocksdb/wiki/Leveled-Compaction).|
+|stores.**_store-name_**.<br>rocksdb.num.write.buffers|3|Configures the number 
of [write 
buffers](https://github.com/facebook/rocksdb/wiki/Basic-Operations#write-buffer)
 that a RocksDB store uses. This allows RocksDB to continue taking writes to 
other buffers even while a given write buffer is being flushed to disk.|
+|stores.**_store-name_**.<br>rocksdb.max.log.file.size.bytes|67108864|The 
maximum size in bytes of the RocksDB LOG file before it is rotated.|
+|stores.**_store-name_**.<br>rocksdb.keep.log.file.num|2|The number of RocksDB 
LOG files (including rotated LOG.old.* files) to keep.|
+|stores.**_store-name_**.<br>rocksdb.metrics.list|(none)|A list of [RocksDB 
properties](https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L409)
 to expose as metrics (gauges).|
+
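+As a sketch, a few of the RocksDB tuning properties above could be applied to the hypothetical `my-store` from the earlier example (values are illustrative):
+
+```
+# Trade CPU for disk space with a different compression codec and larger blocks.
+stores.my-store.rocksdb.compression=lz4
+stores.my-store.rocksdb.block.size.bytes=16384
+# Keep three replicas of the changelog topic for this store.
+stores.my-store.changelog.replication.factor=3
+```
+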
+### <a name="deployment"></a>[5. Deployment](#deployment)
+Samza supports both standalone and clustered ([YARN](yarn-jobs.html)) 
[deployment models](../deployment/deployment-model.html). Below are the 
configurations options for both models.
+#### <a name="yarn-cluster-deployment"></a>[5.1 YARN Cluster 
Deployment](#yarn-cluster-deployment)
+|Name|Default|Description|
+|--- |--- |--- |
+|yarn.package.path| |__Required__ for YARN jobs: The URL from which the job package can be downloaded, for example an http:// or hdfs:// URL. The job package is a .tar.gz file with a specific directory structure.|
+|job.container.count|1|The number of YARN containers to request for running 
your job. This is the main parameter for controlling the scale (allocated 
computing resources) of your job: to increase the parallelism of processing, 
you need to increase the number of containers. The minimum is one container, 
and the maximum number of containers is the number of task instances (usually 
the number of input stream partitions). Task instances are evenly distributed 
across the number of containers that you specify.|
+|cluster-manager.container.memory.mb|1024|How much memory, in megabytes, to 
request from the cluster manager per container of your job. Along with 
cluster-manager.container.cpu.cores, this property determines how many 
containers the cluster manager will run on one machine. If the container 
exceeds this limit, it will be killed, so it is important that the container's 
actual memory use remains below the limit. The amount of memory used is 
normally the JVM heap size (configured with task.opts), plus the size of any 
off-heap memory allocation (for example stores.*.container.cache.size.bytes), 
plus a safety margin to allow for JVM overheads.|
+|cluster-manager.container.cpu.cores|1|The number of CPU cores to request per 
container of your job. Each node in the cluster has a certain number of CPU 
cores available, so this number (along with 
cluster-manager.container.memory.mb) determines how many containers can be run 
on one machine.|
+
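+A minimal YARN sizing sketch, assuming the job package is published to a placeholder HDFS path:
+
+```
+# Where YARN can download the job package from (placeholder path).
+yarn.package.path=hdfs://namenode:8020/path/to/my-job-dist.tar.gz
+# Run four containers, each with 2 GB of memory and one CPU core.
+job.container.count=4
+cluster-manager.container.memory.mb=2048
+cluster-manager.container.cpu.cores=1
+```
+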
+##### <a name="advanced-cluster-configurations"></a>[5.1.1 Advanced Cluster 
Configurations](#advanced-cluster-configurations)
+|Name|Default|Description|
+|--- |--- |--- |
+|cluster-manager.container.retry.count|8|If a container fails, it is 
automatically restarted by Samza. However, if a container keeps failing shortly 
after startup, that indicates a deeper problem, so we should kill the job 
rather than retrying indefinitely. This property determines the maximum number 
of times we are willing to restart a failed container in quick succession (the 
time period is configured with `cluster-manager.container.retry.window.ms`). 
Each container in the job is counted separately. If this property is set to 0, 
any failed container immediately causes the whole job to fail. If it is set to 
a negative number, there is no limit on the number of retries.|
+|cluster-manager.container.retry.window.ms|300000|This property determines how 
frequently a container is allowed to fail before we give up and fail the job. 
If the same container has failed more than 
`cluster-manager.container.retry.count` times, and the time between failures 
was less than this property `cluster-manager.container.retry.window.ms` (in 
milliseconds), then we fail the job. There is no limit to the number of times 
we will restart a container if the time between failures is greater than 
`cluster-manager.container.retry.window.ms`.|
+|cluster-manager.jobcoordinator.jmx.enabled|true|Determines whether a JMX 
server should be started on the job's JobCoordinator. (true or false).|
+|cluster-manager.allocator.sleep.ms|3600|The container allocator thread is 
responsible for matching requests to allocated containers. The sleep interval 
for this thread is configured using this property.|
+|cluster-manager.container.request.timeout.ms|5000|The allocator thread 
periodically checks the state of the container requests and allocated 
containers to determine the assignment of a container to an allocated resource. 
This property determines the number of milliseconds before a container request 
is considered to have expired / timed-out. When a request expires, it gets 
allocated to any available container that was returned by the cluster manager.|
+|task.execute|bin/run-container.sh|The command that starts a Samza container. 
The script must be included in the [job package](./packaging.html). There is 
usually no need to customize this.|
+|task.java.home| |The JAVA_HOME path for Samza containers. By setting this 
property, you can use a java version that is different from your cluster's java 
version. Remember to set the `yarn.am.java.home` as well.|
+|yarn.am.container.<br>memory.mb|1024|Each Samza job when running in Yarn has 
one special container, the [ApplicationMaster](../yarn/application-master.html) 
(AM), which manages the execution of the job. This property determines how much 
memory, in megabytes, to request from YARN for running the ApplicationMaster.|
+|yarn.am.opts| |Any JVM options to include in the command line when executing 
the Samza [ApplicationMaster](../yarn/application-master.html). For example, 
this can be used to set the JVM heap size, to tune the garbage collector, or to 
enable remote debugging.|
+|yarn.am.java.home| |The `JAVA_HOME` path for Samza AM. By setting this 
property, you can use a java version that is different from your cluster's java 
version. Remember to set the `task.java.home` as well.<br>Example: 
`yarn.am.java.home=/usr/java/jdk1.8.0_05`|
+|yarn.am.poll.interval.ms|100|The Samza ApplicationMaster sends regular 
heartbeats to the YARN ResourceManager to confirm that it is alive. This 
property determines the time (in milliseconds) between heartbeats.|
+|yarn.queue| |Determines which YARN queue will be used for the Samza job.|
+|yarn.kerberos.principal| |The principal the Samza job uses to authenticate itself to the KDC, when running on a Kerberos-enabled YARN cluster.|
+|yarn.kerberos.keytab| |The full path to the file containing the keytab for the principal specified by `yarn.kerberos.principal`. The keytab file is uploaded to the staging directory unique to each application on HDFS, and the Application Master then uses the keytab and principal to periodically log in and recreate the delegation tokens.|
+|yarn.job.view.acl| |This is for secured YARN clusters only. The 'viewing' acl of the YARN application that controls who can view the application (e.g. application status, logs). See [ApplicationAccessType](https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/api/records/ApplicationAccessType.html) for more details.|
+|yarn.job.modify.acl| |This is for secured YARN clusters only. The 'modify' acl of the YARN application that controls who can modify the application (e.g. killing the application). See [ApplicationAccessType](https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/api/records/ApplicationAccessType.html) for more details.|
+|yarn.token.renewal.interval.seconds|24*3600|The time interval at which the Application Master re-authenticates and renews the delegation tokens. This value should be smaller than the length of time a delegation token is valid on Hadoop NameNodes before expiration.|
+|yarn.resources.**_resource-name_**.path| |The path for localizing the resource for **_resource-name_**. The scheme (e.g. http, ftp, hdfs, file, etc.) in the path should be configured in the YARN core-site.xml as `fs.<scheme>.impl` and is associated with a [FileSystem](https://hadoop.apache.org/docs/stable/api/index.html?org/apache/hadoop/fs/FileSystem.html). If defined, the resource will be localized in the Samza application directory before the Samza job runs. More details can be found [here](../yarn/yarn-resource-localization.html).|
+|yarn.resources.**_resource-name_**.local.name|**_resource-name_**|The new 
local name for the resource after localization. This configuration only applies 
when `yarn.resources.resource-name.path` is configured.|
+|yarn.resources.**_resource-name_**.local.type|FILE|The type for the resource 
after localization. It can be ARCHIVE (archived directory), FILE, or PATTERN 
(the entries extracted from the archive with the pattern). This configuration 
only applies when `yarn.resources.resource-name.path` is configured.|
+|yarn.resources.**_resource-name_**.local.visibility|APPLICATION|The visibility for the resource after localization. It can be PUBLIC (visible to everyone), PRIVATE (visible to all Samza applications of the same account user as this application), or APPLICATION (visible to only this Samza application). This configuration only applies when `yarn.resources.resource-name.path` is configured.|
+
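+For example, container retries and the ApplicationMaster could be tuned as follows (values are illustrative):
+
+```
+# Fail the job if a container fails more than 4 times within a 10-minute window.
+cluster-manager.container.retry.count=4
+cluster-manager.container.retry.window.ms=600000
+# Give the ApplicationMaster 2 GB and a larger heap.
+yarn.am.container.memory.mb=2048
+yarn.am.opts=-Xmx1536m
+```
+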
+#### <a name="standalone-deployment"></a>[5.2 Standalone 
Deployment](#standalone-deployment)
+|Name|Default|Description|
+|--- |--- |--- |
+|app.runner.class|`org.apache.samza.runtime.`<br>`RemoteApplicationRunner`|This sets the means of executing the Samza application at runtime. The value is a fully-qualified Java classname, which implements [ApplicationRunner](../api/javadocs/org/apache/samza/runtime/ApplicationRunner.html). Samza ships with two implementations:<br><br> `org.apache.samza.runtime.RemoteApplicationRunner`<br>Default option. Runs your application in a remote cluster, such as YARN. See [clustered deployment](#yarn-cluster-deployment). <br><br>`org.apache.samza.runtime.LocalApplicationRunner`<br>This class is __required__ for running the application in a local standalone environment. See [standalone deployment](#standalone-deployment).|
+|job.coordinator.factory|`org.apache.samza.zk.`<br>`ZkJobCoordinatorFactory`|The fully-qualified name of the Java factory class that builds the JobCoordinator. The user can specify a custom implementation of the `JobCoordinatorFactory` that implements custom logic for distributed coordination of stream processors. <br> Samza supports the following coordination modes out of the box: <br><br>`org.apache.samza.standalone.PassthroughJobCoordinatorFactory`<br>Fixed partition mapping. No Zookeeper. <br><br>`org.apache.samza.zk.ZkJobCoordinatorFactory`<br>Zookeeper-based coordination. <br><br>`org.apache.samza.AzureJobCoordinatorFactory`<br>Azure-based coordination.<br><br> __Required__ only for non-cluster-managed applications.|
+|job.coordinator.zk.connect| |__Required__ for applications with Zookeeper-based coordination. Zookeeper coordinates (in "host:port[/znode]" format) to be used for coordination.|
+|azure.storage.connect| |__Required__ for applications with Azure-based coordination. The storage connection string of the Azure storage account to use. It is of the format: "DefaultEndpointsProtocol=https;AccountName=<Insert your account name>;AccountKey=<Insert your account key>"|
+
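+A minimal standalone (Zookeeper-coordinated) sketch, with a placeholder Zookeeper address and znode namespace:
+
+```
+# Run the application locally and coordinate processors through Zookeeper.
+app.runner.class=org.apache.samza.runtime.LocalApplicationRunner
+job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
+job.coordinator.zk.connect=zk-host:2181/my-app
+```
+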
+##### <a name="advanced-standalone-configurations"></a>[5.2.1 Advanced 
Standalone Configurations](#advanced-standalone-configurations)
+|Name|Default|Description|
+|--- |--- |--- |
+|job.coordinator.azure.blob.length|5120000|Used for Azure-based job coordination (`org.apache.samza.AzureJobCoordinatorFactory`). Length, in bytes, of the page blob on which the leader stores the shared data. Different types of data are stored on different pages with predefined lengths. The offsets of these pages depend on the total page blob length.|
+|job.coordinator.zk.session.timeout.ms|30000|Zookeeper session timeout for all the ZK connections, in milliseconds. The session timeout controls how long the ZK client will wait before throwing an exception when it cannot talk to one of the ZK servers.|
+|job.coordinator.zk.connection.timeout.ms|60000|Zookeeper connection timeout in milliseconds. The connection timeout controls how long the client tries to connect to a ZK server before giving up.|
+|job.coordinator.zk.consensus.timeout.ms|40000|Zookeeper-based coordination. 
How long each processor will wait for all the processors to report acceptance 
of the new job model before rolling back.|
+|job.debounce.time.ms|20000|Zookeeper-based coordination. How long the Leader 
processor will wait before recalculating the JobModel on change of registered 
processors.|
+
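+For example, the Zookeeper timeouts above could be relaxed for a higher-latency environment (values are illustrative):
+
+```
+# Allow slower Zookeeper sessions and connections before giving up.
+job.coordinator.zk.session.timeout.ms=60000
+job.coordinator.zk.connection.timeout.ms=120000
+```
+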
+### <a name="metrics"></a>[6. Metrics](#metrics)
+|Name|Default|Description|
+|--- |--- |--- |
+|metrics.reporter.**_reporter-name_**.class| |Samza automatically tracks various metrics which are useful for monitoring the health of a job, and you can also track your own metrics. With this property, you can define any number of metrics reporters which send the metrics to a system of your choice (for graphing, alerting, etc.). You give each reporter an arbitrary reporter-name. To enable the reporter, you need to reference the reporter-name in metrics.reporters. The value of this property is the fully-qualified name of a Java class that implements MetricsReporterFactory. Samza ships with these implementations by default: <br><br>`org.apache.samza.metrics.reporter.JmxReporterFactory`<br>With this reporter, every container exposes its own metrics as JMX MBeans. The JMX server is started on a random port to avoid collisions between containers running on the same machine.<br><br>`org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory`<br>This reporter sends the latest values of all metrics as messages to an output stream once per minute. The output stream is configured with metrics.reporter.*.stream and it can use any system supported by Samza.|
+|metrics.reporters| |If you have defined any metrics reporters with 
metrics.reporter.*.class, you need to list them here in order to enable them. 
The value of this property is a comma-separated list of reporter-name tokens.|
+|metrics.reporter.**_reporter-name_**.stream| |If you have registered the 
metrics reporter metrics.reporter.*.class = 
`org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory`, you need to 
set this property to configure the output stream to which the metrics data 
should be sent. The stream is given in the form system-name.stream-name, and 
the system must be defined in the job configuration. It's fine for many 
different jobs to publish their metrics to the same metrics stream. Samza 
defines a simple JSON encoding for metrics; in order to use this encoding, you 
also need to configure a serde for the metrics stream: 
<br><br>streams.*.samza.msg.serde = `metrics-serde` (replacing the asterisk 
with the stream-name of the metrics stream) 
<br>serializers.registry.metrics-serde.class = 
`org.apache.samza.serializers.MetricsSnapshotSerdeFactory` (registering the 
serde under a serde-name of metrics-serde)|
+|metrics.reporter.**_reporter-name_**.interval|60|If you have registered the metrics reporter `metrics.reporter.*.class` = `org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory`, you can use this property to configure how frequently the reporter will report the metrics registered with it. The value of this property is the length of the interval between consecutive metric reports. This value is in seconds, and should be a positive integer. This property is optional and set to 60 by default, which means metrics will be reported every 60 seconds.|
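+
+Tying the metrics properties together, a sketch of a snapshot reporter that publishes to a Kafka metrics stream (the reporter name `snapshot` and the stream `kafka.my-metrics` are placeholders):
+
+```
+# Report all metrics to kafka.my-metrics once per minute, encoded with the metrics serde.
+metrics.reporters=snapshot
+metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
+metrics.reporter.snapshot.stream=kafka.my-metrics
+serializers.registry.metrics-serde.class=org.apache.samza.serializers.MetricsSnapshotSerdeFactory
+streams.my-metrics.samza.msg.serde=metrics-serde
+```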
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/samza/blob/6b318943/docs/learn/documentation/versioned/operations/monitoring.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/operations/monitoring.md 
b/docs/learn/documentation/versioned/operations/monitoring.md
index ccd2442..d93e497 100644
--- a/docs/learn/documentation/versioned/operations/monitoring.md
+++ b/docs/learn/documentation/versioned/operations/monitoring.md
@@ -30,10 +30,10 @@ Like any other production software, it is critical to 
monitor the health of our
 
 * [A. Metrics Reporters](#a-metrics-reporters)
   + [A.1 Reporting Metrics to JMX (JMX Reporter)](#jmxreporter)
-    + [Enabling the JMX Reporter](#enablejmxreporter)
-    - [Using the JMX Reporter](#jmxreporter)
+      - [Enabling the JMX Reporter](#enablejmxreporter)
+      - [Using the JMX Reporter](#jmxreporter)
   + [A.2 Reporting Metrics to Kafka (MetricsSnapshot 
Reporter)](#snapshotreporter)
-    - [Enabling the MetricsSnapshot Reporter](#enablesnapshotreporter)
+      - [Enabling the MetricsSnapshot Reporter](#enablesnapshotreporter)
   + [A.3 Creating a Custom MetricsReporter](#customreporter)
 * [B. Metric Types in Samza](#metrictypes)
 * [C. Adding User-Defined Metrics](#userdefinedmetrics)
