Samza config reference

vjagadish prateekm Added all the configs.
![screen shot 2018-10-03 at 3 55 36 pm](https://user-images.githubusercontent.com/29577458/46444442-e46f8a80-c726-11e8-9fec-1917df5d0386.png)
![screen shot 2018-10-03 at 3 56 09 pm](https://user-images.githubusercontent.com/29577458/46444443-e46f8a80-c726-11e8-8d76-959af79a2c69.png)
![screen shot 2018-10-03 at 3 56 43 pm](https://user-images.githubusercontent.com/29577458/46444444-e5082100-c726-11e8-8c8c-5ac03d780307.png)
![screen shot 2018-10-03 at 3 57 06 pm](https://user-images.githubusercontent.com/29577458/46444445-e5082100-c726-11e8-84cc-693d5e31f631.png)

Author: Daniel Chen <dch...@linkedin.com>

Reviewers: Jagadish <jagad...@apache.org>, Prateek M <pmahe...@linkedin.com>

Closes #690 from dxichen/samza-config-reference


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/6b318943
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/6b318943
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/6b318943

Branch: refs/heads/master
Commit: 6b318943673253e6c66676073c73517fa1bdc425
Parents: b101ff9
Author: Daniel Chen <dch...@linkedin.com>
Authored: Fri Oct 5 11:01:45 2018 -0700
Committer: Jagadish <jvenkatra...@linkedin.com>
Committed: Fri Oct 5 11:01:45 2018 -0700

----------------------------------------------------------------------
 docs/css/main.new.css                           |   6 +-
 .../versioned/jobs/basic-configurations.md      | 168 ----------
 .../versioned/jobs/configuration.md             |   4 +-
 .../versioned/jobs/samza-configurations.md      | 335 +++++++++++++++++++
 .../versioned/operations/monitoring.md          |   6 +-
 5 files changed, 342 insertions(+), 177 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza/blob/6b318943/docs/css/main.new.css
----------------------------------------------------------------------
diff --git a/docs/css/main.new.css b/docs/css/main.new.css
index 623ca55..9e255d9 100644
--- a/docs/css/main.new.css
+++ b/docs/css/main.new.css
@@ -2185,7 +2185,7 @@ ul.case-studies {
 }
 
 .meet-smoosh-content {
-  
+
 }
 
 @media only screen and (min-width:768px) {
@@ -2197,11 +2197,11 @@ ul.case-studies {
     width: 45%;
     margin-left: 5%;
   }
-  
+
   .meet-name-date-host-info {
     width: 50%;
   }
-  
+
 }
 
 

http://git-wip-us.apache.org/repos/asf/samza/blob/6b318943/docs/learn/documentation/versioned/jobs/basic-configurations.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/jobs/basic-configurations.md 
b/docs/learn/documentation/versioned/jobs/basic-configurations.md
deleted file mode 100644
index 1f222c5..0000000
--- a/docs/learn/documentation/versioned/jobs/basic-configurations.md
+++ /dev/null
@@ -1,168 +0,0 @@
----
-layout: page
-title: Basic Configurations
----
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one or more
-   contributor license agreements.  See the NOTICE file distributed with
-   this work for additional information regarding copyright ownership.
-   The ASF licenses this file to You under the Apache License, Version 2.0
-   (the "License"); you may not use this file except in compliance with
-   the License.  You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
--->
-
-The following table lists common properties that should be included in a Samza job configuration file.<br>
-The full list of configurations can be found on the [Configuration Table](configuration-table.html) page.
-
-
-* [Application Configurations](#application-configurations)
-* [JobCoordinator Configurations](#jobcoordinator-configurations)
-  + [Cluster Deployment](#cluster-deployment)
-  + [Standalone Deployment](#standalone-deployment)
-* [Storage Configurations](#storage-configuration)
-* [Checkpointing](#checkpointing)
-* [System & Stream Configurations](#system--stream-configurations)
-  + [Kafka](#kafka)
-  + [HDFS](#hdfs)
-  + [EventHubs](#eventhubs)
-  + [Kinesis](#kinesis)
-  + [ElasticSearch](#elasticsearch)
-* [Metrics Configurations](#metrics-configuration)
-
-### Application Configurations
-These are the basic configurations for setting up a Samza application; a minimal example follows the table.
-
-|Name|Default|Description|
-|--- |--- |--- |
-|app.name| |__Required:__ The name of your application.|
-|app.id|1|If you run several instances of your application at the same time, 
you need to give each instance a different app.id. This is important, since 
otherwise the applications will overwrite each others' checkpoints, and perhaps 
interfere with each other in other ways.|
-|app.class| |__Required:__ The application to run. The value is a 
fully-qualified Java classname, which must implement StreamApplication. A 
StreamApplication describes a series of transformations on the streams.|
-|job.factory.class| |__Required:__ The job factory to use for running this 
job. <br> The value is a fully-qualified Java classname, which must implement 
StreamJobFactory.<br> Samza ships with three 
implementations:<br><br>`org.apache.samza.job.local.ThreadJobFactory`<br>Runs 
your job on your local machine using threads. This is intended only for 
development, not for production 
deployments.<br><br>`org.apache.samza.job.local.ProcessJobFactory`<br>Runs your 
job on your local machine as a subprocess. An optional command builder property
can also be specified (see task.command.class for details). This is intended
only for development, not for production
deployments.<br><br>`org.apache.samza.job.yarn.YarnJobFactory`<br>Runs your job 
on a YARN grid. See below for YARN-specific configuration.|
-|job.name| |__Required:__ The name of your job. This name appears on the Samza 
dashboard, and it is used to tell apart this job's checkpoints from other jobs' 
checkpoints.|
-|job.id|1|If you run several instances of your job at the same time, you need 
to give each execution a different job.id. This is important, since otherwise 
the jobs will overwrite each others' checkpoints, and perhaps interfere with 
each other in other ways.|
-|job.coordinator.system| |__Required:__ The system-name to use for creating 
and maintaining the Coordinator Stream.|
-|job.default.system| |The system-name to access any input or output streams 
for which the system is not explicitly configured. This property is for input 
and output streams, whereas job.coordinator.system is for Samza metadata
streams.|
-|job.container.count|1|The number of YARN containers to request for running 
your job. This is the main parameter for controlling the scale (allocated 
computing resources) of your job: to increase the parallelism of processing, 
you need to increase the number of containers. The minimum is one container, 
and the maximum number of containers is the number of task instances (usually 
the number of input stream partitions). Task instances are evenly distributed 
across the number of containers that you specify.|
-|job.changelog.system| |This property specifies a default system for 
changelog, which will be used with the stream specified in 
stores.store-name.changelog config. You can override this system by specifying 
both the system and the stream in stores.store-name.changelog.|
-|job.coordination.utils.factory|org.apache.samza.zk.<br>ZkCoordinationUtilsFactory|Class
 to use to create CoordinationUtils. Currently available values 
are:<br><br>`org.apache.samza.zk.ZkCoordinationUtilsFactory`<br>ZooKeeper based 
coordination 
utils.<br><br>`org.apache.samza.coordinator.AzureCoordinationUtilsFactory`<br>Azure
 based coordination utils.<br><br>These coordination utils are currently used 
for intermediate stream creation.|
-|task.class| |__Required:__ The fully-qualified name of the Java class which 
processes incoming messages from input streams. The class must implement 
[StreamTask](../api/javadocs/org/apache/samza/task/StreamTask.html) or 
[AsyncStreamTask](../api/javadocs/org/apache/samza/task/AsyncStreamTask.html), 
and may optionally implement 
[InitableTask](../api/javadocs/org/apache/samza/task/InitableTask.html), 
[ClosableTask](../api/javadocs/org/apache/samza/task/ClosableTask.html) and/or 
[WindowableTask](../api/javadocs/org/apache/samza/task/WindowableTask.html). 
The class will be instantiated several times, once for every input stream 
partition.|
-|task.window.ms|-1|If task.class implements 
[WindowableTask](../api/javadocs/org/apache/samza/task/WindowableTask.html), it 
can receive a windowing callback in regular intervals. This property specifies 
the time between window() calls, in milliseconds. If the number is negative 
(the default), window() is never called. Note that Samza is 
[single-threaded](../container/event-loop.html), so a window() call will never  
occur concurrently with the processing of a message. If a message is being 
processed at the time when a window() call is due, the window() call occurs 
after the processing of the current message has completed.|
-|task.commit.ms|60000|If task.checkpoint.factory is configured, this property 
determines how often a checkpoint is written. The value is the time between 
checkpoints, in milliseconds. The frequency of checkpointing affects failure 
recovery: if a container fails unexpectedly (e.g. due to crash or machine 
failure) and is restarted, it resumes processing at the last checkpoint. Any 
messages processed since the last checkpoint on the failed container are 
processed again. Checkpointing more frequently reduces the number of messages 
that may be processed twice, but also uses more resources.|
-|task.log4j.system| |Specify the system name for the StreamAppender. If this 
property is not specified in the config, Samza throws an exception. (See [Stream
Log4j Appender](logging.html#stream-log4j-appender)) Example: 
task.log4j.system=kafka|
-|serializers.registry.<br>**_serde-name_**.class| |Use this property to 
register a serializer/deserializer, which defines a way of encoding application 
objects as an array of bytes (used for messages in streams, and for data in 
persistent storage). You can give a serde any serde-name you want, and 
reference that name in properties like systems.*.samza.key.serde, 
systems.*.samza.msg.serde, streams.*.samza.key.serde, 
streams.*.samza.msg.serde, stores.*.key.serde and stores.*.msg.serde. The value 
of this property is the fully-qualified name of a Java class that implements 
SerdeFactory. Samza ships with several 
serdes:<br><br>`org.apache.samza.serializers.ByteSerdeFactory`<br>A no-op serde 
which passes through the undecoded byte 
array.<br><br>`org.apache.samza.serializers.ByteBufferSerdeFactory`<br>Encodes 
`java.nio.ByteBuffer` 
objects.<br><br>`org.apache.samza.serializers.IntegerSerdeFactory`<br>Encodes 
`java.lang.Integer` objects as binary (4 bytes fixed-length big-endian 
encoding).<br><br>`org.apache.samza.serializers.StringSerdeFactory`<br>Encodes
`java.lang.String` objects as 
UTF-8.<br><br>`org.apache.samza.serializers.JsonSerdeFactory`<br>Encodes nested 
structures of `java.util.Map`, `java.util.List` etc. as JSON. Note: This Serde 
enforces a dash-separated property naming convention, while JsonSerdeV2 
doesn't. This serde is primarily meant for Samza's internal usage, and is 
publicly available for backwards 
compatibility.<br><br>`org.apache.samza.serializers.JsonSerdeV2Factory`<br>Encodes
 nested structures of `java.util.Map`, `java.util.List` etc. as JSON. Note: 
This Serde uses Jackson's default (camelCase) property naming convention. This 
serde should be preferred over JsonSerde, especially in High Level API, unless 
the dasherized naming convention is required (e.g., for backwards 
compatibility).<br><br>`org.apache.samza.serializers.LongSerdeFactory`<br>Encodes
 `java.lang.Long` as binary (8 bytes fixed-length big-endian 
encoding).<br><br>`org.apache.samza.serializers.DoubleSerdeFactory`<br>Encodes `java.lang.Double` as binary (8 bytes double-precision floating
point).<br><br>`org.apache.samza.serializers.UUIDSerdeFactory`<br>Encodes 
`java.util.UUID` 
objects.<br><br>`org.apache.samza.serializers.SerializableSerdeFactory`<br>Encodes
 `java.io.Serializable` 
objects.<br><br>`org.apache.samza.serializers.MetricsSnapshotSerdeFactory`<br>Encodes
 `org.apache.samza.metrics.reporter.MetricsSnapshot` objects (which are used 
for reporting metrics) as 
JSON.<br><br>`org.apache.samza.serializers.KafkaSerdeFactory`<br>Adapter which 
allows existing `kafka.serializer.Encoder` and `kafka.serializer.Decoder` 
implementations to be used as Samza serdes. Set 
`serializers.registry.serde-name.encoder` and
`serializers.registry.serde-name.decoder` to the appropriate class names.|
-
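As a quick orientation, here is a minimal sketch of how these properties fit together in a job's .properties file. Every name below (my-app, my-kafka, the task class) is a hypothetical placeholder, not a value prescribed by the table.

```properties
# Minimal job configuration sketch; all names are hypothetical placeholders.
job.name=my-app
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
# System hosting the coordinator stream (a Kafka system defined elsewhere).
job.coordinator.system=my-kafka
# Hypothetical StreamTask implementation.
task.class=com.example.MyStreamTask
task.inputs=my-kafka.PageViewEvent
# Register a serde under the name "string" for use in stream/store configs.
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
```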
-### JobCoordinator Configurations
-Samza supports both standalone and clustered ([YARN](yarn-jobs.html)) 
deployment models. Below are the configuration options for both models; a short example follows each table.
-##### Cluster Deployment
-|Name|Default|Description|
-|--- |--- |--- |
-|yarn.package.path| |Required for YARN jobs: The URL from which the job 
package can be downloaded, for example an http:// or hdfs:// URL. The job
package is a .tar.gz file with a specific directory structure.|
-|cluster-manager.container.memory.mb|1024|How much memory, in megabytes, to 
request from the cluster manager per container of your job. Along with 
cluster-manager.container.cpu.cores, this property determines how many 
containers the cluster manager will run on one machine. If the container 
exceeds this limit, it will be killed, so it is important that the container's 
actual memory use remains below the limit. The amount of memory used is 
normally the JVM heap size (configured with task.opts), plus the size of any 
off-heap memory allocation (for example stores.*.container.cache.size.bytes), 
plus a safety margin to allow for JVM overheads.|
-|cluster-manager.container.cpu.cores|1|The number of CPU cores to request per 
container of your job. Each node in the cluster has a certain number of CPU 
cores available, so this number (along with 
cluster-manager.container.memory.mb) determines how many containers can be run 
on one machine.|
-
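For instance, a YARN deployment might combine these with job.container.count along the following lines; the package URL and sizes are illustrative assumptions, not recommendations.

```properties
# Hypothetical YARN sizing; adjust to your cluster's limits.
yarn.package.path=hdfs://namenode:8020/packages/my-app-1.0.0.tar.gz
# Containers exceeding this memory limit are killed by the cluster manager.
cluster-manager.container.memory.mb=2048
cluster-manager.container.cpu.cores=2
# Scale-out knob; cannot exceed the number of task instances.
job.container.count=4
```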
-##### Standalone Deployment
-|Name|Default|Description|
-|--- |--- |--- |
-|job.coordinator.factory| |Class to use for job coordination. Currently 
available values 
are:<br><br>`org.apache.samza.standalone.PassthroughJobCoordinatorFactory`<br>Fixed
 partition mapping. No ZooKeeper.
<br><br>`org.apache.samza.zk.ZkJobCoordinatorFactory`<br>Zookeeper-based 
coordination. 
<br><br>`org.apache.samza.AzureJobCoordinatorFactory`<br>Azure-based 
coordination<br><br> __Required__ only for non-cluster-managed applications.|
-|job.coordinator.zk.connect| |__Required__ for applications with 
Zookeeper-based coordination. Zookeeper coordinates (in "host:port[/znode]" 
format) to be used for coordination.|
-|azure.storage.connect| |__Required__ for applications with Azure-based 
coordination. This is the storage connection string for your Azure storage
account. It is of the format:
"DefaultEndpointsProtocol=https;AccountName=<Insert your account 
name>;AccountKey=<Insert your account key>"|
-
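A minimal ZooKeeper-coordinated standalone setup might look like this sketch (the connect string is a placeholder):

```properties
# Hypothetical standalone coordination via ZooKeeper.
job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
job.coordinator.zk.connect=zk1.example.com:2181/my-app
```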
-### Storage Configuration
-These properties define Samza's storage mechanism for efficient [stateful
stream processing](../container/state-management.html). A sample store configuration follows the table.
-
-|Name|Default|Description|
-|--- |--- |--- |
-|stores.**_store-name_**.factory| |You can give a store any **_store-name_** 
except `default` (`default` is reserved for defining default store parameters), 
and use that name to get a reference to the store in your stream task (call 
[TaskContext.getStore()](../api/javadocs/org/apache/samza/task/TaskContext.html#getStore(java.lang.String))
 in your task's 
[init()](../api/javadocs/org/apache/samza/task/InitableTask.html#init(org.apache.samza.config.Config,
 org.apache.samza.task.TaskContext)) method). The value of this property is the 
fully-qualified name of a Java class that implements 
[StorageEngineFactory](../api/javadocs/org/apache/samza/storage/StorageEngineFactory.html).
 Samza currently ships with one storage engine implementation: 
<br><br>`org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory` 
<br>An on-disk storage engine with a key-value interface, implemented using 
[RocksDB](http://rocksdb.org/). It supports fast random-access reads and 
writes, as well as range queries on keys. RocksDB can be configured with various additional tuning
parameters.|
-|stores.**_store-name_**.key.serde| |If the storage engine expects keys in the 
store to be simple byte arrays, this [serde](../container/serialization.html) 
allows the stream task to access the store using another object type as key. 
The value of this property must be a serde-name that is registered with 
serializers.registry.*.class. If this property is not set, keys are passed 
unmodified to the storage engine (and the changelog stream, if appropriate).|
-|stores.**_store-name_**.msg.serde| |If the storage engine expects values in 
the store to be simple byte arrays, this 
[serde](../container/serialization.html) allows the stream task to access the 
store using another object type as value. The value of this property must be a 
serde-name that is registered with serializers.registry.*.class. If this 
property is not set, values are passed unmodified to the storage engine (and 
the changelog stream, if appropriate).|
-|stores.**_store-name_**.changelog| |Samza stores are local to a container. If 
the container fails, the contents of the store are lost. To prevent loss of 
data, you need to set this property to configure a changelog stream: Samza then 
ensures that writes to the store are replicated to this stream, and the store 
is restored from this stream after a failure. The value of this property is 
given in the form system-name.stream-name. The "system-name" part is optional. 
If it is omitted you must specify the system in job.changelog.system config. 
Any output stream can be used as changelog, but you must ensure that only one 
job ever writes to a given changelog stream (each instance of a job and each 
store needs its own changelog stream).|
-|stores.**_store-name_**.rocksdb.ttl.ms| |The time-to-live of the store. 
Note that this is not a strict TTL limit (entries are removed only during compaction).
Use caution when opening the same database alternately with and without TTL, as it
might corrupt the database. Please make sure to read the
[constraints](https://github.com/facebook/rocksdb/wiki/Time-to-Live) before 
using.|
-
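Putting these together, a RocksDB store with a changelog might be configured as below; the store name, serde name, and changelog stream are hypothetical.

```properties
# Hypothetical RocksDB store "my-store", restored from its changelog on failure.
stores.my-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
# Serde names must be registered via serializers.registry.*.class.
stores.my-store.key.serde=string
stores.my-store.msg.serde=string
# One changelog stream per job and store; never shared across jobs.
stores.my-store.changelog=my-kafka.my-store-changelog
```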
-### System & Stream Configurations
-Samza consumes from and produces to [Streams](../container/streams.html), and supports
a variety of Systems including Kafka, HDFS, Azure EventHubs, Kinesis and
ElasticSearch. A sample system/stream configuration follows the table.
-
-|Name|Default|Description|
-|--- |--- |--- |
-|task.inputs| |__Required:__ A comma-separated list of streams that are 
consumed by this job. Each stream is given in the format 
system-name.stream-name. For example, if you have one input system called 
my-kafka, and want to consume two Kafka topics called PageViewEvent and 
UserActivityEvent, then you would set task.inputs=my-kafka.PageViewEvent, 
my-kafka.UserActivityEvent.|
-|systems.**_system-name_**.samza.factory| |__Required__: The fully-qualified 
name of a Java class which provides a system. A system can provide input 
streams which you can consume in your Samza job, or output streams to which you 
can write, or both. The requirements on a system are very flexible — it may 
connect to a message broker, or read and write files, or use a database, or 
anything else. The class must implement 
[SystemFactory](../api/javadocs/org/apache/samza/system/SystemFactory.html). 
Samza ships with the following implementations: 
<br><br>`org.apache.samza.system.kafka.KafkaSystemFactory` 
[(Configs)](#kafka)<br>`org.apache.samza.system.hdfs.HdfsSystemFactory` 
[(Configs)](#hdfs) <br>`org.apache.samza.system.eventhub.EventHubSystemFactory` 
[(Configs)](#eventhubs)<br>`org.apache.samza.system.kinesis.KinesisSystemFactory`
 
[(Configs)](#kinesis)<br>`org.apache.samza.system.elasticsearch.ElasticsearchSystemFactory`
 [(Configs)](#elasticsearch)|
-|systems.**_system-name_**.default.stream.*| |A set of default properties for 
any stream associated with the system. For example, if 
"systems.kafka-system.default.stream.replication.factor"=2 was configured, then 
every Kafka stream created on the kafka-system will have a replication factor 
of 2 unless the property is explicitly overridden at the stream scope using 
streams properties.|
-|systems.**_system-name_**.default.stream.samza.key.serde| |The 
[serde](../container/serialization.html) which will be used to deserialize the 
key of messages on input streams, and to serialize the key of messages on 
output streams. This property defines the serde for all streams in the
system. See the stream-scoped property to define the serde for an individual 
stream. If both are defined, the stream-level definition takes precedence. The 
value of this property must be a serde-name that is registered with 
serializers.registry.*.class. If this property is not set, messages are passed 
unmodified between the input stream consumer, the task and the output stream 
producer.|
-|systems.**_system-name_**.default.stream.samza.msg.serde| |The 
[serde](../container/serialization.html) which will be used to deserialize the 
value of messages on input streams, and to serialize the value of messages on 
output streams. This property defines the serde for all streams in the
system. See the stream-scoped property to define the serde for an individual 
stream. If both are defined, the stream-level definition takes precedence. The 
value of this property must be a serde-name that is registered with 
serializers.registry.*.class. If this property is not set, messages are passed 
unmodified between the input stream consumer, the task and the output stream 
producer.|
-|systems.**_system-name_**.default.stream.samza.offset.default|`upcoming`|If a 
container starts up without a [checkpoint](../container/checkpointing.html),  
this property determines where in the input stream we should start consuming. 
The value must be an 
[OffsetType](../api/javadocs/org/apache/samza/system/SystemStreamMetadata.OffsetType.html),
 one of the following: <br><br>`upcoming` <br>Start processing messages that 
are published after the job starts. Any messages published while the job was 
not running are not processed. <br><br>`oldest` <br>Start processing at the 
oldest available message in the system, and [reprocess](reprocessing.html) the 
entire available message history. <br><br>This property is for all streams 
within a system. To set it for an individual stream, see 
streams.stream-id.samza.offset.default. If both are defined, the stream-level 
definition takes precedence.|
-|streams.**_stream-id_**.samza.system| |The system-name of the system on which 
this stream will be accessed. This property binds the stream to one of the 
systems defined with the property systems.system-name.samza.factory. If this 
property isn't specified, it is inherited from job.default.system.|
-|streams.**_stream-id_**.samza.physical.name| |The physical name of the stream 
on the system on which this stream will be accessed. This is opposed to the 
stream-id which is the logical name that Samza uses to identify the stream. A 
physical name could be a Kafka topic name, an HDFS file URN or any other 
system-specific identifier.|
-|streams.**_stream-id_**.samza.key.serde| |The 
[serde](../container/serialization.html) which will be used to deserialize the 
key of messages on input streams, and to serialize the key of messages on 
output streams. This property defines the serde for an individual stream. See 
the system-scoped property to define the serde for all streams within a system. 
If both are defined, the stream-level definition takes precedence. The value of 
this property must be a serde-name that is registered with 
serializers.registry.*.class. If this property is not set, messages are passed 
unmodified between the input stream consumer, the task and the output stream 
producer.|
-|streams.**_stream-id_**.samza.msg.serde| |The 
[serde](../container/serialization.html) which will be used to deserialize the 
value of messages on input streams, and to serialize the value of messages on 
output streams. This property defines the serde for an individual stream. See 
the system-scoped property to define the serde for all streams within a system. 
If both are defined, the stream-level definition takes precedence. The value of 
this property must be a serde-name that is registered with 
serializers.registry.*.class. If this property is not set, messages are passed 
unmodified between the input stream consumer, the task and the output stream 
producer.|
-|streams.**_stream-id_**.samza.offset.default|`upcoming`|If a container starts 
up without a [checkpoint](../container/checkpointing.html), this property 
determines where in the input stream we should start consuming. The value must 
be an [OffsetType](../api/javadocs/org/apache/samza/system/SystemStreamMetadata.OffsetType.html),
one of the following: <br><br>`upcoming` <br>Start processing messages that are 
published after the job starts. Any messages published while the job was not 
running are not processed. <br><br>`oldest` <br>Start processing at the oldest 
available message in the system, and [reprocess](reprocessing.html) the entire 
available message history. <br><br>This property is for an individual stream. 
To set it for all streams within a system, see  
systems.system-name.samza.offset.default. If both are defined, the stream-level 
definition takes precedence.|
-|task.broadcast.inputs| |This property specifies the partitions that all tasks 
should consume. The systemStreamPartitions you put here will be sent to all the 
tasks. <br>Format: system-name.stream-name#partitionId or 
system-name.stream-name#[startingPartitionId-endingPartitionId] <br>Example: 
task.broadcast.inputs=mySystem.broadcastStream#[0-2], 
mySystem.broadcastStream#0|
-
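As a sketch, binding a logical stream-id to a physical Kafka topic might look like this; the system name, stream-id, and topic name are placeholders.

```properties
# Hypothetical system definition and a logical stream bound to it.
systems.my-kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
# System-wide default serde, overridable per stream.
systems.my-kafka.default.stream.samza.msg.serde=string
# Bind stream-id "page-views" to the my-kafka system.
streams.page-views.samza.system=my-kafka
# The actual Kafka topic name behind the logical stream-id.
streams.page-views.samza.physical.name=PageViewEvent
# Stream-level override: reprocess full history on first start.
streams.page-views.samza.offset.default=oldest
```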
-##### Kafka
-Configs for consuming and producing to [Apache 
Kafka](https://kafka.apache.org/). This section applies if you have set 
systems.*.samza.factory = `org.apache.samza.system.kafka.KafkaSystemFactory`.
-Samples can be found [here](../../../../startup/hello-samza/versioned). A sketch of a typical Kafka configuration follows the table.
-
-|Name|Default|Description|
-|--- |--- |--- |
-|systems.**_system-name_**.consumer.zookeeper.connect| |The hostname and port 
of one or more Zookeeper nodes where information about the Kafka cluster can be 
found. This is given as a comma-separated list of hostname:port pairs, such as 
zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181. If the cluster 
information is at some sub-path of the Zookeeper namespace, you need to include 
the path at the end of the list of hostnames, for example: 
zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/clusters/my-kafka|
-|systems.**_system-name_**.consumer.auto.offset.reset|`largest`|This setting 
determines what happens if a consumer attempts to read an offset that is 
outside of the current valid range. This could happen if the topic does not 
exist, or if a checkpoint is older than the maximum message history retained by 
the brokers. This property is not to be confused with 
systems.*.samza.offset.default, which determines what happens if there is no 
checkpoint. The following are valid values for auto.offset.reset: 
<br><br>`smallest` <br>Start consuming at the smallest (oldest) offset 
available on the broker (process as much message history as available). 
<br><br>`largest` <br>Start consuming at the largest (newest) offset available 
on the broker (skip any messages published while the job was not running). 
<br><br>anything else <br>Throw an exception and refuse to start up the job.|
-|systems.**_system-name_**.producer.bootstrap.servers| | A list of network 
endpoints where the Kafka brokers are running. This is given as a 
comma-separated list of hostname:port pairs, for example 
kafka1.example.com:9092,kafka2.example.com:9092,kafka3.example.com:9092. It's 
not necessary to list every single Kafka node in the cluster: Samza uses this 
property in order to discover which topics and partitions are hosted on which 
broker. This property is needed even if you are only consuming from Kafka, and 
not writing to it, because Samza uses it to discover metadata about streams 
being consumed.|
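A typical Kafka system definition might therefore look like the following sketch (hostnames are placeholders):

```properties
# Hypothetical Kafka system "my-kafka".
systems.my-kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.my-kafka.consumer.zookeeper.connect=zk1.example.com:2181/clusters/my-kafka
# Fall back to the oldest offset if a checkpointed offset is out of range.
systems.my-kafka.consumer.auto.offset.reset=smallest
systems.my-kafka.producer.bootstrap.servers=kafka1.example.com:9092,kafka2.example.com:9092
```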
-##### HDFS
-Configs for [consuming](../hadoop/consumer.html) and 
[producing](../hadoop/producer.html) to 
[HDFS](https://hortonworks.com/apache/hdfs/). This section applies if you have 
set systems.*.samza.factory = `org.apache.samza.system.hdfs.HdfsSystemFactory`.
-More about batch processing can be found [here](../hadoop/overview.html).
-
-|Name|Default|Description|
-|--- |--- |--- |
-|systems.**_system-name_**.producer.hdfs.base.output.dir|/user/USERNAME/SYSTEMNAME|The
 base output directory for HDFS writes. Defaults to the home directory of the 
user who ran the job, followed by the systemName for this HdfsSystemProducer as 
defined in the job.properties file.|
-|systems.**_system-name_**.producer.hdfs.writer.class|`org.apache.samza.system.hdfs.writer.`<br>`BinarySequenceFileHdfsWriter`|Fully-qualified
 class name of the HdfsWriter implementation this HDFS Producer system should 
use|
-|systems.**_system-name_**.stagingDirectory|Inherit from 
yarn.job.staging.directory if set|Staging directory for storing partition 
description. By default (if not set by users) the value is inherited from 
"yarn.job.staging.directory" internally. The default value is typically good 
enough unless you want to explicitly use a separate location.|
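An HDFS producer system might be configured roughly as follows (the system name and output directory are placeholders):

```properties
# Hypothetical HDFS producer system "my-hdfs".
systems.my-hdfs.samza.factory=org.apache.samza.system.hdfs.HdfsSystemFactory
systems.my-hdfs.producer.hdfs.base.output.dir=/data/my-app/output
systems.my-hdfs.producer.hdfs.writer.class=org.apache.samza.system.hdfs.writer.BinarySequenceFileHdfsWriter
```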
-##### EventHubs
-Configs for consuming and producing to [Azure 
EventHubs](https://azure.microsoft.com/en-us/services/event-hubs/). This 
section applies if you have set systems.*.samza.factory = 
`org.apache.samza.system.eventhub.EventHubSystemFactory`.
-Documentation and samples can be found [here](../azure/eventhubs.html).
-
-|Name|Default|Description|
-|--- |--- |--- |
-|systems.**_system-name_**.stream.list| |List of Samza **_stream-ids_** used
for the EventHubs system|
-|streams.**_stream-id_**.eventhubs.namespace| |Namespace of the associated 
stream-ids. __Required__ to access the Eventhubs entity per stream.|
-|streams.**_stream-id_**.eventhubs.entitypath| |Entity of the associated 
stream-ids. __Required__ to access the Eventhubs entity per stream.|
-|sensitive.streams.**_stream-id_**.eventhubs.sas.keyname| |SAS Keyname of the 
associated stream-ids. __Required__ to access the Eventhubs entity per stream.|
-|sensitive.streams.**_stream-id_**.eventhubs.sas.token| |SAS token of the
associated stream-ids. __Required__ to access the Eventhubs entity per stream.|
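Taken together, an EventHubs input might be wired up as in this sketch; the namespace, entity, and credentials are placeholders (never commit real SAS tokens to config files).

```properties
# Hypothetical EventHubs system with one stream-id "eh-input".
systems.my-eventhubs.samza.factory=org.apache.samza.system.eventhub.EventHubSystemFactory
systems.my-eventhubs.stream.list=eh-input
streams.eh-input.eventhubs.namespace=my-namespace
streams.eh-input.eventhubs.entitypath=my-entity
# "sensitive." prefixed values are masked in logs and the YARN UI.
sensitive.streams.eh-input.eventhubs.sas.keyname=my-sas-keyname
sensitive.streams.eh-input.eventhubs.sas.token=REPLACE-WITH-SAS-TOKEN
```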
-##### Kinesis
-Configs for consuming and producing to [Amazon 
Kinesis](https://aws.amazon.com/kinesis/). This section applies if you have set 
systems.*.samza.factory = `org.apache.samza.system.kinesis.KinesisSystemFactory`.
-Documentation and samples can be found [here](../aws/kinesis.html).
-
-|Name|Default|Description|
-|--- |--- |--- |
-|systems.**_system-name_**.streams.**_stream-name_**.aws.region| |Region of 
the associated stream-name. __Required__ to access the Kinesis data stream.|
-|systems.**_system-name_**.streams.**_stream-name_**.aws.accessKey| |AccessKey 
of the associated stream-name. __Required__ to access the Kinesis data stream.|
-|systems.**_system-name_**.streams.**_stream-name_**.aws.secretKey| |SecretKey 
of the associated stream-name. __Required__ to access the Kinesis data stream.|
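A Kinesis stream might then be configured along these lines (region and credentials are placeholders):

```properties
# Hypothetical Kinesis system reading stream "my-kinesis-stream".
systems.my-kinesis.samza.factory=org.apache.samza.system.kinesis.KinesisSystemFactory
systems.my-kinesis.streams.my-kinesis-stream.aws.region=us-east-1
systems.my-kinesis.streams.my-kinesis-stream.aws.accessKey=REPLACE-WITH-ACCESS-KEY
systems.my-kinesis.streams.my-kinesis-stream.aws.secretKey=REPLACE-WITH-SECRET-KEY
```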
-##### ElasticSearch
-Configs for producing to 
[ElasticSearch](https://www.elastic.co/products/elasticsearch). This section 
applies if you have set systems.*.samza.factory = 
`org.apache.samza.system.elasticsearch.ElasticsearchSystemFactory`
-
-|Name|Default|Description|
-|--- |--- |--- |
-|systems.**_system-name_**.client.factory| |__Required:__ The Elasticsearch
client factory used for connecting to the Elasticsearch cluster. Samza ships 
with the following 
implementations:<br><br>`org.apache.samza.system.elasticsearch.client.TransportClientFactory`<br>Creates
 a TransportClient that connects to the cluster remotely without joining it. 
This requires the transport host and port properties to be 
set.<br><br>`org.apache.samza.system.elasticsearch.client.NodeClientFactory`<br>Creates
 a Node client that connects to the cluster by joining it. By default this uses 
zen discovery to find the cluster but other methods can be configured.|
-|systems.**_system-name_**.index.request.factory|`org.apache.samza.system.`<br>`elasticsearch.indexrequest.`<br>`DefaultIndexRequestFactory`|The
 index request factory that converts the Samza OutgoingMessageEnvelope into the 
IndexRequest to be sent to Elasticsearch. The default
[IndexRequestFactory](org.apache.samza.system.elasticsearch.indexrequest) 
behaves as follows:<br><br>`Stream name`<br>The stream name is of the format 
{index-name}/{type-name}, which maps onto the Elasticsearch index and
type.<br><br>`Message id`<br>If the message has a key this is set as the 
document id, otherwise Elasticsearch will generate one for each 
document.<br><br>`Partition id`<br>If the partition key is set then this is 
used as the Elasticsearch routing key.<br><br>`Message`<br>The message must be 
either a byte[] which is passed directly on to Elasticsearch, or a Map which is 
passed on to the Elasticsearch client which serialises it into a JSON String. 
Samza serdes are not currently supported.|
-|systems.**_system-name_**.client.transport.host| |__Required__ for 
TransportClientFactory; The hostname that the TransportClientFactory connects 
to.|
-|systems.**_system-name_**.client.transport.port| |__Required__ for 
TransportClientFactory; The port that the TransportClientFactory connects to.|
-
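For example, a remote TransportClient setup might look like this sketch; the host is a placeholder, and 9300 is assumed as the conventional Elasticsearch transport port.

```properties
# Hypothetical Elasticsearch system using a remote transport client.
systems.my-es.samza.factory=org.apache.samza.system.elasticsearch.ElasticsearchSystemFactory
systems.my-es.client.factory=org.apache.samza.system.elasticsearch.client.TransportClientFactory
systems.my-es.client.transport.host=es1.example.com
systems.my-es.client.transport.port=9300
```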
-### Checkpointing
-[Checkpointing](../container/checkpointing.html) is not required, but 
recommended for most jobs. If you don't configure checkpointing, and a job or 
container restarts, it does not remember which messages it has already 
processed. Without checkpointing, consumer behavior on startup is determined by 
the ...samza.offset.default setting. Checkpointing allows a job to start up 
where it previously left off.
-
-|Name|Default|Description|
-|--- |--- |--- |
-|task.checkpoint.factory| |To enable 
[checkpointing](../container/checkpointing.html), you must set this property to 
the fully-qualified name of a Java class that implements 
[CheckpointManagerFactory](../api/javadocs/org/apache/samza/checkpoint/CheckpointManagerFactory.html).
 Samza ships with two checkpoint managers by default: 
<br><br>`org.apache.samza.checkpoint.file.FileSystemCheckpointManagerFactory` 
<br>Writes checkpoints to files on the local filesystem. You can configure the 
file path with the task.checkpoint.path property. This is a simple option if 
your job always runs on the same machine. On a multi-machine cluster, this 
would require a network filesystem mount. 
<br><br>`org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory` 
<br>Writes checkpoints to a dedicated topic on a Kafka cluster. This is the 
recommended option if you are already using Kafka for input or output streams. 
Use the task.checkpoint.system property to configure which Kafka cluster to use 
for checkpoints.|
-|task.checkpoint.system| |This property is required if you are using Kafka for 
checkpoints (task.checkpoint.factory = 
`org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory`). You must 
set it to the system-name of a Kafka system. The stream name (topic name) 
within that system is automatically determined from the job name and ID: 
__samza_checkpoint_${job.name}_${job.id} (with underscores in the job name and 
ID replaced by hyphens).|
-
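Combining this with task.commit.ms from the application section, Kafka-backed checkpointing might be enabled as follows (the system name is a placeholder):

```properties
# Hypothetical Kafka-backed checkpointing, written once per minute.
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=my-kafka
task.commit.ms=60000
```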
-### Metrics Configuration
-|Name|Default|Description|
-|--- |--- |--- |
-|metrics.reporter.**_reporter-name_**.class| |Samza automatically tracks 
various metrics which are useful for monitoring the health of a job, and you 
can also track your own metrics. With this property, you can define any number 
of metrics reporters which send the metrics to a system of your choice (for 
graphing, alerting etc). You give each reporter an arbitrary reporter-name. To 
enable the reporter, you need to reference the reporter-name in 
metrics.reporters. The value of this property is the fully-qualified name of a 
Java class that implements MetricsReporterFactory. Samza ships with these 
implementations by default: 
<br><br>`org.apache.samza.metrics.reporter.JmxReporterFactory`<br>With this 
reporter, every container exposes its own metrics as JMX MBeans. The JMX server 
is started on a random port to avoid collisions between containers running on 
the same 
machine.<br><br>`org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory`<br>This
 reporter sends the latest values of all metrics as messages to an output stream once per minute. The output
stream is configured with metrics.reporter.*.stream and it can use any system 
supported by Samza.|
-|metrics.reporters| |If you have defined any metrics reporters with 
metrics.reporter.*.class, you need to list them here in order to enable them. 
The value of this property is a comma-separated list of reporter-name tokens.|
-|metrics.reporter.**_reporter-name_**.stream| |If you have registered the 
metrics reporter metrics.reporter.*.class = 
`org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory`, you need to 
set this property to configure the output stream to which the metrics data 
should be sent. The stream is given in the form system-name.stream-name, and 
the system must be defined in the job configuration. It's fine for many 
different jobs to publish their metrics to the same metrics stream. Samza 
defines a simple JSON encoding for metrics; in order to use this encoding, you 
also need to configure a serde for the metrics stream: 
<br><br>streams.*.samza.msg.serde = `metrics-serde` (replacing the asterisk 
with the stream-name of the metrics stream) 
<br>serializers.registry.metrics-serde.class = 
`org.apache.samza.serializers.MetricsSnapshotSerdeFactory` (registering the 
serde under a serde-name of metrics-serde)|
\ No newline at end of file
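To illustrate how these pieces fit together, a setup with both JMX and snapshot reporters might look like this sketch; the reporter names and the metrics stream are placeholders.

```properties
# Hypothetical metrics reporters: JMX plus a snapshot stream on Kafka.
metrics.reporters=jmx,snapshot
metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
metrics.reporter.snapshot.stream=my-kafka.my-metrics
# Serde for the metrics stream, per the row above.
streams.my-metrics.samza.msg.serde=metrics-serde
serializers.registry.metrics-serde.class=org.apache.samza.serializers.MetricsSnapshotSerdeFactory
```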

http://git-wip-us.apache.org/repos/asf/samza/blob/6b318943/docs/learn/documentation/versioned/jobs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/jobs/configuration.md 
b/docs/learn/documentation/versioned/jobs/configuration.md
index 3511469..4aac9bf 100644
--- a/docs/learn/documentation/versioned/jobs/configuration.md
+++ b/docs/learn/documentation/versioned/jobs/configuration.md
@@ -56,11 +56,9 @@ Configuration keys that absolutely must be defined for a 
Samza job are:
 * `task.class`
 * `task.inputs`
 
-See the [Basic Configurations](basic-configurations.html) page to get started.
-
 ### Configuration Keys
 
-A complete list of configuration keys can be found on the [Configuration 
Table](configuration-table.html) page.  Note
+A complete list of configuration keys can be found on the [Samza 
Configurations](samza-configurations.html) page.  Note
 that configuration keys prefixed with "sensitive." are treated specially, in 
that the values associated with such keys
 will be masked in logs and Samza's YARN ApplicationMaster UI.  This is to 
prevent accidental disclosure only; no
 encryption is done.
