[43/50] samza git commit: Cleanup docs for HDFS connector

2018-11-27 Thread jagadish
Cleanup docs for HDFS connector


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/f8470b1e
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/f8470b1e
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/f8470b1e

Branch: refs/heads/master
Commit: f8470b1ed796d00888e4ba176a0105c9b07b2938
Parents: ed196c7
Author: Jagadish 
Authored: Fri Nov 2 17:33:26 2018 -0700
Committer: Jagadish 
Committed: Fri Nov 2 17:33:26 2018 -0700

--
 .../documentation/versioned/connectors/hdfs.md  | 134 +++
 1 file changed, 50 insertions(+), 84 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/samza/blob/f8470b1e/docs/learn/documentation/versioned/connectors/hdfs.md
--
diff --git a/docs/learn/documentation/versioned/connectors/hdfs.md 
b/docs/learn/documentation/versioned/connectors/hdfs.md
index 9692d18..9b79f24 100644
--- a/docs/learn/documentation/versioned/connectors/hdfs.md
+++ b/docs/learn/documentation/versioned/connectors/hdfs.md
@@ -21,133 +21,99 @@ title: HDFS Connector
 
 ## Overview
 
-Samza applications can read and process data stored in HDFS. Likewise, you can also write processed results to HDFS.
-
-### Environment Requirement
-
-Your job needs to run on the same YARN cluster which hosts the HDFS you want to consume from (or write into).
+The HDFS connector allows your Samza jobs to read data stored in HDFS files. Likewise, you can write processed results to HDFS.
+To interact with HDFS, Samza requires your job to run on the same YARN cluster.
 
 ## Consuming from HDFS
+### Input Partitioning
 
-You can configure your Samza job to read from HDFS files with the [HdfsSystemConsumer](https://github.com/apache/samza/blob/master/samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemConsumer.java). Avro encoded records are supported out of the box and it is easy to extend to support other formats (plain text, csv, json etc). See Event Format section below.
-
-### Partitioning
+Partitioning works at the level of individual directories and files. Each directory is treated as its own stream and each of its files is treated as a _partition_. For example, Samza creates 5 partitions when it's reading from a directory containing 5 files. There is no way to parallelize the consumption when reading from a single file - you can only have one container to process the file.
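The directory-to-stream, file-to-partition mapping described above can be sketched as follows (an illustrative Python model only, not actual Samza code; the real consumer is implemented in Java inside `HdfsSystemConsumer`):

```python
# Illustrative model of HDFS input partitioning, NOT actual Samza code:
# a directory is one stream, and each regular file in it becomes one partition.
import os

def partitions_for(stream_dir):
    """Map each regular file in the directory to a partition id."""
    files = sorted(
        name for name in os.listdir(stream_dir)
        if os.path.isfile(os.path.join(stream_dir, name))
    )
    # e.g. a directory with 5 files yields partitions 0..4
    return {i: name for i, name in enumerate(files)}
```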
 
-Partitioning works at the level of individual directories and files. Each directory is treated as its own stream, while each of its files is treated as a partition. For example, when reading from a directory on HDFS with 10 files, there will be 10 partitions created. This means that you can have up-to 10 containers to process them. If you want to read from a single HDFS file, there is currently no way to break down the consumption - you can only have one container to process the file.
+### Input Event format
+Samza supports avro natively, and it's easy to extend to other serialization formats. Each avro record read from HDFS is wrapped into a message-envelope. The [envelope](../api/javadocs/org/apache/samza/system/IncomingMessageEnvelope.html) contains these 3 fields:
 
-### Event format
+- The key, which is empty
 
-Samza's HDFS consumer wraps each avro record read from HDFS into a message-envelope. The [Envelope](../api/javadocs/org/apache/samza/system/IncomingMessageEnvelope.html) contains three fields of interest:
+- The value, which is set to the avro [GenericRecord](https://avro.apache.org/docs/1.7.6/api/java/org/apache/avro/generic/GenericRecord.html)
 
-1. The key, which is empty
-2. The message, which is set to the avro [GenericRecord](https://avro.apache.org/docs/1.7.6/api/java/org/apache/avro/generic/GenericRecord.html)
-3. The stream partition, which is set to the name of the HDFS file
+- The partition, which is set to the name of the HDFS file
 
-To support input formats which are not avro, you can implement the [SingleFileHdfsReader](https://github.com/apache/samza/blob/master/samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/SingleFileHdfsReader.java) interface (example: [AvroFileHdfsReader](https://github.com/apache/samza/blob/master/samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/AvroFileHdfsReader.java))
+To support non-avro input formats, you can implement the [SingleFileHdfsReader](https://github.com/apache/samza/blob/master/samza-hdfs/src/main/java/org/apache/samza/system/hdfs/reader/SingleFileHdfsReader.java) interface.
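The general shape of such a per-file reader can be sketched as follows (illustrative Python only; the real `SingleFileHdfsReader` is a Java interface, and these method names mirror the idea of a resumable per-file reader rather than the exact Samza signatures):

```python
# Illustrative sketch of a per-file reader for a non-avro (plain text) format.
# Method names mirror the general shape of a single-file reader; they are NOT
# the exact SingleFileHdfsReader signatures.
class PlainTextFileReader:
    def __init__(self, path):
        self._file = open(path, "r")
        self._offset = 0  # line number within the file

    def read_next(self):
        """Return the next record (one line), or None at end of file."""
        line = self._file.readline()
        if line == "":
            return None
        self._offset += 1
        return line.rstrip("\n")

    def next_offset(self):
        """Offset to checkpoint so consumption can resume after a restart."""
        return self._offset

    def close(self):
        self._file.close()
```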
 
-### End of stream support
+### EndOfStream
 
-While streaming sources like Kafka are unbounded, files on HDFS have finite data and have a notion of end-of-file.
+While streaming sources like Kafka are unbounded, files on HDFS have finite data and have
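For reference, a minimal job configuration for consuming avro files from an HDFS directory might look like the sketch below. The system name `hdfs-clickstream` and the paths are placeholders, and the exact config keys should be verified against the Samza version in use:

```properties
# Hypothetical system name and paths; verify exact keys for your Samza version.
systems.hdfs-clickstream.samza.factory=org.apache.samza.system.hdfs.HdfsSystemFactory
# The stream name is the HDFS directory to consume; each file in it becomes a partition.
task.inputs=hdfs-clickstream.hdfs:/data/clickstream/2018-11-02
# Only consume files matching this pattern (assumed key from the partitioner config family).
systems.hdfs-clickstream.partitioner.defaultPartitioner.whitelist=.*avro
```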

[7/9] samza git commit: Cleanup docs for HDFS connector

2018-11-13 Thread jagadish
Cleanup docs for HDFS connector

Author: Jagadish 

Reviewers: Jagadish

Closes #793 from vjagadish1989/website-reorg30


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/3e397022
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/3e397022
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/3e397022

Branch: refs/heads/1.0.0
Commit: 3e397022a5a54630d21a1cbbc0c273016592a0c2
Parents: ac5f948
Author: Jagadish 
Authored: Fri Nov 2 17:35:20 2018 -0700
Committer: Jagadish 
Committed: Tue Nov 13 19:33:26 2018 -0800


samza git commit: Cleanup docs for HDFS connector

2018-11-02 Thread jagadish
Repository: samza
Updated Branches:
  refs/heads/master 743903272 -> 859f1b646


Cleanup docs for HDFS connector

Author: Jagadish 

Reviewers: Jagadish

Closes #793 from vjagadish1989/website-reorg30


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/859f1b64
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/859f1b64
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/859f1b64

Branch: refs/heads/master
Commit: 859f1b646a75d499405d470116a227d83a5d506d
Parents: 7439032
Author: Jagadish 
Authored: Fri Nov 2 17:35:20 2018 -0700
Committer: Jagadish 
Committed: Fri Nov 2 17:35:20 2018 -0700
