[GitHub] kafka pull request #4319: [WIP] KAFKA-5142: Add Connect support for message ...

2017-12-12 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/4319

[WIP] KAFKA-5142: Add Connect support for message headers (KIP-145)

*NEW PROPOSAL FOR KIP-145... DO NOT MERGE*

Changed the Connect API and runtime to support message headers as described 
in KIP-145.

The new `Header` interface defines an immutable representation of a Kafka 
header (key-value pair) with support for the Connect value types and schemas. 
This interface provides methods for easily converting between many of the 
built-in primitive, structured, and logical data types.

The new `Headers` interface defines an ordered collection of headers and is 
used to track all headers associated with a `ConnectRecord` (and thus 
`SourceRecord` and `SinkRecord`). The collection allows multiple headers with the 
same key. The `Headers` interface contains methods for adding, removing, finding, 
and modifying headers, and convenience methods allow connectors and transforms to 
easily use and modify the headers for a record.

A new `HeaderConverter` interface is also defined to enable the Connect 
runtime framework to serialize and deserialize headers between the 
in-memory representation and Kafka’s byte[] representation. A new 
`SimpleHeaderConverter` implementation has been added; it serializes values to 
strings and deserializes by inferring the schemas (`Struct` header values are 
serialized without their schemas, so they can only be deserialized as `Map` 
instances without a schema). The `StringConverter`, `JsonConverter`, and 
`ByteArrayConverter` have all been extended to also be `HeaderConverter` 
implementations. Each connector can be configured with a different header 
converter, although by default the `SimpleHeaderConverter` is used to serialize 
header values as strings without schemas.
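
As a rough illustration (header keys here are invented, and since this PR is a 
proposal the method names could still change), a connector or transformation might 
use the headers like this:

```
import org.apache.kafka.connect.header.ConnectHeaders;
import org.apache.kafka.connect.header.Header;
import org.apache.kafka.connect.header.Headers;

public class HeadersExample {
    public static void main(String[] args) {
        // Build an ordered, mutable collection of headers; mutating methods chain.
        Headers headers = new ConnectHeaders()
                .addString("source.region", "us-east-1")   // hypothetical header keys
                .addInt("retry.count", 3);

        // Multiple headers may share the same key; lastWithName returns the most recent.
        headers.addInt("retry.count", 4);
        Header latest = headers.lastWithName("retry.count");
        System.out.println(latest.key() + " = " + latest.value());
    }
}
```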

Unit and integration tests are added for `ConnectHeader` and 
`ConnectHeaders`, the two implementation classes for headers. Additional test 
methods cover the methods added to the `Converter` implementations. 
Finally, since the `ConnectRecord` object is already used heavily, only limited 
tests needed to be added; quite a few of the existing tests already cover 
the changes.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation matches KIP-145
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5142-b

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4319


commit 1c35692da19f3c8c92ce60946a69f576878b958a
Author: Randall Hauch 
Date:   2017-12-05T17:05:00Z

KAFKA-5142: Add message headers to Connect API (KIP-145)

Changed the Connect API to add message headers as described in KIP-145.

The new `Header` interface defines an immutable representation of a Kafka 
header (name-value pair) with support for the Connect value types and schemas. 
Kafka headers have a string name and a binary value, which doesn’t align well 
with Connect’s existing data and schema mechanisms. Thus, Connect’s 
`Header` interface provides methods for easily converting between many of the 
built-in primitive, structured, and logical data types. And, as discussed 
below, a new `HeaderConverter` interface is added to define how the Kafka 
header binary values are converted to Connect data objects.

The new `Headers` interface defines an ordered collection of headers and is 
used to track all headers associated with a `ConnectRecord`. Like the Kafka 
headers API, the Connect `Headers` interface allows storing multiple headers 
with the same key in an ordered list. The Connect `Headers` interface is 
mutable and has a number of methods that make it easy for connectors and 
transformations to add, modify, and remove headers from the record, and the 
interface is designed to allow chaining multiple mutating methods.

The existing constructors and methods in `ConnectRecord`, `SinkRecord`, and 
`SourceRecord` are unchanged to maintain backward compatibility, and in these 
situations the records will contain an empty `Headers` object that connectors 
and transforms can modify. There is also an additional constructor that allows 
an existing `Headers` to be passed in. A new overloaded form of `newRecord` 
method was created to allow connectors and transforms to create a new record 
with an entirely new `Headers` object.

A new `HeaderConverter` interface is also defined to enable the Connect 
runtime framework to serialize and deserialize headers between the 
in-memory representation and Kafka’s byte[] representation.

---

[GitHub] kafka pull request #4296: KAFKA-6313: Add SLF4J as direct dependency to Kafk...

2017-12-06 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/4296

KAFKA-6313: Add SLF4J as direct dependency to Kafka core

Recent changes use the SLF4J API directly, so Kafka core should have a 
direct dependency on it.
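
For reference, a minimal sketch of direct SLF4J usage (the class here is 
hypothetical):

```
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReplicaFetcherHelper {  // hypothetical class
    private static final Logger log = LoggerFactory.getLogger(ReplicaFetcherHelper.class);

    void onFetchError(Throwable t) {
        // Calling the SLF4J API like this requires a direct dependency on slf4j-api.
        log.warn("Fetch failed; will retry", t);
    }
}
```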

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-6313

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4296.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4296


commit eb38ce67d82a47f932ab8cf6b39b05e5887f5300
Author: Randall Hauch 
Date:   2017-12-06T15:07:45Z

KAFKA-6313: Add SLF4J as direct dependency to Kafka core




---


[GitHub] kafka pull request #4077: KAFKA-5142: Added support for record headers, reus...

2017-10-16 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/4077

KAFKA-5142: Added support for record headers, reusing Kafka client's 
interfaces

*This is still a work in progress and should not be merged.*

This is a proposed PR that implements most of 
[KIP-145](https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect)
 but with some changes. The Kafka client library's `Headers` and `Header` 
interfaces are used directly to minimize the overhead of converting 
instances to a Connect-specific object. However, a new `ConnectHeaders` class 
is proposed to provide a fluent builder for constructing headers in source 
connectors or SMTs that need to add, remove, or modify headers, along with a 
reader utility for reading header values and converting them to primitives.

Note that KIP-145 is still undergoing discussions, so this is provided 
merely as one possible approach.
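
A minimal sketch of working with the client library's header interfaces 
directly, as this proposal would have connectors do (the header key is invented):

```
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.internals.RecordHeaders;

public class ClientHeadersExample {
    public static void main(String[] args) {
        // The Kafka client's mutable Headers implementation, used directly.
        RecordHeaders headers = new RecordHeaders();
        headers.add("app.id", "inventory".getBytes(StandardCharsets.UTF_8));

        // Header values are raw bytes; reading a primitive means decoding them.
        Header header = headers.lastHeader("app.id");
        String value = new String(header.value(), StandardCharsets.UTF_8);
        System.out.println(header.key() + " = " + value);
    }
}
```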

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5142

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4077.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4077


commit 5f31191af94e2a324a694bc99e4639d968389ff2
Author: Randall Hauch 
Date:   2017-10-16T22:59:50Z

KAFKA-5142: Added support for record headers, reusing Kafka client's 
interfaces




---


[GitHub] kafka pull request #4011: KAFKA-5903: Added Connect metrics to the worker an...

2017-10-03 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/4011

KAFKA-5903: Added Connect metrics to the worker and distributed herder

Added metrics to the Connect worker and rebalancing metrics to the 
distributed herder.

This is built on top of #3987, and I can rebase this PR once that is merged.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5903

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4011.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4011


commit d4afe3d2d0ca0c313e70266dca104caa0564d3f3
Author: Randall Hauch 
Date:   2017-09-28T19:12:21Z

KAFKA-5990 Enable generation of metrics docs for Connect

A new mechanism was added recently to the Metrics framework to make it 
easier to generate the documentation. It uses a registry with a 
`MetricNameTemplate` for each metric, and those templates are used when 
creating the actual metrics. The metrics framework provides utilities that can 
generate the HTML documentation from the registry of templates.

This change moves the recently-added Connect metrics over to use these 
templates and to then generate the metric documentation for Connect.

commit d7bab224800c4d56d754b7121abb209df9715ca1
Author: Randall Hauch 
Date:   2017-10-02T20:37:05Z

KAFKA-5903 Use string-valued metrics for connector name, type, version, and 
status

Replaced the boolean indicator metrics for connector and task status with a 
single string-valued status metric. Also added string-valued metrics for 
connector name, version, and type.

commit 75ffb38f32f552dbf667553f998a9cde7af6465f
Author: Randall Hauch 
Date:   2017-10-03T01:05:44Z

KAFKA-5903 Added Connect worker metrics

commit 24efd5f29fe4b23da0c7f9949361b14fad8fbc4e
Author: Randall Hauch 
Date:   2017-10-04T05:15:19Z

KAFKA-5904 Added Connect rebalancing metrics to distributed herder

Added the rebalancing metrics to the distributed herder.




---


[GitHub] kafka pull request #3987: KAFKA-5990: Enable generation of metrics docs for ...

2017-09-28 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3987

KAFKA-5990: Enable generation of metrics docs for Connect

A new mechanism was added recently to the Metrics framework to make it 
easier to generate the documentation. It uses a registry with a 
`MetricNameTemplate` for each metric, and those templates are used when 
creating the actual metrics. The metrics framework provides utilities that can 
generate the HTML documentation from the registry of templates.

This change moves the recently-added Connect metrics over to use these 
templates and to then generate the metric documentation for Connect.
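
A minimal sketch of the template mechanism (the metric name, group, and tags 
here are illustrative):

```
import java.util.Arrays;
import java.util.LinkedHashSet;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.MetricNameTemplate;
import org.apache.kafka.common.metrics.Metrics;

public class MetricDocsExample {
    public static void main(String[] args) {
        // A template declares the metric's name, group, description, and ordered tag names.
        MetricNameTemplate template = new MetricNameTemplate(
                "offset-commit-avg-time-ms", "connector-task-metrics",
                "The average time taken to commit offsets.",
                new LinkedHashSet<>(Arrays.asList("connector", "task")));

        try (Metrics metrics = new Metrics()) {
            // At runtime, concrete metric names are created from the registered template...
            MetricName name = metrics.metricInstance(template, "connector", "my-connector", "task", "0");
            System.out.println(name);
        }

        // ...and the same templates drive the generated HTML documentation.
        System.out.println(Metrics.toHtmlTable("kafka.connect", Arrays.asList(template)));
    }
}
```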

This PR is based upon #3975 and can be rebased once that has been merged.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5990

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3987.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3987


commit 23aebcdf7797923865a771e7acac89e7e2572e3d
Author: Randall Hauch 
Date:   2017-09-26T15:28:12Z

KAFKA-5902 Added sink task metrics

commit 9824e329ef599c59ea4e7f60cf46ec907b516d90
Author: Randall Hauch 
Date:   2017-09-28T20:47:39Z

KAFKA-5902 Changed to measuring task processing lag behind consumer

Changed the `sink-record-lag-max` metric to be the maximum lag, in number of 
records, that the sink task is behind the consumer's position for any topic 
partition. This is not ideal, since “lag” is often defined as how far behind 
the task (or consumer) is relative to the end of the topic partition. However, 
the most recent offset for the topic partition is not easy to access in Connect.

commit d38dbde9f72926c68383729f9c80513879913cde
Author: Randall Hauch 
Date:   2017-09-28T19:12:21Z

KAFKA-5990 Enable generation of metrics docs for Connect

A new mechanism was added recently to the Metrics framework to make it 
easier to generate the documentation. It uses a registry with a 
`MetricNameTemplate` for each metric, and those templates are used when 
creating the actual metrics. The metrics framework provides utilities that can 
generate the HTML documentation from the registry of templates.

This change moves the recently-added Connect metrics over to use these 
templates and to then generate the metric documentation for Connect.




---


[GitHub] kafka pull request #3985: KAFKA-5987: Maintain order of metric tags in gener...

2017-09-28 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3985

KAFKA-5987: Maintain order of metric tags in generated documentation

The `MetricNameTemplate` is changed to use a `LinkedHashSet` to maintain 
the same order of the tags that are passed in. This tag order is then 
maintained when `Metrics.toHtmlTable` generates the MBean names for each of the 
metrics.

The `SenderMetricsRegistry` and `FetcherMetricsRegistry` both contain 
templates used in the producer and consumer, respectively, and these were 
changed to use a `LinkedHashSet` to maintain the order of the tags.

Before this change, the generated HTML documentation might order the MBean 
names like the following:

```
kafka.connect:type=sink-task-metrics,connector={connector},partition={partition},task={task},topic={topic}
kafka.connect:type=sink-task-metrics,connector={connector},task={task}
```
However, after this change, the documentation would use the following order:
```
kafka.connect:type=sink-task-metrics,connector={connector},task={task}
kafka.connect:type=sink-task-metrics,connector={connector},task={task},topic={topic},partition={partition}
```

This is more readable, since the code that creates the templates controls the 
order of the tags.

Note that JMX MBean names use `ObjectName`, which does not maintain the order 
of the properties (tags), so this change should have no impact on the actual 
JMX MBean names used in the metrics.

cc @wushujames

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5987

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3985.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3985


commit e5379c999b03b1a10d3779e21659f4a7808dc53c
Author: Randall Hauch 
Date:   2017-09-28T20:07:34Z

KAFKA-5987 Maintain order of metric tags in generated documentation

The `MetricNameTemplate` is changed to use a `LinkedHashSet` to maintain 
the same order of the tags that are passed in. This tag order is then 
maintained when `Metrics.toHtmlTable` generates the MBean names for each of the 
metrics.

The `SenderMetricsRegistry` and `FetcherMetricsRegistry` both contain 
templates used in the producer and consumer, respectively, and these were 
changed to use a `LinkedHashSet` to maintain the order of the tags.




---


[GitHub] kafka pull request #3975: KAFKA-5902 Added sink task metrics

2017-09-27 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3975

KAFKA-5902 Added sink task metrics

Added Connect metrics specific to sink tasks. This builds upon #3864 and 
#3911, which have already been merged into `trunk`, and #3959, which has yet to 
be merged.

I'll rebase this PR when the latter is merged.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5902

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3975.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3975


commit 147a5a4e59f95718ba7b46263cf51f8b81c7f6aa
Author: Randall Hauch 
Date:   2017-09-20T15:15:14Z

KAFKA-5901 Added source task metrics

commit 984314b822884e7dec719f75c8991b6df1c585fe
Author: Randall Hauch 
Date:   2017-09-26T15:28:12Z

KAFKA-5902 Added sink task metrics




---


[GitHub] kafka pull request #3959: KAFKA-5901 Added Connect metrics specific to sourc...

2017-09-25 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3959

KAFKA-5901 Added Connect metrics specific to source tasks

Added Connect metrics specific to source tasks. This PR is built on top of 
#3911, and when that is merged this can be rebased and merged.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5901

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3959.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3959


commit 2b96e8ba1f0cd1fd77898281c9d7dd3cb60c4834
Author: Randall Hauch 
Date:   2017-09-19T22:09:07Z

KAFKA-5900 Corrected Percentiles and Histogram metric components

Expanded the unit tests, and corrected several issues found during the 
expanded testing of these classes, which do not appear to have been used anywhere yet.

commit 7563c1414adc3343161c9382114baded06989d47
Author: Randall Hauch 
Date:   2017-09-14T23:17:31Z

KAFKA-5900 Added worker task metrics common to both sink and source tasks

commit 418ec59d941e38b947b44bee27439b51b06ccd24
Author: Randall Hauch 
Date:   2017-09-25T18:04:34Z

KAFKA-5900 Added JavaDoc and improved indentation

commit b55bfafbc8e840371fad42508608e3aa14e546d4
Author: Randall Hauch 
Date:   2017-09-20T15:15:14Z

KAFKA-5901 Added source task metrics




---


[GitHub] kafka pull request #3934: KAFKA-5954 Correct Connect REST API system test

2017-09-21 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3934

KAFKA-5954 Correct Connect REST API system test



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5954

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3934.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3934


commit 2f6f949fdd32b6bc0cadc075589c0c3373b03f82
Author: Randall Hauch 
Date:   2017-09-21T14:38:17Z

KAFKA-5954 Correct Connect REST API system test




---


[GitHub] kafka pull request #3911: Kafka 5900

2017-09-19 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3911

Kafka 5900



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5900

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3911.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3911


commit 02da9b0ebaa9df70c0dfb34799d3a08b051b9570
Author: Randall Hauch 
Date:   2017-09-13T14:48:14Z

KAFKA-5899 Added Connect metrics for connectors

Added metrics for each connector using Kafka’s existing `Metrics` 
framework. Since this is the first of several changes that add groups of 
metrics, this change starts by adding a very simple `ConnectMetrics` object 
that is owned by each worker and makes it easy to define multiple groups 
of metrics, called `ConnectMetricGroup` objects. Each metric group maps to a 
JMX MBean, and each metric within the group maps to an MBean attribute.

Future PRs will build upon this simple pattern to add metrics for source 
and sink tasks, workers, and worker rebalances.

commit d6ea0d67cca2526b522490bf2e48c231cf4b60f0
Author: Randall Hauch 
Date:   2017-09-14T23:17:31Z

KAFKA-5900 Added worker task metrics common to both sink and source tasks

commit d1b796720d5cca20195449c432da76609e9ae56b
Author: Randall Hauch 
Date:   2017-09-19T22:09:07Z

KAFKA-5900 Corrected Percentiles and Histogram metric components

Expanded the unit tests, and corrected several issues found during the 
expanded testing of these classes, which do not appear to have been used anywhere yet.




---


[GitHub] kafka pull request #3864: KAFKA-5899 Added Connect metrics for connectors

2017-09-14 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3864

KAFKA-5899 Added Connect metrics for connectors

Added metrics for each connector using Kafka’s existing `Metrics` 
framework. Since this is the first of several changes that add groups of 
metrics, this change starts by adding a very simple `ConnectMetrics` object 
that is owned by each worker and makes it easy to define multiple groups 
of metrics, called `ConnectMetricGroup` objects. Each metric group maps to a 
JMX MBean, and each metric within the group maps to an MBean attribute.

Future PRs will build upon this simple pattern to add metrics for source 
and sink tasks, workers, and worker rebalances.
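
Underneath, a metric group boils down to Kafka's `Metrics` and `Sensor` APIs; a 
rough sketch with illustrative group, tag, and metric names:

```
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;

public class ConnectorMetricsSketch {
    public static void main(String[] args) {
        try (Metrics metrics = new Metrics()) {
            // A "group" is a shared group name plus tags identifying the connector;
            // with a JmxReporter attached, the group maps to one JMX MBean.
            Map<String, String> tags = Collections.singletonMap("connector", "my-connector");
            MetricName name = metrics.metricName(
                    "batch-size-avg", "connector-metrics", "Average batch size.", tags);

            // Each metric within the group becomes an attribute on that MBean.
            Sensor sensor = metrics.sensor("my-connector.batch-size");
            sensor.add(name, new Avg());
            sensor.record(42.0);
        }
    }
}
```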

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5899

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3864


commit 02da9b0ebaa9df70c0dfb34799d3a08b051b9570
Author: Randall Hauch 
Date:   2017-09-13T14:48:14Z

KAFKA-5899 Added Connect metrics for connectors

Added metrics for each connector using Kafka’s existing `Metrics` 
framework. Since this is the first of several changes that add groups of 
metrics, this change starts by adding a very simple `ConnectMetrics` object 
that is owned by each worker and makes it easy to define multiple groups 
of metrics, called `ConnectMetricGroup` objects. Each metric group maps to a 
JMX MBean, and each metric within the group maps to an MBean attribute.

Future PRs will build upon this simple pattern to add metrics for source 
and sink tasks, workers, and worker rebalances.




---


[GitHub] kafka pull request #3774: MINOR: Increase timeout of Zookeeper service in sy...

2017-08-31 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3774

MINOR: Increase timeout of Zookeeper service in system tests

The previous timeout was 10 seconds, but system test failures have occurred 
when ZooKeeper took about 11 seconds to start. This increases the timeout to 30 
seconds: the extra time will rarely be needed, but when it is, it will prevent 
a failed system test.

In addition to merging to `trunk`, please backport to the `0.11.x` and 
`0.10.2.x` branches.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka 
MINOR-Increase-timeout-of-zookeeper-service-in-system-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3774.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3774


commit 83d508a0507124478c7648a8a3cd84102780d2a6
Author: Randall Hauch 
Date:   2017-08-31T21:41:26Z

MINOR: Increase timeout of Zookeeper service in system tests

The previous timeout was 10 seconds, but system test failures have occurred 
when ZooKeeper took about 11 seconds to start. This increases the timeout to 30 
seconds: the extra time will rarely be needed, but when it is, it will prevent 
a failed system test.




---


[GitHub] kafka pull request #3717: KAFKA-5731 Corrected how the sink task worker upda...

2017-08-22 Thread rhauch
Github user rhauch closed the pull request at:

https://github.com/apache/kafka/pull/3717


---


[GitHub] kafka pull request #3717: KAFKA-5731 Corrected how the sink task worker upda...

2017-08-22 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3717

KAFKA-5731 Corrected how the sink task worker updates the last committed 
offsets (0.10.2)

Prior to this change, it was possible for the synchronous consumer commit 
request to be handled before previously-submitted asynchronous commit requests. 
If that happened, the out-of-order handlers improperly set the last committed 
offsets, which then became inconsistent with the offsets the connector task is 
working with.

This change ensures that the last committed offsets are updated only for 
the most recent commit request, even if the consumer reorders the calls to the 
callbacks.

This change also backports the fix for KAFKA-4942, a minimal change without 
which the new tests fail.

**This is for the `0.10.2` branch; see #3662 for the equivalent and 
already-approved PR for `trunk` and #3672 for the equivalent and 
already-approved PR for the `0.11.0` branch.**
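
A minimal sketch of the general technique (not the actual `WorkerSinkTask` 
code): tag each commit request with a sequence number and let only the newest 
request's callback update the last committed offsets.

```
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitTracker {
    private final AtomicInteger commitSeqno = new AtomicInteger();
    private volatile int lastCompletedSeqno;
    private volatile Map<TopicPartition, OffsetAndMetadata> lastCommittedOffsets;

    void commitAsync(KafkaConsumer<?, ?> consumer, Map<TopicPartition, OffsetAndMetadata> offsets) {
        final int seqno = commitSeqno.incrementAndGet();
        consumer.commitAsync(offsets, (committed, error) -> {
            // Callbacks may arrive out of order; ignore any superseded by a newer request.
            if (error == null && seqno > lastCompletedSeqno) {
                lastCompletedSeqno = seqno;
                lastCommittedOffsets = committed;
            }
        });
    }
}
```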

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5731-0.10.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3717


commit 4e39118ae302980e6d8b75cb09534172d99d5b7a
Author: Randall Hauch 
Date:   2017-08-12T00:42:06Z

KAFKA-5731 Corrected how the sink task worker updates the last committed 
offsets

Prior to this change, it was possible for the synchronous consumer commit 
request to be handled before previously-submitted asynchronous commit requests. 
If that happened, the out-of-order handlers improperly set the last committed 
offsets, which then became inconsistent with the offsets the connector task is 
working with.

This change ensures that the last committed offsets are updated only for 
the most recent commit request, even if the consumer reorders the calls to the 
callbacks.

commit 6f74ef8e89f1193a5deb66870022591d51eb6580
Author: Randall Hauch 
Date:   2017-08-14T19:11:08Z

KAFKA-5731 Corrected mock consumer behavior during rebalance

Corrects the test case added in the previous commit to properly revoke the 
existing partition assignments before adding new partition assignments.

commit 3c2531b1abdaf3cdaac3781a45597a616652ff1c
Author: Randall Hauch 
Date:   2017-08-14T19:11:45Z

KAFKA-5731 Added expected call that was missing in another test

commit 05567b1677e7f5a39ca0f20d86773c872193da0b
Author: Randall Hauch 
Date:   2017-08-14T22:24:35Z

KAFKA-5731 Improved log messages related to offset commits

# Conflicts:
#   
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit 60eba0ce024f3a211f33bf20bc04febbebb7d1c4
Author: Randall Hauch 
Date:   2017-08-15T14:47:05Z

KAFKA-5731 More cleanup of log messages related to offset commits

commit ff123bfb910742e3a5c320fff6b23ff645ef62a2
Author: Randall Hauch 
Date:   2017-08-15T16:21:52Z

KAFKA-5731 More improvements to the log messages in WorkerSinkTask

# Conflicts:
#   
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit d5f1b29a4cb41c139094cbed9b78cf51594f861c
Author: Randall Hauch 
Date:   2017-08-15T16:31:28Z

KAFKA-5731 Removed unnecessary log message

commit f2b02cda83876b7d55f331cf89d9a306ab2b467f
Author: Randall Hauch 
Date:   2017-08-15T17:54:16Z

KAFKA-5731 Additional tweaks to debug and trace log messages to ensure 
clarity and usefulness

commit fa427b7557b93a600e0007e1a4adfb4aa38f526b
Author: Randall Hauch 
Date:   2017-08-15T19:30:09Z

KAFKA-5731 Use the correct value in trace messages

commit 957f0acac6154fda6b522e89c00df3dcb299
Author: Randall Hauch 
Date:   2017-08-22T23:29:52Z

KAFKA-4942 Fix commitTimeoutMs being set before the commit actually started

Backported the fix for this issue, which was fixed in 0.11.0.0




---


[GitHub] kafka pull request #3672: KAFKA-5731 Corrected how the sink task worker upda...

2017-08-15 Thread rhauch
Github user rhauch closed the pull request at:

https://github.com/apache/kafka/pull/3672


---


[GitHub] kafka pull request #3672: KAFKA-5731 Corrected how the sink task worker upda...

2017-08-15 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3672

KAFKA-5731 Corrected how the sink task worker updates the last committed 
offsets (0.11.0)

Prior to this change, it was possible for the synchronous consumer commit 
request to be handled before previously-submitted asynchronous commit requests. 
If that happened, the out-of-order handlers improperly set the last committed 
offsets, which then became inconsistent with the offsets the connector task is 
working with.

This change ensures that the last committed offsets are updated only for 
the most recent commit request, even if the consumer reorders the calls to the 
callbacks.

**This is for the `0.11.0` branch; see #3662 for the equivalent and 
already-approved PR for `trunk`.**

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5731-0.11.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3672.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3672


commit 526dbcb776effcc1661e51293a6d03256b19d0a6
Author: Randall Hauch 
Date:   2017-08-12T00:42:06Z

KAFKA-5731 Corrected how the sink task worker updates the last committed 
offsets

Prior to this change, it was possible for the synchronous consumer commit 
request to be handled before previously-submitted asynchronous commit requests. 
If that happened, the out-of-order handlers improperly set the last committed 
offsets, which then became inconsistent with the offsets the connector task is 
working with.

This change ensures that the last committed offsets are updated only for 
the most recent commit request, even if the consumer reorders the calls to the 
callbacks.

# Conflicts:
#   
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit 80965cb5f8771e63b0dad095287cb9a29dea47f6
Author: Randall Hauch 
Date:   2017-08-14T19:11:08Z

KAFKA-5731 Corrected mock consumer behavior during rebalance

Corrects the test case added in the previous commit to properly revoke the 
existing partition assignments before adding new partition assignments.

commit 37687c544513566c7d728273137a12751702ad41
Author: Randall Hauch 
Date:   2017-08-14T19:11:45Z

KAFKA-5731 Added expected call that was missing in another test

commit bfac0688ab64935bc4ac9c11e0a6251ca03e1043
Author: Randall Hauch 
Date:   2017-08-14T22:24:35Z

KAFKA-5731 Improved log messages related to offset commits

# Conflicts:
#   
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit bbc316dfa4353a7e914d11ceaca2b60c1bdaf291
Author: Randall Hauch 
Date:   2017-08-15T14:47:05Z

KAFKA-5731 More cleanup of log messages related to offset commits

commit 00b17ebbb5effb7f8aa171ea69b0227c7b009e97
Author: Randall Hauch 
Date:   2017-08-15T16:21:52Z

KAFKA-5731 More improvements to the log messages in WorkerSinkTask

# Conflicts:
#   
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit 3a5c31f1b912e06aded4b74524c73dfe1033e76c
Author: Randall Hauch 
Date:   2017-08-15T16:31:28Z

KAFKA-5731 Removed unnecessary log message

commit 8b91f93e8c4c7b6b8e1aa6721f86ff01f8ecf40e
Author: Randall Hauch 
Date:   2017-08-15T17:54:16Z

KAFKA-5731 Additional tweaks to debug and trace log messages to ensure 
clarity and usefulness

# Conflicts:
#   
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit bea03f69055524e9005302200f3a560a9cad2c3f
Author: Randall Hauch 
Date:   2017-08-15T19:30:09Z

KAFKA-5731 Use the correct value in trace messages




---


[GitHub] kafka pull request #3662: KAFKA-5731 Corrected how the sink task worker upda...

2017-08-13 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3662

KAFKA-5731 Corrected how the sink task worker updates the last committed 
offsets

Prior to this change, it was possible for the synchronous consumer commit 
request to be handled before previously-submitted asynchronous commit requests. 
If that happened, the out-of-order handlers improperly set the last committed 
offsets, which then became inconsistent with the offsets the connector task is 
working with.

This change ensures that the last committed offsets are updated only for 
the most recent commit request, even if the consumer reorders the calls to the 
callbacks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5731

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3662.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3662


commit be34cf05fe6956a24b77b35a689bda7b55e7de12
Author: Randall Hauch 
Date:   2017-08-12T00:42:06Z

KAFKA-5731 Corrected how the sink task worker updates the last committed 
offsets

Prior to this change, it was possible for the synchronous consumer commit 
request to be handled before previously-submitted asynchronous commit requests. 
If that happened, the out-of-order handlers improperly set the last committed 
offsets, which then became inconsistent with the offsets the connector task is 
working with.

This change ensures that the last committed offsets are updated only for 
the most recent commit request, even if the consumer reorders the calls to the 
callbacks.




---


[GitHub] kafka pull request #3641: KAFKA-5704 Corrected Connect distributed startup b...

2017-08-07 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3641

KAFKA-5704 Corrected Connect distributed startup behavior to allow older 
brokers to auto-create topics

When a Connect distributed worker starts up talking with broker versions 
0.10.1.0 and later, it will use the AdminClient to look for the internal topics 
and attempt to create them if they are missing. Although the AdminClient was 
added in 0.11.0.0, the AdminClient uses APIs to create topics that existed in 
0.10.1.0 and later. This feature works as expected when Connect uses a broker 
version 0.10.1.0 or later.

However, when a Connect distributed worker starts up using a broker older 
than 0.10.1.0, the AdminClient is not able to find the required APIs and thus 
will throw an UnsupportedVersionException. Unfortunately, this exception is not 
caught and instead causes the Connect worker to fail even when the topics 
already exist.

This change handles the UnsupportedVersionException by logging a debug 
message and doing nothing. The existing producer logic will get information 
about the topics, which will cause the broker to create them if they don’t 
exist and broker auto-creation of topics is enabled. This is the same behavior 
that existed prior to 0.11.0.0, and so this change restores that behavior for 
brokers older than 0.10.1.0.

This change also adds a system test that verifies Connect works with a 
variety of brokers and is able to run source and sink connectors. The test 
verifies that Connect can read from the internal topics when the connectors are 
restarted.
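
A rough sketch of the fallback logic (class and method names here are 
hypothetical):

```
import java.util.Collections;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.errors.UnsupportedVersionException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TopicInitializer {
    private static final Logger log = LoggerFactory.getLogger(TopicInitializer.class);

    void createTopicIfPossible(AdminClient admin, NewTopic topic)
            throws ExecutionException, InterruptedException {
        try {
            admin.createTopics(Collections.singleton(topic)).all().get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof UnsupportedVersionException) {
                // Broker predates the CreateTopics API (0.10.1.0); rely on auto-creation.
                log.debug("Broker does not support CreateTopics; relying on auto-creation of '{}'",
                        topic.name());
            } else {
                throw e;
            }
        }
    }
}
```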

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5704

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3641.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3641


commit 0d45fd113eeaff3844742181191db2cc508353fd
Author: Randall Hauch 
Date:   2017-08-07T19:32:29Z

KAFKA-5704 Corrected Connect distributed startup behavior to allow older 
brokers to auto-create topics

When a Connect distributed worker starts up talking with broker versions 
0.10.1.0 and later, it will use the AdminClient to look for the internal topics 
and attempt to create them if they are missing. Although the AdminClient was 
added in 0.11.0.0, the AdminClient uses APIs to create topics that existed in 
0.10.1.0 and later. This feature works as expected when Connect uses a broker 
version 0.10.1.0 or later.

However, when a Connect distributed worker starts up using a broker older 
than 0.10.1.0, the AdminClient is not able to find the required APIs and thus 
will throw an UnsupportedVersionException. Unfortunately, this exception is not 
caught and instead causes the Connect worker to fail even when the topics 
already exist.

This change handles the UnsupportedVersionException by logging a debug 
message and doing nothing. The existing producer logic will get information 
about the topics, which will cause the broker to create them if they don’t 
exist and broker auto-creation of topics is enabled. This is the same behavior 
that existed prior to 0.11.0.0, and so this change restores that behavior for 
brokers older than 0.10.1.0.

This change also adds a system test that verifies Connect works with a 
variety of brokers and is able to run source and sink connectors. The test 
verifies that Connect can read from the internal topics when the connectors are 
restarted.




---


[GitHub] kafka pull request #3563: Added safe deserialization impl

2017-07-21 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3563

Added safe deserialization impl



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka deserialization-validation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3563.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3563


commit fcd4d5f069aa084a6bab9dea4f81de0d63bcfbe8
Author: Hooman Broujerdi 
Date:   2017-07-20T02:50:17Z

Added safe deserialization Impl




---


[GitHub] kafka pull request #3379: KAFKA-5472 Eliminated duplicate group names when v...

2017-06-19 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3379

KAFKA-5472 Eliminated duplicate group names when validating connector 
results

Kafka Connect was adding duplicate group names in the response from the 
REST API's validation of connector configurations. This fixes the duplicates 
and maintains the order of the `ConfigDef` objects so that the `ConfigValue` 
results are in the same order.

This is a blocker and should be merged to 0.11.0.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka KAFKA-5472

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3379.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3379


commit d8e6168584cab68ff0844b6d18d8cdb79792427b
Author: Randall Hauch 
Date:   2017-06-20T00:52:28Z

KAFKA-5472 Eliminated duplicate group names when validating connector 
results




---


[GitHub] kafka pull request #3344: KAFKA-5450 Increased timeout of Connect system tes...

2017-06-14 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3344

KAFKA-5450 Increased timeout of Connect system test utilities

Increased the timeout from 30 seconds to 60 seconds. When running the system 
tests with packaged Kafka, Connect workers can take about 30 seconds to start.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka KAFKA-5450

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3344.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3344


commit 0f1f9f98b175e06373f1f4e5edcb009903b22e6e
Author: Randall Hauch 
Date:   2017-06-14T21:41:11Z

KAFKA-5450 Increased timeout of Connect system test utilities
Increased the timeout from 30 seconds to 60 seconds. When running the system 
tests with packaged Kafka, Connect workers can take about 30 seconds to start.




---


[GitHub] kafka pull request #3198: KAFKA-5164 Ensure SetSchemaMetadata updates key or...

2017-06-01 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/3198

KAFKA-5164 Ensure SetSchemaMetadata updates key or value when Schema changes

When the `SetSchemaMetadata` SMT is used to change the name and/or version 
of the key or value’s schema, any references to the old schema in the key or 
value must be changed to reference the new schema. Only keys or values that are 
`Struct` have such references, and so currently only these are adjusted.

This is based on `trunk` since the fix is expected to be targeted to the 
0.11.1 release.
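
For reference, a hypothetical connector configuration that applies this SMT to 
the value schema (the alias and schema name are invented):

```
transforms=SetSchema
transforms.SetSchema.type=org.apache.kafka.connect.transforms.SetSchemaMetadata$Value
transforms.SetSchema.schema.name=com.example.OrderValue
transforms.SetSchema.schema.version=2
```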

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-5164

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3198


commit 68bf68b3df065437318e2a87a6d1181b1205a806
Author: Randall Hauch 
Date:   2017-06-01T20:40:01Z

KAFKA-5164 Ensure SetSchemaMetadata updates key or value when Schema changes

When the `SetSchemaMetadata` SMT is used to change the name and/or version 
of the key or value’s schema, any references to the old schema in the key or 
value must be changed to reference the new schema. Only keys or values that are 
`Struct` have such references, and so currently only these are adjusted.




---


[GitHub] kafka pull request #2984: KAFKA-4667 Connect uses AdminClient to create inte...

2017-05-05 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/2984

KAFKA-4667 Connect uses AdminClient to create internal topics when needed

The backing store for offsets, status, and configs now attempts to use the 
new AdminClient to look up the internal topics and create them if they don’t 
yet exist. If the necessary APIs are not available in the connected broker, the 
stores fall back to the old behavior of relying upon auto-created topics. Kafka 
Connect requires a minimum of Apache Kafka 0.10.0.1-cp1, and the AdminClient 
can work with all versions since 0.10.0.0.

All three of Connect’s internal topics are created as compacted topics, 
and new distributed worker configuration properties control the replication 
factor for all three topics and the number of partitions for the offsets and 
status topics; the config topic requires a single partition and does not allow 
it to be set via configuration. All of these new configuration properties have 
sensible defaults, meaning users can upgrade without having to change any of 
the existing configurations. In most situations, existing Connect deployments 
will have already created the storage topics prior to upgrading.

The replication factor defaults to 3, so anyone running Kafka clusters with 
fewer nodes than 3 will receive an error unless they explicitly set the 
replication factor for the three internal topics. This is actually desired 
behavior, since it signals the users that they should be aware they are not 
using sufficient replication for production use.

The integration tests use a cluster with a single broker, so they were 
changed to explicitly specify a replication factor of 1 and a single partition.

The `KafkaAdminClientTest` was refactored to extract a utility for setting 
up a `KafkaAdminClient` with a `MockClient` for unit tests.
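
For illustration, a sketch of the new distributed worker properties (the values 
here are examples):

```
# Replication factor for each of the three internal topics
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.replication.factor=3

# Partition counts for the offset and status topics; the config topic is always 1 partition
offset.storage.partitions=25
status.storage.partitions=5
```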

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-4667

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2984.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2984


commit ea181ba29a678e47a5ff88ca0ad4c995aa4e017b
Author: Randall Hauch 
Date:   2017-05-05T17:14:07Z

KAFKA-4667 Connect can create internal topics using AdminClient

The backing store for offsets, status, and configs now attempts to use the 
new AdminClient to look up the
internal topics and create them if they don’t yet exist. If the necessary 
APIs are not available in the connected
broker, the stores fall back to the old behavior of relying upon 
auto-created topics. Kafka Connect requires
a minimum of Apache Kafka 0.10.0.1-cp1, and the AdminClient can work with 
all versions since 0.10.0.0.

All three of Connect’s internal topics are created as compacted topics, 
and new distributed worker configuration
properties control the replication factor for all three topics and the 
number of partitions for the offsets
and status topics; the config topic requires a single partition and does 
not allow it to be set via configuration.
All of these new configuration properties have sensible defaults, meaning 
users can upgrade without having to
change any of the existing configurations. In most situations, existing 
Connect deployments will have already
created the storage topics prior to upgrading.

The replication factor defaults to 3, so anyone running Kafka clusters with 
fewer nodes than 3 will receive an
error unless they explicitly set the replication factor for the three 
internal topics. This is actually desired behavior,
since it signals the users that they should be aware they are not using 
sufficient replication for production use.

The integration tests use a cluster with a single broker, so they were 
changed to explicitly specify a replication factor of 1 and a single partition.




---


[GitHub] kafka pull request #1871: KAFKA-4183 Corrected Kafka Connect's JSON Converte...

2016-09-16 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/1871

KAFKA-4183 Corrected Kafka Connect's JSON Converter to properly convert 
from null to logical values

The `JsonConverter` class has `LogicalTypeConverter` implementations for 
Date, Time, Timestamp, and Decimal, but these implementations fail when the 
input literal value (deserialized from the message) is null. 

Test cases were added to check for these cases; they failed until the 
`LogicalTypeConverter` implementations were fixed to consider whether the 
schema has a default value or is optional, similarly to how the 
`JsonToConnectTypeConverter` implementations already handle this. Once the 
fixes were made, the new tests pass.
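
A minimal sketch of the scenario the new tests exercise (the topic name is 
arbitrary):

```
import java.util.Collections;
import org.apache.kafka.connect.data.Date;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.json.JsonConverter;

public class NullLogicalValueExample {
    public static void main(String[] args) {
        JsonConverter converter = new JsonConverter();
        converter.configure(Collections.singletonMap("schemas.enable", "true"), false);

        // An optional logical-type schema whose value is null.
        Schema schema = Date.builder().optional().build();
        byte[] serialized = converter.fromConnectData("topic", schema, null);

        // Before the fix, converting the null logical value back failed.
        SchemaAndValue result = converter.toConnectData("topic", serialized);
        System.out.println(result.value());  // null
    }
}
```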

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-4183-0.10.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1871.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1871


commit 4ffb9409f5a75345cf53aed0d799e6c694f636ca
Author: Randall Hauch 
Date:   2016-09-16T19:05:06Z

KAFKA-4183 Corrected Kafka Connect's JSON Converter to properly convert 
from deserialized null values.




---


[GitHub] kafka pull request #1867: KAFKA-4183 Corrected Kafka Connect's JSON Converte...

2016-09-16 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/1867

KAFKA-4183 Corrected Kafka Connect's JSON Converter to properly convert 
from null to logical values

The `JsonConverter` class has `LogicalTypeConverter` implementations for 
Date, Time, Timestamp, and Decimal, but these implementations fail when the 
input literal value (deserialized from the message) is null. 

Test cases were added to check for these cases; they failed until the 
`LogicalTypeConverter` implementations were fixed to consider whether the 
schema has a default value or is optional, similarly to how the 
`JsonToConnectTypeConverter` implementations already handle this. Once the 
fixes were made, the new tests pass.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-4183

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1867


commit c21db21e7b56f7c8ea31fab9852a6852dc038015
Author: Randall Hauch 
Date:   2016-09-16T19:05:06Z

KAFKA-4183 Corrected Kafka Connect's JSON Converter to properly convert 
from deserialized null values.




---


[GitHub] kafka pull request: KAFKA-2649 Add support for custom partitioning...

2015-10-14 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/309

KAFKA-2649 Add support for custom partitioning in topology sinks

Added option to use custom partitioning logic within each topology sink.
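
A sketch of a custom partitioner under this option (the class and routing logic 
are invented; the interface shape is as introduced by this PR):

```
import org.apache.kafka.streams.processor.StreamPartitioner;

// Hypothetical partitioner keeping all records with the same key on one partition.
public class KeyHashPartitioner implements StreamPartitioner<String, String> {
    @Override
    public Integer partition(String key, String value, int numPartitions) {
        // Returning null would fall back to the producer's default partitioning.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```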

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-2649

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #309


commit 4a5a2adf6667bcaeccc7b49d63923b1b22e19d16
Author: Randall Hauch 
Date:   2015-10-14T15:19:43Z

KAFKA-2649 Add support for custom partitioning in topology sinks

Added option to use custom partitioning logic within each topology sink.




---


[GitHub] kafka pull request: KAFKA-2600 Align Kafka Streams' interfaces wit...

2015-10-01 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/270

KAFKA-2600 Align Kafka Streams' interfaces with Java 8 functional interfaces

A few of Kafka Streams' interfaces and classes are not well aligned with 
Java 8's functional interfaces. By making these changes, these classes can 
extend standard Java 8 functional interfaces when Kafka moves to Java 8, while 
remaining backward compatible. This will make it easier for developers to use 
Kafka Streams, and may allow us to eventually remove these custom interfaces 
and just use the standard Java 8 interfaces.

The changes include:

1. The `apply` method of KStream's `Predicate` functional interface was 
renamed to `test` to match the method name on `java.util.function.BiPredicate`. 
This will allow KStream's `Predicate` to extend `BiPredicate` when Kafka moves 
to Java 8, and will let the `KStream.filter` and `filterOut` methods accept a 
`BiPredicate`.
2. Renamed the `ProcessorDef` and `WindowDef` interfaces to 
`ProcessorSupplier` and `WindowSupplier`, respectively. Also, the 
`SlidingWindowDef` class was renamed to `SlidingWindowSupplier`, and the 
`MockProcessorDef` test class was renamed to `MockProcessorSupplier`. The 
`instance()` method in each was renamed to `get()`, so that all of these can 
extend/implement Java 8's `java.util.function.Supplier` interface in the 
future with no other changes and while remaining backward compatible. Variable 
names that used some form of "def" were changed to use "supplier".

These two sets of changes were made in separate commits.
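
For illustration, a sketch of the renamed `Predicate` in use (the filter logic 
is arbitrary):

```
import org.apache.kafka.streams.kstream.Predicate;

public class PredicateExample {
    // After the rename, KStream's Predicate mirrors java.util.function.BiPredicate.
    static final Predicate<String, Integer> IS_LARGE = new Predicate<String, Integer>() {
        @Override
        public boolean test(String key, Integer value) {  // formerly apply(...)
            return value != null && value > 100;
        }
    };

    public static void main(String[] args) {
        System.out.println(IS_LARGE.test("order-1", 250));  // true
    }
}
```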

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-2600

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #270


commit 0e88dce3e03ec6b8821f256490467d7baa8bf4b4
Author: Randall Hauch 
Date:   2015-10-01T16:07:51Z

KAFKA-2600 Renamed the method of KStream's Predicate to match Java 8's 
BiPredicate

The `apply` method of KStream's Predicate functional interface was renamed
to `test` to match the method name on java.util.function.BiPredicate. This will
allow KStream's Predicate to extend BiPredicate when Kafka moves to Java 8, and
will allow the KStream.filter and filterOut methods to accept BiPredicate,
making it a bit easier for developers to use KStream.

commit 07dc4571b5aac28ac63948a5626d65fd2a40eb82
Author: Randall Hauch 
Date:   2015-10-01T16:25:12Z

KAFKA-2600 Renamed the *Def interfaces to *Supplier

To better align with Java 8's Supplier interface, the ProcessorDef, 
WindowDef, and SlidingWindowDef interfaces/classes were renamed to 
ProcessorSupplier, WindowSupplier, and SlidingWindowSupplier, and their 
'instance' methods were renamed to 'get'.

When Kafka moves to Java 8, these interfaces/classes can extend/implement
Java 8's Supplier, and these custom interfaces can eventually be removed.
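
A similar sketch for this second set of renames; the `Processor` type below is
a stand-in to keep the example self-contained:

    import java.util.function.Supplier;

    interface Processor<K, V> { /* process/punctuate methods elided */ }

    // With instance() renamed to get(), ProcessorSupplier can later extend
    // Java 8's Supplier with no further source changes:
    interface ProcessorSupplier<K, V> extends Supplier<Processor<K, V>> {
        // Processor<K, V> get() is inherited from Supplier
    }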






[GitHub] kafka pull request: KAFKA-2597: Add to .gitignore the Eclipse IDE ...

2015-09-28 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/259

KAFKA-2597: Add to .gitignore the Eclipse IDE directories

The `.metadata` and `.recommenders` directories keep IDE workspace state and
should not be committed.
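
The corresponding `.gitignore` additions would presumably be just the two
directory names:

    .metadata
    .recommenders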

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-2597

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/259.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #259


commit ce8128be8d7f07fcca981b86087b6517f15a23af
Author: Randall Hauch 
Date:   2015-09-29T01:10:56Z

KAFKA-2597: Add to .gitignore the Eclipse IDE directories .metadata and 
.recommenders






[GitHub] kafka pull request: KAFKA-2594 Added InMemoryLRUCacheStore

2015-09-28 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/256

KAFKA-2594 Added InMemoryLRUCacheStore

Added a new `KeyValueStore` implementation called `InMemoryLRUCacheStore`
that keeps a maximum number of entries in memory; when the size exceeds the
capacity, the least-recently-used entry is removed from both the store and the
backing topic. Also added unit tests for this new store and for the existing
`InMemoryKeyValueStore` and `RocksDBKeyValueStore` implementations. A new
`KeyValueStoreTestDriver` class simplifies all of the other tests, and can be
used by other libraries to help test their own custom implementations.
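
For reference, the standard JDK building block for this eviction policy is a
`LinkedHashMap` in access order. Whether `InMemoryLRUCacheStore` is actually
built this way is an assumption; this is only a sketch of the policy described:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch of LRU eviction: an access-order LinkedHashMap evicts the
    // least-recently-used entry once capacity is exceeded.
    class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = access order, so get() refreshes recency
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // A real store would also delete the evicted key from the backing topic.
            return size() > capacity;
        }
    }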

This PR depends upon 
[KAFKA-2593](https://issues.apache.org/jira/browse/KAFKA-2593) and its PR at 
https://github.com/apache/kafka/pull/255. Once that PR is merged, I can rebase 
this PR if desired.

Two issues were uncovered when creating these new unit tests, and both are
also addressed as separate (small) commits in this PR:
* The `RocksDBKeyValueStore` initialization was not creating the file
system directory if it was missing.
* `MeteredKeyValueStore` was casting to `ProcessorContextImpl` to access
the `RecordCollector`, which prevented using `MeteredKeyValueStore`
implementations in tests where something other than `ProcessorContextImpl` was
used. The fix was to introduce a `RecordCollector.Supplier` interface that
defines this `recordCollector()` method, and to change `ProcessorContextImpl`
and `MockProcessorContext` to both implement it. Now `MeteredKeyValueStore`
casts to the new interface to access the record collector rather than to a
single concrete implementation, making it possible to use any and all current
stores inside unit tests (see the sketch below).
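
A minimal sketch of that supplier pattern. Only the names
`RecordCollector.Supplier`, `recordCollector()`, and `MockProcessorContext`
come from the description above; everything else is a stand-in:

    interface RecordCollector {
        // send methods elided

        interface Supplier {
            RecordCollector recordCollector();
        }
    }

    // Any context, real or mock, can implement the supplier:
    class MockProcessorContext implements RecordCollector.Supplier {
        private final RecordCollector collector = null; // a test double in practice

        @Override
        public RecordCollector recordCollector() {
            return collector;
        }
    }

    // MeteredKeyValueStore can then cast to the interface instead of the
    // concrete context class:
    //   RecordCollector rc = ((RecordCollector.Supplier) context).recordCollector();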

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-2594

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #256


commit d7e25f55dcbac1b64cfb0a8f642ce55078bd7d03
Author: Randall Hauch 
Date:   2015-09-28T18:34:51Z

KAFKA-2593 Key value stores can use custom serializers and deserializers

Add support for the key value stores to use specified serializers and 
deserializers (aka, "serdes"). Prior to this change, the stores were limited to 
only the default serdes specified in the topology's configuration and exposed 
to the processors via the ProcessorContext.

Now, InMemoryKeyValueStore and RocksDBKeyValueStore are used similarly: both
are parameterized on the key and value types, and both offer a similar set of
static factory methods. The static factory methods either take explicit key and
value serdes, take key and value class types so the serdes can be inferred
(only for the built-in serdes for string, integer, long, and byte array types),
or use the default serdes on the ProcessorContext.
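
A hedged sketch of the three factory styles described; every name and
signature below is an illustrative assumption, not the committed API:

    interface Serde<T> { /* serializer/deserializer pair, elided */ }
    interface ProcessorContext { /* provides the configured default serdes */ }
    interface KeyValueStore<K, V> { /* get/put/delete elided */ }

    interface KeyValueStoreFactory {
        // 1. Explicit serdes
        <K, V> KeyValueStore<K, V> create(String name, Serde<K> keySerde, Serde<V> valueSerde);

        // 2. Class types; serdes inferred for built-ins (string, integer, long, byte[])
        <K, V> KeyValueStore<K, V> create(String name, Class<K> keyType, Class<V> valueType);

        // 3. Default serdes taken from the topology configuration via the context
        <K, V> KeyValueStore<K, V> create(String name, ProcessorContext context);
    }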

commit 3b903d31503fcce7cf2b9607b68893a02425dacf
Author: Randall Hauch 
Date:   2015-09-28T19:11:55Z

KAFKA-2594: Corrected RocksDBKeyValueStore to properly create directory if 
missing

commit 33e4fd2b8bb56783d62f55e223757e8c479212ed
Author: Randall Hauch 
Date:   2015-09-28T19:15:24Z

KAFKA-2594: MeteredKeyValueStore no longer casts to ProcessorContextImpl

The cast was used to access ProcessorContextImpl.recordCollector(), and it
prevented using MeteredKeyValueStore implementations in tests where something
other than ProcessorContextImpl was used. Introduced a
RecordCollector.Supplier interface to define this `recordCollector()` method,
and changed ProcessorContextImpl and MockProcessorContext to both implement
this interface. Now MeteredKeyValueStore casts to the interface to access the
record collector rather than to a single concrete implementation, making it
possible to use the stores inside unit tests.

commit ff5e779fa1d6955bf0bf633191edd9d7a7cb9a9c
Author: Randall Hauch 
Date:   2015-09-28T19:26:16Z

KAFKA-2594 Added InMemoryLRUCacheStore

Added a new KeyValueStore implementation that keeps a maximum number of
entries in memory; when the size exceeds the capacity, the least-recently-used
entry is removed from both the store and the backing topic. Also added unit
tests for this new store and for the existing InMemoryKeyValueStore and
RocksDBKeyValueStore implementations. A new KeyValueStoreTestDriver class
simplifies all of the other tests.






[GitHub] kafka pull request: KAFKA-2593 Key value stores can use specified ...

2015-09-28 Thread rhauch
GitHub user rhauch opened a pull request:

https://github.com/apache/kafka/pull/255

KAFKA-2593 Key value stores can use specified serializers and deserializers

Add support for the key value stores to use specified serializers and 
deserializers (aka, "serdes"). Prior to this change, the stores were limited to 
only the default serdes specified in the topology's configuration and exposed 
to the processors via the ProcessorContext.

Now, InMemoryKeyValueStore and RocksDBKeyValueStore are used similarly: both
are parameterized on the key and value types, and both offer a similar set of
static factory methods. The static factory methods either take explicit key and
value serdes, take key and value class types so the serdes can be inferred
(only for the built-in serdes for string, integer, long, and byte array types),
or use the default serdes on the ProcessorContext.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rhauch/kafka kafka-2593

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #255


commit 4e3c1a883ff7470ced79ed747a4fe4893630c220
Author: Randall Hauch 
Date:   2015-09-28T18:34:51Z

KAFKA-2593 Key value stores can use custom serializers and deserializers

Add support for the key value stores to use specified serializers and 
deserializers (aka, "serdes"). Prior to this change, the stores were limited to 
only the default serdes specified in the topology's configuration and exposed 
to the processors via the ProcessorContext.

Now, InMemoryKeyValueStore and RocksDBKeyValueStore are used similarly: both
are parameterized on the key and value types, and both offer a similar set of
static factory methods. The static factory methods either take explicit key and
value serdes, take key and value class types so the serdes can be inferred
(only for the built-in serdes for string, integer, long, and byte array types),
or use the default serdes on the ProcessorContext.



