[
https://issues.apache.org/jira/browse/FLINK-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698871#comment-14698871
]
ASF GitHub Bot commented on FLINK-2386:
---------------------------------------
GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/1028
[FLINK-2386] Rework Kafka consumer for Flink
This is a reworked and extended version of #996. It also builds on top of
#1017.
It improves the Kafka consumer, fixes bugs, and offers a pluggable *fetcher*
and *offset committer* to make it work across Kafka versions from 0.8.1 to
0.8.3 (upcoming).
## Functionality
- The Kafka consumer properly preserves partitioning across
failures/restarts.
- Pluggable fetchers / offset committers for interoperability with
multiple Kafka versions (see the sketch after this list)
  - Fetcher based on the low-level consumer API
  - Fetcher based on the upcoming new consumer API (backported and
included in the Flink Kafka consumer)
- Proper cancelability
- The test coverage is vastly improved.
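As a rough illustration of the pluggable seams, here is a minimal sketch;
the interface and method names are hypothetical, not the actual types
introduced by this PR:

```java
// Hypothetical sketch of the pluggable fetcher / offset committer seams
// described above; names and signatures are illustrative only.
import java.io.IOException;
import java.util.Map;

/** Pulls records from the assigned Kafka partitions. One implementation
 *  exists per consumer API (low-level API, backported new API). */
interface Fetcher {
    void run() throws IOException;   // the fetch loop
    void cancel();                   // interrupts the loop for prompt cancellation
}

/** Persists offsets, e.g. to ZooKeeper (0.8.1/0.8.2) or, with the new
 *  consumer API, to the broker-side coordinator (0.8.3+). */
interface OffsetCommitter {
    void commit(Map<Integer, Long> offsetsByPartition) throws IOException;
}
```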
## Tests
This pull request includes a set of new, thorough tests for the Kafka consumer:
- Preservation of partitioning and exactly-once behavior (see the
assignment sketch after this list) for
  - 1:1 source to Kafka partition mapping
  - 1:n source to Kafka partition mapping
  - n:1 source to Kafka partition mapping
- Broker failure
- Cancelling
  - After immediate failures during deployment
  - While waiting to read from a partition
  - While currently reading from a partition
- Commit notifications for checkpoints
- Large records (30 MB)
- Alignment of offsets with what is committed to ZooKeeper
- Concurrent producer/consumer programs
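The three partition-mapping cases above follow from how Kafka partitions are
distributed over parallel source subtasks. A deterministic round-robin
assignment, sketched below under assumed names, is enough to make the mapping
reproducible across failures/restarts:

```java
// Editorial sketch: subtask i reads every partition p with p % parallelism == i.
// Because the assignment is a pure function of (partition count, parallelism,
// subtask index), it comes out the same after a restart.
import java.util.ArrayList;
import java.util.List;

final class PartitionAssignment {
    static List<Integer> partitionsFor(int numPartitions, int parallelism, int subtaskIndex) {
        List<Integer> assigned = new ArrayList<>();
        for (int p = subtaskIndex; p < numPartitions; p += parallelism) {
            assigned.add(p);
        }
        // 1:1 -> one entry, 1:n -> several entries, n:1 -> empty for idle subtasks
        return assigned;
    }
}
```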
## Limitations
The code based on the low-level consumer seems to work well.
The high-level consumer does not work with very large records. It looks like
a problem in the backported Kafka code, but that is not 100% certain.
## Debug Code
This pull request includes some debug code in the `BarrierBuffer` that I
intend to remove. It is there to track possible corner-case problems in the
checkpoint barrier handling.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink kafka
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1028.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1028
----
commit beed1d499b8c876330fac73da324235dd69a4e91
Author: Stephan Ewen <[email protected]>
Date: 2015-08-14T21:32:35Z
[FLINK-2462] [streaming] Major cleanup of operator structure for exception
handling and code simplification
- The exceptions are no longer logged by the operators themselves.
  Operators perform only cleanup in reaction to exceptions.
  Exceptions are reported only to the root Task object, which knows
  whether this is the first failure-causing exception (the root cause),
  a subsequent exception, or whether the task was actually canceled
  already. In the latter case, exceptions are ignored, because
  cancellation leads to many meaningless exceptions. (This rule is
  sketched after this commit message.)
- More exceptions in signatures, less wrapping where not needed
- Core resource acquisition/release logic is in one streaming task,
  reducing code duplication
- Guaranteed cleanup of output buffer and input buffer resources
  (formerly missed when other exceptions were encountered)
- Fix mixup in instantiation of source contexts in the stream source task
- Auto watermark generators correctly shut down their interval scheduler
- Improve use of generics, get rid of many raw types
This closes #1017
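The exception-reporting rule from this commit message can be sketched as
follows; the class and method names are made up for illustration:

```java
// Illustrative only: the task records the first failure-causing exception as
// the root cause and ignores exceptions that arrive after cancellation.
import java.util.concurrent.atomic.AtomicReference;

final class TaskFailureTracker {
    private final AtomicReference<Throwable> rootCause = new AtomicReference<>();
    private volatile boolean canceled;

    void cancel() {
        canceled = true;
    }

    /** Operators call this instead of logging exceptions themselves. */
    void reportException(Throwable t) {
        if (canceled) {
            return; // cancellation produces many meaningless exceptions
        }
        rootCause.compareAndSet(null, t); // only the first one wins
    }

    Throwable getRootCause() {
        return rootCause.get();
    }
}
```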
commit 57caed89279253ba3be7d69144cb762aac98f1f5
Author: Stephan Ewen <[email protected]>
Date: 2015-08-16T14:52:16Z
[tests] Reinforce StateCheckpoinedITCase to make sure actual checkpointing
has happened before a failure.
commit 2568d8dfa97e8a33ce63b39e2b39a01584781568
Author: Robert Metzger <[email protected]>
Date: 2015-07-20T19:39:46Z
[FLINK-2386] [kafka connector] Add new Kafka Consumer for Flink
This closes #996
commit 1a420413381a5d660742e7da576cb3cae5f0a613
Author: Stephan Ewen <[email protected]>
Date: 2015-08-11T12:21:33Z
[streaming] Clean up de-/serialization schemas, make
TypeInformationSerializationSchema prominent, add tests.
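For reference, a minimal usage sketch of that schema (assuming the 0.10-era
package layout); it routes records through Flink's own type serializers, so
any Flink-supported type can be written to and read from Kafka:

```java
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.util.serialization.TypeInformationSerializationSchema;

public class SchemaExample {
    public static void main(String[] args) throws Exception {
        TypeInformationSerializationSchema<String> schema =
                new TypeInformationSerializationSchema<>(
                        BasicTypeInfo.STRING_TYPE_INFO, new ExecutionConfig());

        byte[] bytes = schema.serialize("hello");        // Flink-serialized record
        System.out.println(schema.deserialize(bytes));   // prints "hello"
    }
}
```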
commit 32674145503ce36ab16bf7641892fdbe4f4a1045
Author: Stephan Ewen <[email protected]>
Date: 2015-08-11T14:48:26Z
[FLINK-2386] [kafka connector] Add comments to all backported kafka sources
and move them to 'org.apache.flink.kafka_backport'
commit 4d4cb1cf79dc656261284bc3e044c27898a4152c
Author: Stephan Ewen <[email protected]>
Date: 2015-08-11T20:21:53Z
[FLINK-2386] [kafka connector] Refactor, cleanup, and fix kafka consumers
----
> Implement Kafka connector using the new Kafka Consumer API
> ----------------------------------------------------------
>
> Key: FLINK-2386
> URL: https://issues.apache.org/jira/browse/FLINK-2386
> Project: Flink
> Issue Type: Improvement
> Components: Kafka Connector
> Reporter: Robert Metzger
> Assignee: Robert Metzger
>
> Once Kafka has released its new consumer API, we should provide a connector
> for that version.
> The release will probably be called 0.9 or 0.8.3.
> The connector will be mostly compatible with Kafka 0.8.2.x, except for
> committing offsets to the broker (the new connector expects a coordinator to
> be available on Kafka). To work around that, we can provide a configuration
> option to commit offsets to ZooKeeper (managed by Flink code).
> For 0.9/0.8.3 it will be fully compatible.
> It will not be compatible with 0.8.1 because of mismatching Kafka message formats.
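To make the quoted workaround concrete: a sketch of committing an offset to
the ZooKeeper path that Kafka 0.8's high-level consumer reads
(`/consumers/<group>/offsets/<topic>/<partition>`). The ZkClient usage here is
illustrative, not the code from this PR:

```java
import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.serialize.BytesPushThroughSerializer;

public class ZkOffsetCommitter {
    // Writes the offset where Kafka's own high-level consumer would look it up.
    public static void commitOffset(ZkClient zk, String group, String topic,
                                    int partition, long offset) {
        String path = "/consumers/" + group + "/offsets/" + topic + "/" + partition;
        if (!zk.exists(path)) {
            zk.createPersistent(path, true); // create parent znodes as needed
        }
        zk.writeData(path, Long.toString(offset).getBytes());
    }

    public static void main(String[] args) {
        ZkClient zk = new ZkClient("localhost:2181", 30000, 30000,
                new BytesPushThroughSerializer());
        commitOffset(zk, "my-group", "my-topic", 0, 42L);
        zk.close();
    }
}
```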