[jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping
[ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002004#comment-16002004 ]

ASF GitHub Bot commented on BEAM-1723:

Github user aljoscha closed the pull request at:

    https://github.com/apache/beam/pull/2959

> FlinkRunner should deduplicate when an UnboundedSource requires Deduping
>
>                 Key: BEAM-1723
>                 URL: https://issues.apache.org/jira/browse/BEAM-1723
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Thomas Groh
>            Assignee: Aljoscha Krettek
>             Fix For: 2.0.0
>
> UnboundedSource implementations can require deduping, and the FlinkRunner
> currently logs a warning that this is not supported.
> https://github.com/apache/beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L139

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping
[ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001431#comment-16001431 ]

Aljoscha Krettek commented on BEAM-1723:

That sounds good! I opened a PR: https://github.com/apache/beam/pull/2959
[jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping
[ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001428#comment-16001428 ]

ASF GitHub Bot commented on BEAM-1723:

GitHub user aljoscha opened a pull request:

    https://github.com/apache/beam/pull/2959

    [BEAM-1723] deduplication of UnboundedSource in Flink runner

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/beam cherry-pick-1723

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2959.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2959
[jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping
[ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001239#comment-16001239 ]

ASF GitHub Bot commented on BEAM-1723:

Github user asfgit closed the pull request at:

    https://github.com/apache/beam/pull/2476

> FlinkRunner should deduplicate when an UnboundedSource requires Deduping
>
>                 Key: BEAM-1723
>                 URL: https://issues.apache.org/jira/browse/BEAM-1723
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Thomas Groh
>            Assignee: Jingsong Lee
>             Fix For: 2.0.0
>
> UnboundedSource implementations can require deduping, and the FlinkRunner
> currently logs a warning that this is not supported.
> https://github.com/apache/beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L139
[jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping
[ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962136#comment-15962136 ]

ASF GitHub Bot commented on BEAM-1723:

GitHub user JingsongLi opened a pull request:

    https://github.com/apache/beam/pull/2476

    [BEAM-1723] deduplication of UnboundedSource in Flink runner

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:

     - [ ] Make sure the PR title is formatted like:
       `[BEAM-] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
       Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `` in the title with the actual Jira issue
       number, if there is one.
     - [ ] If this contribution is large, please file an Apache
       [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

    ---

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JingsongLi/beam BEAM-1723

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2476

commit 2dc6ea51bb60f8f4beb397bc7179e1829be30d77
Author: JingsongLi
Date:   2017-04-09T13:16:50Z

    [BEAM-1723] deduplication of UnboundedSource in Flink runner

> FlinkRunner should deduplicate when an UnboundedSource requires Deduping
>
>                 Key: BEAM-1723
>                 URL: https://issues.apache.org/jira/browse/BEAM-1723
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Thomas Groh
>            Assignee: Jingsong Lee
>
> UnboundedSource implementations can require deduping, and the FlinkRunner
> currently logs a warning that this is not supported.
> https://github.com/apache/beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L139
[jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping
[ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962125#comment-15962125 ]

Jingsong Lee commented on BEAM-1723:

I think the deduplication window needs to be configurable, because it is related to the checkpoint interval.
[jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping
[ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962077#comment-15962077 ]

Jingsong Lee commented on BEAM-1723:

I understand. The reason for the duplication is that {{PubsubIO}} uses a pull-ack model, so {{acknowledge()}} in {{finalizeCheckpoint()}} may fail, while Kafka uses offsets to restore.
[jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping
[ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959390#comment-15959390 ]

Kenneth Knowles commented on BEAM-1723:

The caches do need to be fault-tolerant or you'll get dupes.

It is simplest to have no configuration, but it is hard to say. I think there could be some discussion here. The deduplication window is really about the potential for re-delivery of a message; it is not like allowed lateness at all.

For example, in {{PubsubIO}} duplicates occur when output is committed but {{finalizeCheckpoint}} does not succeed at ACKing all messages. Then Pubsub will redeliver the message.
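
The fault-tolerance point in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not Beam or Flink code, every name in it ({{RedeliverySimulation}}, {{run}}, {{emitIfNew}}) is hypothetical, and it assumes that the failed ACK coincides with a restart from the last checkpoint. It shows that a redelivered message is dropped only when the set of seen ids is part of the snapshot.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of redelivery after a failed ACK; all names here are hypothetical. */
final class RedeliverySimulation {

  /** Simulates one delivery, a checkpoint whose ACK fails, a restart, and the redelivery. */
  static List<String> run(boolean snapshotSeenIds) {
    List<String> committedOutput = new ArrayList<>();
    Set<String> seenIds = new HashSet<>();

    // 1. The source delivers message "m1"; it is new, so it is emitted.
    emitIfNew("m1", seenIds, committedOutput);

    // 2. Checkpoint: the output is committed. The dedup state is saved only if requested.
    Set<String> snapshot = snapshotSeenIds ? new HashSet<>(seenIds) : new HashSet<>();

    // 3. The ACK fails and the worker restarts from the checkpoint (an assumption of this model).
    seenIds = new HashSet<>(snapshot);

    // 4. The unacknowledged message is redelivered by the source.
    emitIfNew("m1", seenIds, committedOutput);
    return committedOutput;
  }

  /** Appends the id to the output only if it has not been seen before. */
  private static void emitIfNew(String id, Set<String> seenIds, List<String> out) {
    if (seenIds.add(id)) {
      out.add(id);
    }
  }

  public static void main(String[] args) {
    System.out.println("dedup state snapshotted: " + run(true));
    System.out.println("dedup state lost:        " + run(false));
  }
}
```

With the state snapshotted, the output contains "m1" once; with the state lost, it contains "m1" twice.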
[jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping
[ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959226#comment-15959226 ]

Jingsong Lee commented on BEAM-1723:

I see {{CachedIdDeduplicator}} in the direct runner. It uses a {{LoadingCache}} to dedup. The expireAfterAccess is 10 minutes and the maximumSize is 100_000.
Do these two values need to be parameterized?
Do these caches need to be snapshotted in the Flink runner (for fault tolerance)?

> FlinkRunner should deduplicate when an UnboundedSource requires Deduping
>
>                 Key: BEAM-1723
>                 URL: https://issues.apache.org/jira/browse/BEAM-1723
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Thomas Groh
>
> UnboundedSource implementations can require deduping, and the FlinkRunner
> currently logs a warning that this is not supported.
> https://github.com/apache/beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L139
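
The two cache settings mentioned in this comment (an expireAfterAccess of 10 minutes and a maximumSize of 100_000) appear to be Guava {{CacheBuilder}} options. A minimal sketch of what such a bounded, access-expiring id cache does is given below. It is not the direct runner's {{CachedIdDeduplicator}}: to stay dependency-free it swaps Guava's {{LoadingCache}} for an access-ordered {{LinkedHashMap}}, the class and method names are hypothetical, and it assumes the caller passes non-decreasing timestamps.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a cache-backed id deduplicator; not Beam code. */
final class BoundedIdDeduplicator {
  private final long expireAfterAccessMillis;
  private final int maximumSize;
  // Access-ordered map: iteration starts at the least recently accessed id.
  private final LinkedHashMap<String, Long> lastAccess = new LinkedHashMap<>(16, 0.75f, true);

  BoundedIdDeduplicator(long expireAfterAccessMillis, int maximumSize) {
    if (maximumSize < 1) {
      throw new IllegalArgumentException("maximumSize must be at least 1");
    }
    this.expireAfterAccessMillis = expireAfterAccessMillis;
    this.maximumSize = maximumSize;
  }

  /** Returns true the first time an id is seen within the window, false for a duplicate. */
  boolean shouldOutput(String recordId, long nowMillis) {
    evictExpired(nowMillis);
    boolean firstSeen = !lastAccess.containsKey(recordId);
    // put() refreshes both the timestamp and the access order of the id.
    lastAccess.put(recordId, nowMillis);
    if (lastAccess.size() > maximumSize) {
      // Over capacity: forget the least recently accessed id.
      Iterator<String> eldest = lastAccess.keySet().iterator();
      eldest.next();
      eldest.remove();
    }
    return firstSeen;
  }

  /** Drops ids that have not been accessed for expireAfterAccessMillis or longer. */
  private void evictExpired(long nowMillis) {
    Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (nowMillis - entry.getValue() < expireAfterAccessMillis) {
        break; // Entries are in access order, so the remaining ones are newer.
      }
      it.remove();
    }
  }

  public static void main(String[] args) {
    // The values quoted in the comment: 10 minutes and 100_000 entries.
    BoundedIdDeduplicator dedup = new BoundedIdDeduplicator(10 * 60 * 1000L, 100_000);
    System.out.println(dedup.shouldOutput("id-1", 0L));
    System.out.println(dedup.shouldOutput("id-1", 1_000L));
  }
}
```

Both limits bound memory at the cost of correctness: an id that is evicted by size or by expiry is treated as new if it is redelivered later, which is why the right values depend on how long redelivery can lag.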