[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-04 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352004#comment-16352004
 ] 

Jungtaek Lim commented on STORM-2914:
-

Thanks [~Srdo], merged into master.

Since there's some diverge between master and 1.x-branch for 
storm-kafka-client, I'd like to wait for [~Srdo] to submit pull request against 
1.x branch. We can merge it afterwards.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
> Attachments: storm-kafka-modes.ods
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-04 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352003#comment-16352003
 ] 

Jungtaek Lim commented on STORM-2914:
-

Thanks [~Srdo], merged into master.

Since there's some diverge between master and 1.x-branch for 
storm-kafka-client, I'd like to wait for [~Srdo] to submit pull request against 
1.x branch. We can merge it afterwards.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
> Attachments: storm-kafka-modes.ods
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-03 Thread Hugo Louro (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351584#comment-16351584
 ] 

Hugo Louro commented on STORM-2914:
---

[~Srdo] Thanks for working on this over the weekend. I am going to start 
reviewing your patch now. We are in agreement on the best timing to create an 
interface to support pluggable offsets storage. I also don't thing we should do 
it now, i.e. for 1.2.0, hence why I hinted at 1.2.x, 1.3+.x and 2.x, basically 
some release after 1.2.0. Thanks.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
> Attachments: storm-kafka-modes.ods
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-03 Thread Alexandre Vermeerbergen (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351308#comment-16351308
 ] 

Alexandre Vermeerbergen commented on STORM-2914:


Hello [~hmclouro]

+1 for your proposal to get things decided quickly !

In your 7 questions lists, my vote is:

+1 for item #5 : because I want to trust you guys on the meaning of "identical" 
in this sentence: "_Document that ProcessingGuarantee.NO_GUARANTEES has a 
behavior identical to auto.commit.enable=true but disregards this property 
altogether (if auto.commit.enable is set the KafkaSpoutConfig will throw an 
exception)_".

And I have no problem changing the way I configure Kafka spout to get this 
"identical" behavior if documentation is crystal clear about it. The 
documentation needs to be very clear about how to replace legacy use of 
auto.commit.enable and related setting by newer ones.

+ 0 for other items, meaning I am neutral there, as I'm selfishly concerned by 
my need to keep a behavior "identical" to auto.commit.enable=true.

I just hope that "identical" means "no noticeable difference in performance".

Best regards,

Alexandre Vermeerbergen

PS: I also hope that in next RC, the toollib/ directory will be cleaned up from 
*src* and *javadoc* artifacts, which seems to break the display of Kafka spout 
statistics in Nimbus UI in 1.2.0 RC2 (see my previous comment about this in 
this thread).**

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-02 Thread Hugo Louro (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351147#comment-16351147
 ] 

Hugo Louro commented on STORM-2914:
---

[~Srdo][~kabhwan][~avermeerbergen] 

Let’s try to quickly agree on the specific action points that will help us move 
past any blockers and release Storm 1.2.0. I suggest the following, which will 
give us a good balance between making this release stable, but at the same time 
will give us the flexibility to accommodate changes that make sense in the 
future.

# (Optional) Rename ProcessingGuarantee.NONE to 
ProcessingGuarantee.NO_GUARANTEES or UNDEFINED (as Spark uses). I think that 
either of these names are clearer and emphasize that this option is likely not 
what the users wants
# Annotate the enum ProcessingGuarantee with *@Unstable* and document that 
ProcessingGuarantee.NO_GUARANTEES may be removed in the near future.
# For now we keep supporting NO_GUARANTEES mode, but there is a good chance 
that we will remove it down the line, unless for performance reasons. We will 
do a benchmark for this.
# NO_GUARANTEES will use *kafkaConsumer.commitAsync(offsetsToCommit, null);* as 
[~Srdo]
#  proposed in the PR.
# Document that ProcessingGuarantee.NO_GUARANTEES has a behavior identical to 
auto.commit.enable=true but disregards this property altogether (if 
auto.commit.enable is set the KafkaSpoutConfig will throw an exception). 
Furthermore, the commits will not be based on a timer as controlled by 
auto.commit.interval.ms, but rather using the following strategy ([~Srdo]
#  perhaps you can add the details in here).
# AT_MOST_ONCE will use *kafkaConsumer.commitSync(offsetsToCommit);* as [~Srdo]
#  proposed in the PR
# AT_LEAST_ONCE will stay as is (in the current code and in the PR) because 
this option requires no change.
# 
Slightly unrelated with this PR but to address [~kabhwan]’s comment on how to 
persist the offsets. The next version of the KafkaSpout (1.2.x and 1.3+.x) 
should have offsets persistence pluggable and abstracted behind an interface. 
The first implementation of this interface will commit offsets to Kafka as it 
is currently doing. In the future we can add support to write offsets to 
arbitrary durable storage.

If we agree on this, let's incorporate these changes in the PRs [~Srdo] created 
and target having them merged by end of Monday.


> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-01 Thread Priyank Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349477#comment-16349477
 ] 

Priyank Shah commented on STORM-2914:
-

[~avermeerbergen] Not sure how the javadoc and sources jar got in to the binary 
distribution. On 1.1.1 binary distribution package i only see 

storm-kafka-monitor-1.1.1.jar

That should be okay and not create any issue in loading spout offset lags in 
storm ui. I will verify in the latest RCs as well. Can you double check as well 
to see if the other jars are present and if that is creating an issue in 
displaying the offset lags?

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-01 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349373#comment-16349373
 ] 

Jungtaek Lim commented on STORM-2914:
-

[~Srdo]

You've got my points. Thanks for reading my comment thoughtfully.

Regarding commit in ack(), we may need to take into consideration that unlike 
nextTuple() it is called only when spout receives ack. It should be easiest one 
but we don't want to commit every ack (acks may not come sequentially, and 
performance matters). But if we apply timer to ack(), we need to deal with edge 
case: not yet timed-out but received some acks afterwards which are "last" 
acks. Possible to deal with, but might be tricky. That's why I mentioned hybrid 
approach - allowing commit to Kafka in both nextTuple() and ack(), which still 
has such edge case as well but the chance is much smaller.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-01 Thread Alexandre Vermeerbergen (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349315#comment-16349315
 ] 

Alexandre Vermeerbergen commented on STORM-2914:


Hello [~Srdo]

Regarding storm-kafka-monitoring*.jar : thanks for your explanation, it was 
indeed already in our toollib/ directory from Storm 1.2.0RC2:

{{[root@ip-172-31-18-84 storm-stable]# ls toollib/}}
{{storm-kafka-monitor-1.2.0.jar  storm-kafka-monitor-1.2.0-javadoc.jar  
storm-kafka-monitor-1.2.0-sources.jar}}

But since I was no longer seeing Kafka spout stats in Nimbus UI's log, and it 
seem that this was broken because of some missing class issue (very strange 
that it's trying to load a class from *javadoc.jar" file, isn't it ?) :

{{org.apache.storm.utils.ShellUtils$ExitCodeException: Error: Could not find or 
load main class 
.usr.local.Storm.storm-stable.toollib.storm-kafka-monitor-1.2.0-javadoc.jar}}

{{    at org.apache.storm.utils.ShellUtils.runCommand(ShellUtils.java:231) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at org.apache.storm.utils.ShellUtils.run(ShellUtils.java:161) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.utils.ShellUtils$ShellCommandExecutor.execute(ShellUtils.java:371)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at org.apache.storm.utils.ShellUtils.execCommand(ShellUtils.java:461) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at org.apache.storm.utils.ShellUtils.execCommand(ShellUtils.java:444) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.utils.TopologySpoutLag.getLagResultForKafka(TopologySpoutLag.java:163)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.utils.TopologySpoutLag.getLagResultForNewKafkaSpout(TopologySpoutLag.java:189)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.utils.TopologySpoutLag.lag(TopologySpoutLag.java:57) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at org.apache.storm.ui.core$topology_lag.invoke(core.clj:805) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at org.apache.storm.ui.core$fn__9572.invoke(core.clj:1165) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.compojure.core$make_route$fn__5965.invoke(core.clj:100) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.compojure.core$if_route$fn__5953.invoke(core.clj:46) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.compojure.core$if_method$fn__5946.invoke(core.clj:31) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.compojure.core$routing$fn__5971.invoke(core.clj:113) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at clojure.core$some.invoke(core.clj:2570) ~[clojure-1.7.0.jar:?]}}
{{    at 
org.apache.storm.shade.compojure.core$routing.doInvoke(core.clj:113) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at clojure.lang.RestFn.applyTo(RestFn.java:139) 
~[clojure-1.7.0.jar:?]}}
{{    at clojure.core$apply.invoke(core.clj:632) ~[clojure-1.7.0.jar:?]}}
{{    at 
org.apache.storm.shade.compojure.core$routes$fn__5975.invoke(core.clj:118) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.middleware.cors$wrap_cors$fn__8880.invoke(cors.clj:149)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.middleware.json$wrap_json_params$fn__8827.invoke(json.clj:56)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__6607.invoke(multipart_params.clj:118)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.middleware.reload$wrap_reload$fn__7890.invoke(reload.clj:22)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.ui.helpers$requests_middleware$fn__6860.invoke(helpers.clj:52) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.ui.core$catch_errors$fn__9747.invoke(core.clj:1428) 
~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.middleware.keyword_params$wrap_keyword_params$fn__6527.invoke(keyword_params.clj:35)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.middleware.nested_params$wrap_nested_params$fn__6570.invoke(nested_params.clj:84)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.middleware.params$wrap_params$fn__6499.invoke(params.clj:64)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__6607.invoke(multipart_params.clj:118)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.middleware.flash$wrap_flash$fn__6822.invoke(flash.clj:35)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.middleware.session$wrap_session$fn__6808.invoke(session.clj:98)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 
org.apache.storm.shade.ring.util.servlet$make_service_method$fn__6357.invoke(servlet.clj:127)
 ~[storm-core-1.2.0.jar:1.2.0]}}
{{    at 

[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349137#comment-16349137
 ] 

Stig Rohde Døssing commented on STORM-2914:
---

[~avermeerbergen]

Yes, those logs are expected to occur, but only once per spout instance. Once 
the spout commits, the log should not be shown again.

[~kabhwan]

Sure, if we're moving to a stateful checkpointing model we might want to use 
that mechanism's storage to commit Kafka offsets as well. I don't think there's 
a reason to switch off of Kafka storage yet though.

If I'm understanding you correctly, you're saying that it's fine to restrict 
how users can configure the spout and consumer, because some configurations 
don't make sense for Storm (right?) If so, I agree and I think it's what we're 
trying to do with e.g. the code around enable.auto.commit already 
([https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L461),]
 or by not offering an option to use the subscribe API. This issue has largely 
the same kind of intent.

I'm hoping to have a bit of time to compare NONE and AT_MOST_ONCE this weekend. 
I'm wondering if [~avermeerbergen] has any interest in comparing these two 
settings as well, since you have a real workload?

Regarding the spout only committing when nextTuple is called, you have a point. 
I'm not sure why we couldn't move commit to happen during acks for the 
at-least-once case (the only one where acks matter). I don't have any insight 
into whether the case you describe would be likely to happen, but I'd think 
that it would often be possible to "luck into" a commit during the tuple 
sequence that leads up to triggering backpressure.

 

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-01 Thread Priyank Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349071#comment-16349071
 ] 

Priyank Shah commented on STORM-2914:
-

[~avermeerbergen] That jar is for displaying the Kafka spout offset lag in 
Storm UI if topology has a Kafka Spout. The jar should already be present in 
the toollib directory of your installation and you should not need to include 
it in your topologies' jar. Can you confirm that?

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-01 Thread Alexandre Vermeerbergen (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348563#comment-16348563
 ] 

Alexandre Vermeerbergen commented on STORM-2914:


[~Srdo]

Also I noticed that in the artifacts built by Maven from your 1.x branch, 
there's this new JAR file:

storm-kafka-monitor-1.2.1-SNAPSHOT.jar

What is it meant for? Should I include it into my topologies "BigJars" ?

Worth mentioning: in Storm UI, the stats on Kafka consumption are no longer 
displayed.. is it related to this new JAR ?

Best regards,

Alexandre

 

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-02-01 Thread Alexandre Vermeerbergen (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348558#comment-16348558
 ] 

Alexandre Vermeerbergen commented on STORM-2914:


Hello [~Srdo]

I have build storm-kafka-client.jar from your 1.x branch, tests are starting 
now.

While it's too early to say if it's OK or not, I noticed the following warning 
in one of our topologies's log:

 

2018-02-01 13:08:25.016 o.a.k.c.u.AppInfoParser 
Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [INFO] Kafka version : 
0.10.2.1 2018-02-01 13:08:25.016 o.a.k.c.u.AppInfoParser 
Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [INFO] Kafka commitId : 
e89bffd6b2eff799 2018-02-01 13:08:25.076 o.a.s.k.s.KafkaSpout 
Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [INFO] Partitions revoked. 
[consumer-group=StormPodsyncTopology_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ,
 consumer=org.apache.kafka.clients.consumer.KafkaConsumer@5cb084d9, 
topic-partitions=[]] 2018-02-01 13:08:25.077 o.a.s.k.s.KafkaSpout 
Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [INFO] Partitions 
reassignment. [task-ID=2, 
consumer-group=StormPodsyncTopology_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@5cb084d9, 
topic-partitions=[podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-15, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-4, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-3, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-6, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-5, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-0, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-2, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-1, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-12, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-11, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-14, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-13, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-8, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-7, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-10, 
podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-9]] 2018-02-01 13:08:25.083 
o.a.k.c.c.i.AbstractCoordinator Thread-4-podOrTenantsFromKafkaSpout-executor[2 
2] [INFO] Discovered coordinator 
ec2-34-242-207-227.eu-west-1.compute.amazonaws.com:9092 (id: 2147483644 rack: 
null) for group 
StormPodsyncTopology_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ. 2018-02-01 
13:08:25.116 o.a.s.k.s.i.CommitMetadata 
Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [WARN] Failed to deserialize 
[OffsetAndMetadata\{offset=63683, 
metadata='{topic-partition=podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-15,
 offset=63682, numFails=0, 
thread='Thread-4-podOrTenantsFromKafkaSpout-executor[2 2]'}'}]. Error likely 
occurred because the last commit for this topic-partition was done using an 
earlier version of Storm. Defaulting to behavior compatible with earlier 
version 2018-02-01 13:08:25.125 o.a.s.k.s.i.CommitMetadata 
Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [WARN] Failed to deserialize 
[OffsetAndMetadata\{offset=67701, 
metadata='{topic-partition=podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-4,
 offset=67700, numFails=0, 
thread='Thread-4-podOrTenantsFromKafkaSpout-executor[2 2]'}'}]. Error likely 
occurred because the last commit for this topic-partition was done using an 
earlier version of Storm. Defaulting to behavior compatible with earlier 
version 2018-02-01 13:08:25.130 o.a.s.k.s.i.CommitMetadata 
Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [WARN] Failed to deserialize 
[OffsetAndMetadata\{offset=69382, 
metadata='{topic-partition=podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-3,
 offset=69381, numFails=0, 
thread='Thread-4-podOrTenantsFromKafkaSpout-executor[2 2]'}'}]. Error likely 
occurred because the last commit for this topic-partition was done using an 
earlier version of Storm. Defaulting to behavior compatible with earlier 
version 2018-02-01 13:08:25.134 o.a.s.k.s.i.CommitMetadata 
Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [WARN] Failed to deserialize 
[OffsetAndMetadata\{offset=78943, 
metadata='{topic-partition=podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-6,
 offset=78942, numFails=0, 
thread='Thread-4-podOrTenantsFromKafkaSpout-executor[2 2]'}'}]. Error likely 
occurred because the last commit for this topic-partition was done using an 
earlier version of Storm. Defaulting to behavior compatible with earlier 
version 2018-02-01 13:08:25.135 o.a.s.k.s.i.CommitMetadata 
Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [WARN] Failed to deserialize 
[OffsetAndMetadata\{offset=76603, 
metadata='{topic-partition=podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-5,
 offset=76602, numFails=0, 
thread='Thread-4-podOrTenantsFromKafkaSpout-executor[2 2]'}'}]. Error likely 
occurred because the last commit for this 

[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-31 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347797#comment-16347797
 ] 

Jungtaek Lim commented on STORM-2914:
-

Correction: If new model of backpressure doesn't adjust a rate but just block 
calling nextTuple(), when backpressure bubbles up to spout it may encounter 
same situation. New model doesn't always block spout in backpressure state, so 
the chance to encounter will become much less.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-31 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347675#comment-16347675
 ] 

Jungtaek Lim commented on STORM-2914:
-

[~hmclouro] [~Srdo]
As I showed the example of Flink and Spark and I believe others are doing it, 
we need to make data source being "adapted" to Storm's nature when implementing 
spout. We are making KafkaSpout being adapted to how Spout is working, and if 
we are going to support stateful exactly-once via checkpoint and need to make 
storm-kafka-client supporting this, we may also need to have a proper strategy 
like Spark and Flink, get rid of committing to kafka or at least not relying on 
that for checkpoint, since it is ideal to store checkpoint into "same" durable 
storage.

We need to take this into account while designing the spout and restrict some 
features if necessary, instead of letting users to use whatever they want to, 
but with the risks we don't guide whichever will be. If we really want to 
provide the feature which doesn't conform to Storm's nature, we should warn 
users, guiding the possible risks are best, but at least guiding that they 
should know what is exactly they're doing now.

That's why I'm not in favor of having NONE processing guarantee semantic only 
for storm-kafka-client, but if it really has enough value to take the risks (if 
there's huge gap between at-most-once and none in point of  performance view) 
we may need to provide the feature, with guiding the risks.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-30 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346235#comment-16346235
 ] 

Jungtaek Lim commented on STORM-2914:
-

Whatever we decide, it may be better to document such guide so that we could 
let users to be aware of any effects when enabling such option.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-30 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346232#comment-16346232
 ] 

Jungtaek Lim commented on STORM-2914:
-

[~hmclouro]
https://github.com/apache/flink/blob/1440e4febd589e320f846a2725e98aec8ee43e7f/docs/dev/connectors/kafka.md#kafka-consumers-offset-committing-behaviour-configuration

Flink clearly documents how "enable.auto.commit" option plays with Flink's 
checkpoint mechanism. Key statement is below:

{quote}
"Note that the Flink Kafka Consumer does not rely on the committed offsets for 
fault tolerance guarantees. The committed offsets are only a means to expose 
the consumer's progress for monitoring purposes."
{quote}

Based on the statement, Flink doesn't have any concern about committing to 
Kafka, which is very different to Storm since storm-kafka-client is completely 
relying on Kafka committed offset.

Here's Spark doc for streaming Kafka source:
https://github.com/apache/spark/blob/55b8cfe6e6a6759d65bf219ff570fd6154197ec4/docs/streaming-kafka-0-10-integration.md#storing-offsets

Relevant statements are below:

{quote}
Kafka has an offset commit API that stores offsets in a special Kafka topic. By 
default, the new consumer will periodically auto-commit offsets. This is almost 
certainly not what you want, because messages successfully polled by the 
consumer may not yet have resulted in a Spark output operation, resulting in 
undefined semantics. This is why the stream example above sets 
"enable.auto.commit" to false.
{quote}

Spark community also avoids to couple with "enable.auto.commit". Checkpoint 
would be done with different way, and even guide regarding storing offsets to 
kafka itself, they don't recommend using that option.

cc. [~Srdo] [~avermeerbergen]

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-30 Thread Hugo Louro (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346215#comment-16346215
 ] 

Hugo Louro commented on STORM-2914:
---

[~kabhwan] I am OK with supporting only at-most-once or at-least-once. Do you 
know how other frameworks deal with Kafka auto.commit mode? If they don't allow 
it altogether then Storm would be in an equal playing field and performance 
would not matter much at that point in my opinion. It will simply become a 
feature that is not supported.

As far as the WARN level, initially when I created the PR with that change I 
had it with level DEBUG and I never felt that it should be WARN to begin with. 
During reviews someone asked for it to be WARN and I did it, but never felt it 
should be WARN, and I still don't think it should be.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-30 Thread Hugo Louro (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346102#comment-16346102
 ] 

Hugo Louro commented on STORM-2914:
---

[~Srdo] [~kabhwan] [~avermeerbergen] the decision to keep or not keep NONE 
basically boils down to, do we want to support Kafka's enable.auto.commit or 
no? If we want to support this option, then the processing guarantee is 
technically NONE because a given record/tuple can be processed 0, 1, 2 or more 
times. In this case we cannot remove NONE.

That leads me to ask why are we removing the option to set the Kafka property 
enable.auto.commit and throwing an exception. This is a Kafka feature that is 
available to every Kafka consumer, so why should the KafkaSpout, which is 
technically a Kafka consumer, be any different?
 
If the goal is to avoid the WARN exception that was getting printed when 
enable.auto.commit=true, the obvious thing to do in my opinion is to simply not 
log the message if the processing guarantee is not AT_LEAST_ONCE.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-30 Thread Hugo Louro (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345533#comment-16345533
 ] 

Hugo Louro commented on STORM-2914:
---

[~Srdo] I am reviewing this patches today. I have going over the discussion 
since yesterday to put all the pieces together.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345522#comment-16345522
 ] 

Stig Rohde Døssing commented on STORM-2914:
---

[~avermeerbergen] I have a 1.x branch here 
https://github.com/srdo/storm/tree/STORM-2914-2913-1.x with both 2914 and 2913 
fixed. I'd appreciate if you would build and try out this fix.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-28 Thread Alexandre Vermeerbergen (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342987#comment-16342987
 ] 

Alexandre Vermeerbergen commented on STORM-2914:


[~Srdo]

We do set topology.max.spout.pending, and we already had a discussion in the 
past that until Storm 2.x, we had to use our by-pass to overcome the 
limitations of Storm 1.x back-pressure:

See:  
[https://mail-archives.apache.org/mod_mbox/storm-user/201705.mbox/%3CCADeKz6qZG12mN-=gf+mpta1jwxdk8_wwz1npbcgslx7fga4...@mail.gmail.com%3E]

(at subsequent posts). We observe no OOM, just the consumption from Spouts 
stops, regardless of whether it our own Kafka spout, the old Storm Kafka client 
or the newer Storm Kafka Client.

Back to Storm 1.2.0 : any idea when I could test a fix which removes the 
cycling "WARN" message about metadata ?

Thanks!

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342365#comment-16342365
 ] 

Stig Rohde Døssing commented on STORM-2914:
---

{quote}
My belief is that our alerting topology gets stuck when there's a huge peak of 
messages rate because somehow we saturate Storm by exceeding its capacity, 
and/or maybe we hit some race condition that hands the spouts in a way that 
"normal load" would never trigger.
{quote}

I'd also add that what you're describing here is something you can likely avoid 
by configuring your topologies differently. For example, you can set 
topology.max.spout.pending to limit how many pending tuples will be allowed 
into the topology at a time, so a spike in messages written to Kafka doesn't 
trigger a flood of messages in Storm. Even without this configuration, the 
topology shouldn't hang (more likely an OOME would occur). I'd encourage you to 
try to get some more information (e.g. Storm UI screenshots, thread dumps of 
the workers) if it happens again, and raise a new issue here so we can try to 
figure out what's happening.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342327#comment-16342327
 ] 

Jungtaek Lim commented on STORM-2914:
-

[~Srdo] 
Got it. Thanks for explanation. So it could make much difference if committing 
offset is costly.
Basically I'm against keeping NONE, since not only it adds more complexity, but 
also it adds the issues to investigate (that guarantee is not we've supported 
before in other spouts), but we may still want to support it if the cost of 
committing is considerable.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342319#comment-16342319
 ] 

Stig Rohde Døssing commented on STORM-2914:
---

[~kabhwan] It would be good to figure out why Alexandre was seeing that 
behavior. Just to be clear though, this issue isn't related to fixing that. I 
simply wanted to know whether people would object to losing the NONE guarantee.

To be a little more concrete about the difference between autocommit and how 
the spout acts with AT_MOST_ONCE, when the consumer is configured to 
autocommit, it will set an internal timestamp. If we call poll when the 
timestamp is sufficiently old, the consumer will asynchronously commit the 
offsets it has returned in earlier poll calls by effectively calling 
commitAsync on the consumer. When the spout is set to use AT_MOST_ONCE, the 
spout immediately calls commitSync after calling poll.

I don't believe there's any difference between what goes over the network in 
the two calls, the only differences should be that one call is blocking and one 
is not, and that the AT_MOST_ONCE setting commits after every poll, while 
autocommit commits only on polls where the timer has expired.

The linked PR removes the option to use autocommit, and makes the spout emulate 
autocommit's behavior when using NONE. After writing the code I don't think 
it's a big deal if we want to keep NONE, I thought it would add more complexity.

 

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342316#comment-16342316
 ] 

Jungtaek Lim commented on STORM-2914:
-

FYI to [~avermeerbergen], the behavior of back pressure will be completely 
renewed in Storm 2.0.0.
https://issues.apache.org/jira/browse/STORM-2306
Please refer the doc in the issue. I guess it makes more sense to you, but 
please comment to the issue if you find some concerns. The change is included 
in huge patch, so there's no chance to port back to 1.x version line.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342312#comment-16342312
 ] 

Jungtaek Lim commented on STORM-2914:
-

So I don't think we even need to have NONE processing guarantee: AT-MOST-ONCE 
should be able to cover that. Topology processing can get delayed more and more 
based on throughput vs incoming, but should never stuck, even we may encounter 
OOME instead if we fail to adjust the parameters.

I'm not sure we may want to support this as official feature, but it may make 
sense that spout discards current offset and pull from latest, if gap between 
current offset and biggest offset exceeds specific threshold (or some other 
conditions are met). That should work similar as [~avermeerbergen]'s self 
healing cron.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342305#comment-16342305
 ] 

Jungtaek Lim commented on STORM-2914:
-

I feel [~avermeerbergen]'s issue is unlikely to having auto-commit or not, 
since there's only a slight difference between AT-MOST-ONCE and auto-commit as 
[~Srdo] explained. There might be little difference for the performance 
perspective since auto-commit is handled by Kafka hence they can make it as 
non-round-trip, but I guess it would be not a big deal.

I think investigating [~avermeerbergen]'s issue would be more helpful rather 
than supporting kind of workaround. The thing is why it was stuck, and why it 
couldn't be restored afterwards, and how does automatic backpressure play and 
how it affects the spout. What does "topology is stuck" mean? Even in back 
pressure state, non-spout should process tuples to reduce tuples in waiting 
queue so that back pressure flag can be turned off eventually. Topology 
shouldn't be stuck (except idle) at any chance.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342254#comment-16342254
 ] 

Stig Rohde Døssing commented on STORM-2914:
---

[~hmclouro] Thanks. I have most of the code changes ready for this and 2913, 
they're fairly small. I'll put up a pair of PRs soon. Happy to change them if 
they turn out to not be how we want to solve these issues.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread Hugo Louro (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342243#comment-16342243
 ] 

Hugo Louro commented on STORM-2914:
---

[~Srdo] I am currently with limited Internet access. I will get my input on 
this by tomorrow. Thanks.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342242#comment-16342242
 ] 

Stig Rohde Døssing commented on STORM-2914:
---

[~avermeerbergen]

Regarding your use case, you might try at-least-once mode plus a filter to 
discard old metrics, plus the topology.max.spout.pending parameter to limit how 
many messages go into your topology at a time. That said, I'll continue on with 
the assumption that your current configuration is what you want to keep.

I think it's likely that use cases like yours where you don't need 
at-least-once would be equally well supported by using the AT_MOST_ONCE 
processing guarantee 
https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L165.
 

The main difference between enable.auto.commit and AT_MOST_ONCE is that 
AT_MOST_ONCE synchronously commits offsets to Kafka once it has polled, but 
before the tuples are emitted, while enable.auto.commit commits every once in a 
while before the consumer polls. What this means is that enable.auto.commit may 
emit a tuple 0, 1, 2, etc. times, while AT_MOST_ONCE will emit a tuple 0 or 1 
times. Since users of enable.offset.commit don't care how many times a message 
is processed, they should be fine with the AT_MOST_ONCE guarantee.

My only concern with dropping support for enable.auto.commit and asking people 
to use AT_MOST_ONCE instead, would be the switch to synchronous committing, 
which may have some impact on performance. I would be surprised if the impact 
were meaningful though. 

For now I think we should try to make the spout emulate enable.auto.commit when 
the processing guarantee is NONE, without actually setting enable.auto.commit 
on the consumer. That way we can specify metadata when committing, which allows 
us to fix STORM-2913 properly. If we decide later that enable.auto.commit is 
unnecessary, we can remove NONE as an option.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread Alexandre Vermeerbergen (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342221#comment-16342221
 ] 

Alexandre Vermeerbergen commented on STORM-2914:


Hello [~Srdo],

We use auto-commit when consuming Kafka messages in our "real-time alerting" 
topology.

This topology doesn't uses acking not anchoring

Indeed, the purpose of our "real-time alerting" topology is to evaluate metrics 
read from Kafka through expressions (called triggers, like in Zabbix) into 
"alerting severities", which we write to both a Redis database (for rendering 
in an alerting web app) and into other Kafka topics (for notifications / 
persistency purposes handled by other topologies).

We observed that sometimes our alerting topology is flooded by an excessive 
rate of metrics = Kafka messages read by kafka spout, because sometimes a 
remote Kafka cluster which was temporarily unable to replicate data to our 
central Kafka cluster (the one plugged to our topologies) comes "back to life" 
and sends all metrics at once, including metrics which are too old to be 
relevant for real-time alerting.

And because if Storm 1.x back-pressure is kind of ... well, say "limited", the 
result of such flooding is that our topology is just stuck and no longer 
consumes any metric.

So we have a simple self-healer "cron" which periodically checks whether or not 
our topology consumes metrics over a sliding windows of 10 minutes, and when 
consumption is stopped for 10 minutes, this cron will restart our topology with 
"LATEST " strategy. 

 

No with this context, after trying many combinations we found out that with 
Storm 1.1.0 and with our own Kafka spout (working in auto-commit mode, but not 
as powerful as storm kafka client), we had to enable auto-commit otherwise our 
topology would get stuck quite too often.

We had been waiting for Storm 1.2.0 to use the official storm-kafka-client  
with auto-commit support, because our own spout was quite limited.

Now, in order to answer your question about storm-kafka-client removal of 
auto-commit support, I am unsure about what I should answer: what would 
guarantee me that my real-time alerting topology's stability won't be worse 
than it was when I used storm-kafka-client before the following commit broke it 
:[https://github.com/apache/storm/commit/a3899b75a79781602fa58b90de6c8aa784af5332#diff-7d7cbc8f5444fa7ada7962033fc31c5e
 
|https://github.com/apache/storm/commit/a3899b75a79781602fa58b90de6c8aa784af5332#diff-7d7cbc8f5444fa7ada7962033fc31c5e]?

Best regards,

Alexandre Vermeerbergen

 

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

2018-01-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342145#comment-16342145
 ] 

Stig Rohde Døssing commented on STORM-2914:
---

Ping [~hmclouro] and [~avermeerbergen], I'd appreciate your opinions on this.

> Remove enable.auto.commit support from storm-kafka-client
> -
>
> Key: STORM-2914
> URL: https://issues.apache.org/jira/browse/STORM-2914
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> before poll, reassignment and close, and never use the enable.auto.commit 
> option. This allows us to include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)