[jira] [Commented] (STORM-2359) Revising Message Timeouts
[ https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043409#comment-16043409 ]

Stig Rohde Døssing commented on STORM-2359:
-------------------------------------------

{quote}
The resets are managed internally by the ACKer bolt. The spout only gets notified if the timeout expires or if the tuple tree is fully processed.
{quote}

How will this work? The current implementation keeps a pending map in both the acker and the spout, and both rotate every topology.message.timeout.secs. If the acker doesn't forward reset requests to the spout, the spout will simply expire the tuple tree on its own once the message timeout has passed.

{quote}
That case would be more accurately classified as "progress is being made" ... but slower than expected. The case of 'progress is not being made' is when a worker that is processing part of the tuple tree dies.
{quote}

Yes, you are right. But it is currently possible for the topology to degrade to no progress at all, even if each individual tuple could be processed within the message timeout, because tuples can expire while queued and get re-emitted, after which they can be delayed by their own duplicates sitting ahead of them in the queue. For IRichBolts this can be mitigated by writing the bolt to accept and queue tuples internally, so the bolt can reset their timeouts manually if necessary, but for IBasicBolt this is not possible.

To give a concrete example: we had an IBasicBolt that enriched tuples with some database data. Most tuples were processed very quickly, but a few were slow. Even the slow tuples never took longer than our message timeout individually. We then hit an instance where a batch of slow tuples happened to arrive on the stream close together. The first few were processed before they expired, but the rest expired while queued. The spout then re-emitted the expired tuples, and they got into the queue behind their own expired instances.
Since the bolt won't skip expired tuples, the freshly emitted tuples also expired, which caused another re-emit. This repeated until the topology was restarted so the queues could be cleared.

> Revising Message Timeouts
> -------------------------
>
>                 Key: STORM-2359
>                 URL: https://issues.apache.org/jira/browse/STORM-2359
>             Project: Apache Storm
>          Issue Type: Sub-task
>          Components: storm-core
>    Affects Versions: 2.0.0
>            Reporter: Roshan Naik
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
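The rotating pending map mentioned in the comment can be sketched as follows. This is a simplified illustration of the mechanism, not Storm's actual RotatingMap code: pending tuple trees land in the freshest bucket, acks remove them, and each rotation (one per timeout interval) drops the oldest bucket, whose surviving entries are treated as expired — which is why the spout will time a tree out on its own if no reset ever reaches it.

```java
import java.util.*;

/**
 * Minimal sketch (not Storm's actual code) of a rotating pending map:
 * an entry that survives a full rotation cycle without being acked is
 * expired, and the spout fails/re-emits the corresponding tuple tree.
 */
public class RotatingPending<K, V> {
    private final LinkedList<Map<K, V>> buckets = new LinkedList<>();

    public RotatingPending(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) buckets.add(new HashMap<>());
    }

    /** New pending entries always land in the freshest bucket. */
    public void put(K id, V tree) { buckets.getFirst().put(id, tree); }

    /** Acking removes the entry from whichever bucket holds it. */
    public V ack(K id) {
        for (Map<K, V> b : buckets) {
            if (b.containsKey(id)) return b.remove(id);
        }
        return null;
    }

    /**
     * Called once per timeout interval: the oldest bucket is dropped and
     * its surviving entries are returned as expired.
     */
    public Map<K, V> rotate() {
        Map<K, V> expired = buckets.removeLast();
        buckets.addFirst(new HashMap<>());
        return expired;
    }
}
```

Under this model the re-emit feedback loop in the example is easy to see: an expired tree is failed and re-emitted, the re-emitted tuple queues behind its own stale duplicate, and it too survives a full rotation before the bolt reaches it.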
[jira] [Created] (STORM-2547) Apache Metron stream: default not found
Monika Dziuk created STORM-2547:
-----------------------------------

             Summary: Apache Metron stream: default not found
                 Key: STORM-2547
                 URL: https://issues.apache.org/jira/browse/STORM-2547
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core, storm-elasticsearch
         Environment: Ubuntu
            Reporter: Monika Dziuk

Hi all, we tried to install Metron 0.4.0 according to this tutorial: https://community.hortonworks.com/articles/88843/manually-installing-apache-metron-on-ubuntu-1404.html

The installation itself worked, but during the step “Smoke Test Metron” two main problems arose:

In the storm topology profiler the splitterBolt doesn’t work. The kafkaSpout receives data and forwards it to the splitterBolt, but the splitterBolt can’t handle the data. We got the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: stream: default not found
	at org.apache.storm.utils.Monitor.metrics(Monitor.java:223)
	at org.apache.storm.utils.Monitor.metrics(Monitor.java:159)
	at org.apache.storm.command.monitor$_main.doInvoke(monitor.clj:36)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at org.apache.storm.command.monitor.main(Unknown Source)

The same exception occurs in the indexing topology at the indexingBolt. All of the other topologies do work. Also, we receive data in HDFS but not in Elasticsearch. We’d appreciate any tips! Thanks in advance! :)
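The exception above comes from the `storm monitor` command rather than from the bolt itself. A hedged, simplified re-implementation of the check that Monitor.metrics performs (this is illustrative, not Storm's actual code): the monitor watches one stream of a component — the CLI defaults to the stream named "default" — and fails if the component does not declare or emit on that stream, which is the case for bolts that only declare named streams.

```java
import java.util.*;

/**
 * Illustrative sketch of the lookup behind "stream: default not found".
 * emittedByStream maps each stream a component declares to an emitted
 * count; asking for a stream the component never declared throws, just
 * as Monitor.metrics does. Names here are hypothetical.
 */
public class StreamLookup {
    public static long metrics(Map<String, Long> emittedByStream, String stream) {
        if (!emittedByStream.containsKey(stream)) {
            throw new IllegalArgumentException("stream: " + stream + " not found");
        }
        return emittedByStream.get(stream);
    }
}
```

If the component in question emits only on named streams, the monitor command accepts a stream option (`storm monitor <topology> -m <component> -s <stream>`) to watch one of those instead of "default".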
[jira] [Comment Edited] (STORM-2343) New Kafka spout can stop emitting tuples if more than maxUncommittedOffsets tuples fail at once
[ https://issues.apache.org/jira/browse/STORM-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042382#comment-16042382 ]

Prasanna Ranganathan edited comment on STORM-2343 at 6/8/17 7:56 AM:
---------------------------------------------------------------------

The illustrative example is very useful in thinking through the issue. Thanks for the same. Generally speaking, my preference is for a simple solution with minimal overhead in the Spout.

There is another critical issue that we need to consider, though; I have just created the JIRA for it: STORM-2546. I am trying to come up with a solution that would address both these issues comprehensively. We are essentially dealing with the challenge of ensuring that failing tuples are properly accounted for and that the at-least-once processing guarantee is enforced properly.

> New Kafka spout can stop emitting tuples if more than maxUncommittedOffsets tuples fail at once
> -----------------------------------------------------------------------------------------------
>
>                 Key: STORM-2343
>                 URL: https://issues.apache.org/jira/browse/STORM-2343
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-kafka-client
>    Affects Versions: 2.0.0, 1.1.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Critical
>             Fix For: 2.0.0, 1.1.1
>
>          Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> It doesn't look like the spout is respecting maxUncommittedOffsets in all cases. If the underlying consumer returns more records in a call to poll() than maxUncommittedOffsets, they will all be added to waitingToEmit. Since poll may return up to 500 records by default (Kafka 0.10.1.1), this is pretty likely to happen with low maxUncommittedOffsets.
> The spout only checks for tuples to retry if it decides to poll, and it only decides to poll if numUncommittedOffsets < maxUncommittedOffsets. Since maxUncommittedOffsets isn't being respected when retrieving or emitting records, numUncommittedOffsets can be much larger than maxUncommittedOffsets. If more than maxUncommittedOffsets messages fail, this can cause the spout to stop polling entirely.
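The stall described in the issue body can be modeled in a few lines. This is a toy model, not the storm-kafka-client code: the spout polls — and therefore checks for retries — only while numUncommittedOffsets < maxUncommittedOffsets, but a single poll can push numUncommittedOffsets far past the cap, because the consumer may return up to max.poll.records (500 by default in Kafka 0.10.1.1) records at once.

```java
import java.util.*;

/**
 * Toy model of the spout's polling gate. Field and method names
 * mirror the terms used in the issue; the actual spout is more complex.
 */
public class SpoutStall {
    final int maxUncommittedOffsets;
    int numUncommittedOffsets = 0;
    final Deque<Integer> failed = new ArrayDeque<>(); // offsets waiting for retry

    SpoutStall(int maxUncommittedOffsets) { this.maxUncommittedOffsets = maxUncommittedOffsets; }

    /** Retries are only considered when the spout decides to poll. */
    boolean canPoll() { return numUncommittedOffsets < maxUncommittedOffsets; }

    /** One poll may emit far more records than the cap allows. */
    void poll(int recordsReturned) {
        if (!canPoll()) return;
        numUncommittedOffsets += recordsReturned; // all go to waitingToEmit and get emitted
    }

    /** A failed tuple stays uncommitted until its retry succeeds. */
    void fail(int offset) { failed.add(offset); }
}
```

Once more than maxUncommittedOffsets tuples have failed, canPoll() stays false forever, so the retry queue is never drained: the deadlock the issue title describes.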
[jira] [Created] (STORM-2546) Kafka spout can stall / get stuck due to edge case with failing tuples
Prasanna Ranganathan created STORM-2546:
-------------------------------------------

             Summary: Kafka spout can stall / get stuck due to edge case with failing tuples
                 Key: STORM-2546
                 URL: https://issues.apache.org/jira/browse/STORM-2546
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-kafka-client
    Affects Versions: 2.0.0, 1.x
            Reporter: Prasanna Ranganathan

The mechanism for replaying a failed tuple involves seeking the kafka consumer to the failing offset and then re-emitting it into the topology. A tuple, when emitted the first time, has an entry created in OffsetManager. This entry is removed only after the tuple is successfully acknowledged and its offset successfully committed. Until then, commits for offsets beyond the failing offset for that TopicPartition are blocked.

It is possible that when the spout seeks the consumer to the failing offset, the corresponding kafka message is not returned in the poll response. This can happen due to that offset having been deleted or compacted away. In this scenario the partition is blocked from committing and progressing.
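The commit gating described above can be sketched as follows. This is a simplified model, not the real OffsetManager: only a contiguous prefix of acked offsets beyond the last committed offset can be committed, so one offset that is never re-acked (because a re-seek finds its message deleted or compacted away) blocks the partition's commit point permanently.

```java
import java.util.*;

/**
 * Simplified model of per-partition commit gating. The commit point can
 * only advance across a gap-free run of acked offsets; a permanently
 * missing ack pins it in place. Names here are illustrative.
 */
public class OffsetGate {
    private long committed;                       // last committed offset
    private final TreeSet<Long> acked = new TreeSet<>();

    OffsetGate(long initialCommitted) { this.committed = initialCommitted; }

    void ack(long offset) { acked.add(offset); }

    /** Advance the commit point across the contiguous acked prefix. */
    long commit() {
        while (acked.contains(committed + 1)) {
            acked.remove(++committed);
        }
        return committed;
    }
}
```

In the scenario the issue describes, the ack for the failing offset never arrives, so commit() returns the same value forever no matter how many later offsets are acked — the partition is stuck.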