[jira] [Commented] (STORM-2359) Revising Message Timeouts

2017-06-08 Thread Stig Rohde Døssing (JIRA)

[ https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043409#comment-16043409 ]

Stig Rohde Døssing commented on STORM-2359:
---

{quote}
The resets are managed internally by the ACKer bolt. The spout only gets 
notified if the timeout expires or if tuple-tree is fully processed. 
{quote}
How will this work? The current implementation keeps a pending map in both the 
acker and the spout, and both maps rotate every topology.message.timeout.secs. 
If the acker doesn't forward reset requests to the spout, the spout will simply 
expire the tuple tree on its own once the message timeout has passed.
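
For reference, both pending maps behave roughly like the simplified stand-in 
below (the real class is org.apache.storm.utils.RotatingMap; this is not 
Storm's code, just a sketch of the rotation-based expiry): new entries go into 
the newest bucket, acks remove them, and whatever is still in the oldest bucket 
when it rotates out is treated as timed out.

{code:java}
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Simplified stand-in for the rotating pending map kept by both the spout and
// the acker. It is rotated on a timer derived from
// topology.message.timeout.secs; whatever falls out of the oldest bucket is
// treated as timed out and failed.
public class RotatingPendingMap<K, V> {
    private final LinkedList<Map<K, V>> buckets = new LinkedList<>();

    public RotatingPendingMap(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            buckets.addFirst(new HashMap<K, V>());
        }
    }

    // New pending tuple trees always land in the newest bucket.
    public void put(K key, V value) {
        buckets.getFirst().put(key, value);
    }

    // Acks (or completed tuple trees) remove the entry wherever it lives.
    public V remove(K key) {
        for (Map<K, V> bucket : buckets) {
            if (bucket.containsKey(key)) {
                return bucket.remove(key);
            }
        }
        return null;
    }

    // Called periodically. Everything still in the oldest bucket has not been
    // acked within the timeout and is returned so the caller can fail it.
    public Map<K, V> rotate() {
        Map<K, V> expired = buckets.removeLast();
        buckets.addFirst(new HashMap<K, V>());
        return expired;
    }
}
{code}
Because the spout keeps its own copy of this map, a reset that only the acker 
sees cannot prevent the spout-side rotation from expiring the entry and failing 
the tuple tree.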

{quote}
That case would be more accurately classified as "progress is being made" ... 
but slower than expected.
The case of 'progress is not being made' is when a worker that is processing 
part of the tuple tree dies.
{quote}
Yes, you are right. But the topology can currently degrade to making no 
progress even when every individual tuple could be processed within the message 
timeout, because tuples can expire while queued and get reemitted, and the 
reemitted tuples are then delayed by their own expired duplicates sitting ahead 
of them in the queue. For IRichBolts this can be mitigated by writing the bolt 
to accept tuples immediately and queue them internally, resetting their 
timeouts manually if necessary, but for IBasicBolts this is not possible.
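
To make that IRichBolt workaround concrete, here is a rough, untested sketch. 
It assumes the collector-side resetTimeout(Tuple) call that newer Storm 
releases expose for this purpose; the tick frequency and the slowEnrich() 
helper are made up for illustration.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.storm.Config;
import org.apache.storm.Constants;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

// Sketch of the IRichBolt workaround: execute() only queues the tuple, a
// helper thread does the slow work, and tick tuples are used to reset the
// timeout of everything still waiting so the spout does not fail it.
public class InternallyQueuedBolt extends BaseRichBolt {
    private transient OutputCollector collector;
    private transient LinkedBlockingQueue<Tuple> waiting;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.waiting = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Tuple t = waiting.take();
                    slowEnrich(t); // hypothetical slow database lookup
                    // Assumes the collector may be used from a helper thread;
                    // otherwise hand finished tuples back to the executor.
                    collector.ack(t);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    @Override
    public void execute(Tuple input) {
        if (isTick(input)) {
            // Keep internally queued tuples alive while they wait their turn.
            for (Tuple t : waiting) {
                collector.resetTimeout(t);
            }
        } else {
            waiting.add(input); // return quickly; the worker does the real work
        }
    }

    private boolean isTick(Tuple t) {
        return Constants.SYSTEM_COMPONENT_ID.equals(t.getSourceComponent())
            && Constants.SYSTEM_TICK_STREAM_ID.equals(t.getSourceStreamId());
    }

    private void slowEnrich(Tuple t) {
        // placeholder for the occasionally slow enrichment
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        Map<String, Object> conf = new HashMap<>();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
        return conf;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this sketch emits nothing downstream
    }
}
{code}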

Just to give a concrete example, we had an IBasicBolt enrich tuples with some 
database data. Most tuples were processed very quickly, but a few were slow. 
Even the slow tuples never took longer than our message timeout individually. 
We then had an instance where a bunch of slow tuples happened to come in on the 
stream close to each other. The first few were processed before they expired, 
but the rest expired while queued. The spout then reemitted the expired tuples, 
and they got into the queue behind their own expired instances. Since the bolt 
won't skip expired tuples, the freshly emitted tuples also expired, which 
caused another reemit. This repeated until the topology was restarted so the 
queues could be cleared.

> Revising Message Timeouts
> -
>
> Key: STORM-2359
> URL: https://issues.apache.org/jira/browse/STORM-2359
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing





[jira] [Created] (STORM-2547) Apache Metron stream: default not found

2017-06-08 Thread Monika Dziuk (JIRA)
Monika Dziuk created STORM-2547:
---

 Summary: Apache Metron stream: default not found
 Key: STORM-2547
 URL: https://issues.apache.org/jira/browse/STORM-2547
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-core, storm-elasticsearch
 Environment: Ubuntu
Reporter: Monika Dziuk


Hi all,
we tried to install Metron 0.4.0 according to this tutorial: 
https://community.hortonworks.com/articles/88843/manually-installing-apache-metron-on-ubuntu-1404.html
The installation itself worked, but during the step “Smoke Test Metron” two 
main problems arose:
In the storm topology "profiler" the splitterBolt doesn’t work: the kafkaSpout 
receives data and forwards it to the splitterBolt, but the splitterBolt can’t 
handle the data.
We got the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: stream: default 
not found
at org.apache.storm.utils.Monitor.metrics(Monitor.java:223)
at org.apache.storm.utils.Monitor.metrics(Monitor.java:159)
at org.apache.storm.command.monitor$_main.doInvoke(monitor.clj:36)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at org.apache.storm.command.monitor.main(Unknown Source)

The same exception occurs in the indexing topology at the indexingBolt.
All of the other topologies do work. Also, we receive data in HDFS but not in 
Elasticsearch. 
We’d appreciate any tips!
Thanks in advance! :) 





[jira] [Comment Edited] (STORM-2343) New Kafka spout can stop emitting tuples if more than maxUncommittedOffsets tuples fail at once

2017-06-08 Thread Prasanna Ranganathan (JIRA)

[ https://issues.apache.org/jira/browse/STORM-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042382#comment-16042382 ]

Prasanna Ranganathan edited comment on STORM-2343 at 6/8/17 7:56 AM:
-

The illustrative example is very useful in thinking through the issue; thanks 
for that. Generally speaking, my preference is for a simple solution with 
minimal overhead in the Spout.

There is another critical issue that we need to consider, though. I just 
created the JIRA for it: STORM-2546

I am trying to come up with a solution that addresses both of these issues 
comprehensively. We are essentially dealing with the challenge of ensuring that 
failing tuples are properly accounted for and that the at-least-once processing 
guarantee is enforced.



> New Kafka spout can stop emitting tuples if more than maxUncommittedOffsets 
> tuples fail at once
> ---
>
> Key: STORM-2343
> URL: https://issues.apache.org/jira/browse/STORM-2343
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Critical
> Fix For: 2.0.0, 1.1.1
>
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> It doesn't look like the spout is respecting maxUncommittedOffsets in all 
> cases. If the underlying consumer returns more records in a call to poll() 
> than maxUncommittedOffsets, they will all be added to waitingToEmit. Since 
> poll may return up to 500 records by default (Kafka 0.10.1.1), this is pretty 
> likely to happen with low maxUncommittedOffsets.
> The spout only checks for tuples to retry if it decides to poll, and it only 
> decides to poll if numUncommittedOffsets < maxUncommittedOffsets. Since 
> maxUncommittedOffsets isn't being respected when retrieving or emitting 
> records, numUncommittedOffsets can be much larger than maxUncommittedOffsets. 
> If more than maxUncommittedOffsets messages fail, this can cause the spout to 
> stop polling entirely.
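
A toy model of the interaction described in the summary above (not the spout's 
actual code) may help illustrate why the guard can become permanently false 
once a single poll's worth of records is in flight:

{code:java}
// Toy model: once a single poll() overshoots maxUncommittedOffsets, the poll
// guard can become permanently false if enough of those tuples fail.
public class MaxUncommittedOffsetsDemo {
    public static void main(String[] args) {
        final int maxUncommittedOffsets = 100; // spout setting
        final int maxPollRecords = 500;        // Kafka consumer default (0.10.1.1)

        int numUncommittedOffsets = 0;

        // The guard passes, so the spout polls...
        if (numUncommittedOffsets < maxUncommittedOffsets) {
            // ...but the poll is not capped by maxUncommittedOffsets, so all
            // returned records are emitted and counted as uncommitted.
            numUncommittedOffsets += maxPollRecords;
        }

        // Suppose more than maxUncommittedOffsets of those tuples now fail.
        // Retries are only scheduled when the spout polls, but the guard is
        // permanently false, so the spout never polls (or retries) again.
        boolean willPollAgain = numUncommittedOffsets < maxUncommittedOffsets;
        System.out.println("uncommitted=" + numUncommittedOffsets
                + ", will poll again: " + willPollAgain);
    }
}
{code}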





[jira] [Created] (STORM-2546) Kafka spout can stall / get stuck due to edge case with failing tuples

2017-06-08 Thread Prasanna Ranganathan (JIRA)
Prasanna Ranganathan created STORM-2546:
---

 Summary: Kafka spout can stall / get stuck due to edge case with 
failing tuples
 Key: STORM-2546
 URL: https://issues.apache.org/jira/browse/STORM-2546
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-kafka-client
Affects Versions: 2.0.0, 1.x
Reporter: Prasanna Ranganathan


The mechanism for replaying a failed tuple involves seeking the Kafka consumer 
to the failing offset and then re-emitting it into the topology. When a tuple 
is emitted for the first time, an entry is created for it in OffsetManager. 
This entry is removed only after the tuple is successfully acknowledged and its 
offset successfully committed. Until then, commits for offsets beyond the 
failing offset on that TopicPartition are blocked.

It is possible that when the spout seeks the consumer to the failing offset, 
the corresponding Kafka message is not returned in the poll response, for 
example because that offset has been deleted or compacted away. In that 
scenario the partition is blocked from committing and making progress.
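
A toy model (not the real OffsetManager) of the commit rule described above 
shows how a single offset that can never be replayed pins the committed 
position for the whole partition:

{code:java}
import java.util.TreeSet;

// Toy model: the spout can only commit up to the first gap in the acked
// offsets, so a failed offset that can never be re-fetched blocks the
// partition's commits indefinitely.
public class CommitGapDemo {
    public static void main(String[] args) {
        long committed = 9;                    // last committed offset
        TreeSet<Long> acked = new TreeSet<>();

        // Offsets 11..15 were acked, but offset 10 failed and has since been
        // deleted/compacted away, so it can never be replayed and acked.
        for (long o = 11; o <= 15; o++) {
            acked.add(o);
        }

        // Walk forward from the committed position; stop at the first gap.
        long nextCommit = committed;
        while (acked.contains(nextCommit + 1)) {
            nextCommit++;
        }

        // Still 9: nothing can be committed, and with at-least-once semantics
        // the partition is stuck behind offset 10.
        System.out.println("Can commit up to offset " + nextCommit);
    }
}
{code}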


