Github user victor-wong commented on the pull request:
https://github.com/apache/storm/pull/1443#issuecomment-221528200
> Can you instead log an Error, ignore the message and proceed further?

That would mean data loss for users, and I am not sure data loss is acceptable. As I mentioned above, [ConsumerIterator](https://github.com/apache/kafka/blob/a81ad2582ee0e533d335fe0dc5c5cc885dbf645d/core/src/main/scala/kafka/consumer/ConsumerIterator.scala) chooses to throw an exception (MessageTooLargeException, which stops the Kafka consumer), so I think that may be the right approach here as well.
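To make it concrete, here is a rough sketch of the fail-fast behavior I have in mind. The helper name is made up and this is not the actual PR code; it relies on the old SimpleConsumer semantics, where a fetch of an oversized message returns bytes but no complete message:

```java
import kafka.javaapi.message.ByteBufferMessageSet;

// Hypothetical helper (not the PR code): fail fast the way ConsumerIterator
// does, instead of logging the oversized message and dropping it.
static void ensureCompleteMessage(ByteBufferMessageSet msgs, long offset, int fetchSizeBytes) {
    // A fetch that returns bytes but not a single complete message means
    // the next message is larger than fetchSizeBytes.
    if (msgs.sizeInBytes() > 0 && msgs.validBytes() <= 0) {
        throw new RuntimeException("Message at offset " + offset
                + " is larger than fetchSizeBytes (" + fetchSizeBytes
                + "); increase KafkaConfig.fetchSizeBytes");
    }
}
```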
> what do you mean by "topology will fetch no data but still be running"? Will it stop fetching data at all?

The spout will keep trying to fetch data, but the response from Kafka contains no valid bytes because of the size limit. The side effect is that data piles up in the Kafka topic while the user has no idea why their storm topology has stopped processing messages (from the spout's point of view, there is simply no data to process).
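For reference, a minimal sketch of why the fetch loop spins silently (variable names are illustrative, based on the 0.8-era SimpleConsumer API):

```java
import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.message.ByteBufferMessageSet;

// consumer, topic, partition, offset, and config are assumed to be in scope.
FetchRequest req = new FetchRequestBuilder()
        .clientId("storm-kafka-spout")
        .addFetch(topic, partition, offset, config.fetchSizeBytes)
        .build();
FetchResponse resp = consumer.fetch(req);
ByteBufferMessageSet msgs = resp.messageSet(topic, partition);

// The broker returned the first fetchSizeBytes bytes of a message that is
// larger than the limit: there are bytes, but no complete message to decode.
// The iterator is empty, nothing is emitted, and the offset never advances.
if (msgs.sizeInBytes() > 0 && !msgs.iterator().hasNext()) {
    // This is the silent-stall condition described above.
}
```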
I think you are right that many users don't want to stall the worker because of one large message, but the stall is the result of an incorrect config (KafkaConfig.fetchSizeBytes); if they want to avoid this situation, they need to set a sufficiently large size limit in the first place.
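For anyone hitting this today, the workaround is to raise the limit when building the spout config. A sketch (the ZooKeeper address, topic, and id are placeholders; the package is org.apache.storm.kafka in Storm 1.x):

```java
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.ZkHosts;

BrokerHosts hosts = new ZkHosts("zkhost:2181");  // placeholder ZK address
SpoutConfig conf = new SpoutConfig(hosts, "events", "/kafka", "my-spout-id");
// The default is 1 MB (1024 * 1024); any single message larger than this
// will never be fetched, so size it above the largest message the topic
// can contain.
conf.fetchSizeBytes = 10 * 1024 * 1024;
```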