Github user srdo commented on a diff in the pull request:
https://github.com/apache/storm/pull/2465#discussion_r157543286
--- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
@@ -225,6 +243,25 @@ private long doSeek(TopicPartition tp, OffsetAndMetadata committedOffset) {
        }
    }
+    /**
+     * Checks If {@link OffsetAndMetadata} was committed by an instance of {@link KafkaSpout} in this topology.
+     * This info is used to decide if {@link FirstPollOffsetStrategy} should be applied
+     *
+     * @param committedOffset {@link OffsetAndMetadata} info committed to Kafka
+     * @return true if this topology committed this {@link OffsetAndMetadata}, false otherwise
+     */
+    private boolean isOffsetCommittedByThisTopology(OffsetAndMetadata committedOffset) {
+        try {
+            final KafkaSpout.Info info = JSON_MAPPER.readValue(committedOffset.metadata(), KafkaSpout.Info.class);
+            return info.getTopologyId().equals(context.getStormId());
+        } catch (IOException e) {
+            LOG.trace("Failed to deserialize {}. Error likely occurred because the last commit " +
--- End diff ---
Sure, but we're still deserializing the metadata on every emit. I think we can very likely get away with deserializing once per commit, or even once per spout activation, since we don't need to serialize + deserialize if we only update the metadata when committing.
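
For illustration, here is a rough sketch of the "deserialize once per commit" variant. The KafkaSpout.Info, JSON_MAPPER, LOG and context names follow the diff above; the cache field, the extra TopicPartition parameter and the invalidation hook are hypothetical, only meant to show where the parsing could move:

    // Imports this sketch assumes at the top of KafkaSpout.java:
    // import java.io.IOException;
    // import java.util.HashMap;
    // import java.util.Map;
    // import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    // import org.apache.kafka.common.TopicPartition;

    // Hypothetical per-partition cache: the metadata JSON is parsed at most
    // once per commit instead of on every emit.
    private final Map<TopicPartition, Boolean> committedByThisTopology = new HashMap<>();

    private boolean isOffsetCommittedByThisTopology(TopicPartition tp, OffsetAndMetadata committedOffset) {
        return committedByThisTopology.computeIfAbsent(tp, ignored -> {
            try {
                final KafkaSpout.Info info =
                    JSON_MAPPER.readValue(committedOffset.metadata(), KafkaSpout.Info.class);
                return info.getTopologyId().equals(context.getStormId());
            } catch (IOException e) {
                LOG.trace("Failed to deserialize metadata for {}", tp, e);
                return false;
            }
        });
    }

    // In the commit path (or on spout activation), drop the cached answer so
    // the next emit re-checks the freshly written metadata:
    //     committedByThisTopology.remove(tp);

Clearing the entry only when the spout commits (or once on activation) means the JSON is re-parsed exactly when the stored metadata can actually have changed, which is what "once per commit" would buy us.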
---