I was going through new kafka spout code and had a couple of questions.
1. https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L98 The instance variable at that line and following 3 lines. Why do we need them? Because of that we have Builder constructors with different parameters for key and value deserializers. We even have https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/SerializableDeserializer.java Not sure if we really need it. https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L555 already has a constructor that takes in Properties or a Map and if key.deserializer and value.deserialzer keys are set to fqcns then it will instantiate them and take care of them at https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L642 . And we already have setProp method on builder to set different kafka configs that will be passed to KafkaConsumer constructor. We can get rid of the SerializableDeserializer interface and a bunch of constructors and instance variables related to Key and Value Deserializable. 2. We have a RecordTranslator interface that is used to declareOutputFields at https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java#L486 and then we have this special instanceof check here https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java#L333 Why is return type of apply a List<Object> ? Should we change it to List<KafkaTuple>? That way we can get rid of instanceof check and support multiple tuples emitted for one kafka message. Fixes for above two might not be backward compatible but if everyone is okay with above changes then I can create a patch.
