Hi Stig,

I think the PR you created, https://github.com/apache/storm/pull/2155/files, gets 
rid of setKey and setValue. I am fine with that, and in fact it is what I was 
suggesting in the first place. However, your last two email replies suggest 
something else. I just want to make sure you are not going to undo anything in the 
PR and that we are on the same page about this change: no setKey or setValue, 
neither for SerializableDeserializer implementations nor for the Kafka Deserializer 
interface. The deserializers would be specified only as fully qualified class name 
(FQCN) strings in the consumer properties.
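
To illustrate what I have in mind, the deserializers would be named purely through 
the consumer properties, using the constants from 
org.apache.kafka.clients.consumer.ConsumerConfig (the custom value deserializer 
class below is just a placeholder):

// Deserializers are given only as FQCN strings in the consumer properties;
// there are no setKey/setValue calls anywhere.
Map<String, Object> consumerProps = new HashMap<>();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
        StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
        "com.example.MyValueDeserializer"); // placeholder custom deserializer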

The other thing I propose is to get rid of the builder class. The reason is that 
constructing an object with the builder pattern requires calling builder.build(), 
and that does not work well with flux YAML. I think we should be careful when 
implementing new connectors and make sure they work with YAML as well. I have 
commented on the PR about this too. Unless someone else has a different opinion, 
can you address that as well?
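
For reference, here is a rough sketch of the flux side (the class and method names 
are placeholders, not the real API): flux can construct a component through its 
constructor and call plain setters on it via configMethods, but it has no step for 
calling a static builder() and then build().

components:
  - id: "spoutConfig"
    className: "com.example.SomeSpoutConfig"    # placeholder class
    constructorArgs:
      - "localhost:9092"
      - "my-topic"
    configMethods:                              # plain setters are fine
      - name: "setOffsetCommitPeriodMs"         # placeholder setter
        args: [10000]
    # ...but there is no hook for "call builder(), then call build()"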

On 6/10/17, 2:05 AM, "Stig Døssing" <generalbas....@gmail.com> wrote:

    Priyank, I was a bit too hasty in the last response. The setKey/Value
    functions are necessary when users want to set only the key or only the value
    deserializer. I think we should keep those. It may be possible to
    deduplicate functionality in the API by removing the Builder
    constructors that take deserializers, and by getting rid of the
    setKey/Value overloads that take Class instances, since those seem like a
    duplication of the consumer config functionality. This should get rid of a
    lot of the overloads.
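    
    For example, as far as I can tell these two end up doing the same thing
    (sketching the exact overloads from memory, so take them with a grain of salt):
    
    // via the Builder overload that takes a Class instance
    builder.setKey(StringDeserializer.class);
    
    // via plain consumer config
    consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
            StringDeserializer.class.getName());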
    
    2017-06-10 10:20 GMT+02:00 Stig Døssing <generalbas....@gmail.com>:
    
    > Harsha,
    >
    > +1 for simplifying away those methods that are just setting consumer
    > config. The properties I think we should keep are all the spout
    > configuration (timeouts, retry handling, tuple construction). Maybe we
    > deprecate the consumer config functions on 1.x and remove them on master?
    >
    > Priyank,
    >
    > When the spout is declared, it takes type parameters to define the key and
    > value types of the consumer. We are able to check at compile time that the
    > deserializers match those expected types.
    > e.g.
    > SerializableDeserializer<Integer> des = ...
    >
    > KafkaSpoutConfig<Integer, String> config =
    >     KafkaSpoutConfig.builder("dummy", "dummy")
    >             .setKey(des)
    >             .build();
    >
    > KafkaSpout<String, String> wrongTypeSpout = new KafkaSpout<>(config);
    >
    > will not compile, while
    >
    > SerializableDeserializer<String> des = ...
    >
    > KafkaSpoutConfig<String, String> config =
    >     KafkaSpoutConfig.builder("dummy", "dummy")
    >             .setKey(des)
    >             .build();
    >
    > KafkaSpout<String, String> spout = new KafkaSpout<>(config);
    >
    > will. If we want to simplify the API, maybe we should just mirror the
    > KafkaConsumer API more closely and remove the Builder setKey/Value methods.
    > I can't think of a reason why a user should need to create a Builder of one
    > type and then change the type later with setKey/Value. The deserializers
    > can just go in through the Builder constructor.
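    >
    > Roughly like this (the exact constructor signature is from memory, so treat
    > it as a sketch):
    >
    > SerializableDeserializer<String> keyDes = ...
    > SerializableDeserializer<String> valueDes = ...
    >
    > // The key and value types are fixed when the Builder is constructed,
    > // so no setKey/setValue calls are needed afterwards.
    > KafkaSpoutConfig<String, String> config =
    >     new KafkaSpoutConfig.Builder<>("dummy", keyDes, valueDes, "dummy")
    >         .build();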
    >
    > About KafkaTuple, I think it was done this way originally since requiring
    > users to subclass KafkaTuple would be a breaking change. If we want to do
    > it, it should go into 2.x only. I'm not necessarily opposed to doing it, but
    > I don't really have a strong opinion on it.
    >
    > Hugo,
    >
    > I appreciate that the subscribe API is a major new feature of the 0.9
    > consumer, but I can't come up with any reason to use it in Storm. I don't
    > think we should support it just because it is there. As mentioned upthread,
    > the features offered by that API are already covered by Storm, so I'm not
    > seeing the value in having it. If we can't come up with a use case for it, I
    > don't see a reason to allow users to configure it, especially given the
    > non-obvious problems users who choose it are likely to run into.
    >
    >
    > 2017-06-10 6:03 GMT+02:00 Harsha <st...@harsha.io>:
    >
    >> Dynamic assignment is what is causing all the issues that we see now:
    >> 1. Duplicates at the start of the KafkaSpout and upon any rebalance.
    >> 2. The Trident Kafka spout not keeping its transactional batches stable.
    >> Many corner cases can easily produce duplicates.
    >>
    >> There is no point in keeping dynamic assignment given all the issues
    >> that are showing up.
    >> Here is an excerpt from the Kafka consumer docs:
    >> https://www-us.apache.org/dist/kafka/0.10.0.1/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
    >> "If the process itself is highly available and will be restarted if it
    >> fails (perhaps using a cluster management framework like YARN, Mesos, or
    >> AWS facilities, or as part of a stream processing framework). In this
    >> case there is no need for Kafka to detect the failure and reassign the
    >> partition since the consuming process will be restarted on another
    >> machine."
    >>
    >> Manual assignment is the right way to go.
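    >>
    >> At the consumer level the difference is just subscribe() vs. assign(). For
    >> example (a sketch, assuming consumer is the spout's KafkaConsumer):
    >>
    >> // Dynamic assignment: the group coordinator decides which partitions this
    >> // consumer owns and can rebalance them away at any time.
    >> consumer.subscribe(Arrays.asList("my-topic"));
    >>
    >> // Manual assignment: the spout task decides which partitions it owns, so
    >> // ownership only changes when the task itself is restarted.
    >> consumer.assign(Arrays.asList(new TopicPartition("my-topic", 0)));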
    >>
    >> -Harsha
    >>
    >> On Jun 9, 2017, 4:09 PM -0700, Hugo Da Cruz Louro
    >> <hlo...@hortonworks.com>, wrote:
    >> +1 for simplifying KafkaSpoutConfig. Too many constructors and too many
    >> methods. I am not sure it's justifiable to have any methods that simply
    >> set KafkaConsumer properties. All of these properties should just go in
    >> a Map<String, Object>, which is what KafkaConsumer receives, and what
    >> was supported in the initial implementation. The names of the properties
    >> can be retrieved from org.apache.kafka.clients.consumer.ConsumerConfig.
    >> At this point we may have to keep in mind backwards compatibility.
    >>
    >> Not sure we should completely discontinue dynamic partition assignment,
    >> as it is one of the primary features of the new Storm Kafka Client API. That
    >> said, manual partition assignment should be supported and would
    >> solve a lot of potential problems arising from dynamic partition
    >> assignment.
    >>
    >> Hugo
    >>
    >> On Jun 9, 2017, at 3:33 PM, Harsha <st...@harsha.io> wrote:
    >>
    >> I think the question is why we need all those settings when a user can pass
    >> them via a Properties object with the consumer properties defined, or via a
    >> Map conf object. Having methods on top of the consumer config means that
    >> every time a Kafka consumer property is added or changed, someone needs to
    >> add a builder method. We need to get out of the way and let users configure
    >> it the way they do for a typical KafkaConsumer; instead we have tens of
    >> methods that set ConsumerConfig properties.
    >>
    >> Examples:
    >> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L317
    >>
    >> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L309
    >> etc. All of these are specific to the KafkaConsumer config; users should be
    >> able to pass all of them via Properties.
    >>
    >> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L327
    >>
    >> What is the benefit of adding that method? And we are forcing the protocol
    >> to be set to "SSL" in this method:
    >> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L318
    >>
    >> Users can set the SSL properties and then select the security protocol
    >> "SASL_SSL", which requires both the Kerberos and SSL configs to be set. In
    >> the above case, calling setSSLTruststore changes security.protocol to "SSL".
    >> This could easily run into issues if a user first sets the security protocol
    >> to "SASL_SSL" and then later calls setSSLTruststore, which changes it back
    >> to "SSL".
    >>
    >> We are taking over these settings instead of letting users figure them out
    >> from the Kafka consumer config page.
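    >>
    >> With plain consumer properties the user just writes something like this (the
    >> values are made up), and there is no setter ordering to get wrong:
    >>
    >> consumerProps.put("security.protocol", "SASL_SSL");
    >> consumerProps.put("sasl.kerberos.service.name", "kafka");
    >> consumerProps.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
    >> consumerProps.put("ssl.truststore.password", "changeit");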
    >>
    >> In contrast, on the producer side the KafkaBolt does this:
    >> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/bolt/KafkaBolt.java#L121
    >> I would take a Properties object instead of deriving it from topologyConf,
    >> but this is much easier for users to understand. The contract here is: put
    >> whatever producer configs you want in the conf object, and we create the
    >> producer out of that config.
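    >>
    >> For example, something along these lines (just a sketch):
    >>
    >> Properties producerProps = new Properties();
    >> producerProps.put("bootstrap.servers", "localhost:9092");
    >> producerProps.put("acks", "all");
    >> producerProps.put("key.serializer",
    >>         "org.apache.kafka.common.serialization.StringSerializer");
    >> producerProps.put("value.serializer",
    >>         "org.apache.kafka.common.serialization.StringSerializer");
    >> // whatever the user put in the conf goes straight to the producer
    >> KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);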
    >>
    >> Honestly, these interfaces need to be simple and let the user have
    >> control instead of adding our interpretation.
    >>
    >>
    >>
    >> Thanks,
    >> Harsha
    >> On Jun 9, 2017, 2:08 PM -0700, Stig Døssing <generalbas....@gmail.com>,
    >> wrote:
    >> I'd be happy with a simpler KafkaSpoutConfig, but I think most of the
    >> configuration parameters have good reasons for being there. Any examples
    >> of
    >> parameters you think we should remove?
    >>
    >> 2017-06-09 21:34 GMT+02:00 Harsha <st...@harsha.io>:
    >>
    >> +1 on using manual assignment for the reasons specified below. We
    >> will see duplicates even in stable conditions, which
    >> is not good. I don't see any reason not to switch to manual assignment.
    >> While we are at it, we should refactor the KafkaConfig part.
    >> It should be as simple as accepting the Kafka consumer config or
    >> properties file and forwarding it to the KafkaConsumer. We made
    >> it overly complex, and unnecessarily so.
    >>
    >> Thanks,
    >> Harsha
    >>
    >
    >
    
