It looks public to me? https://github.com/apache/storm/blob/38e997ed96ce6627cabb4054224d7044fd2b40f9/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L461

I think it's good to be able to avoid telescoping constructors, while at the
same time not having a bunch of setters on the KafkaSpoutConfig. That's the
main purpose I think the builder has: allowing KafkaSpoutConfig to be
immutable. I'd be happy to fiddle with it if you have an example to work
from?
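To make the two construction paths concrete, here is a minimal sketch. It
assumes the KafkaSpoutConfig(Builder) constructor linked above really is
public; the builder() call mirrors the examples further down this thread,
and the broker address and topic name are placeholders:

    KafkaSpoutConfig.Builder<String, String> builder =
        KafkaSpoutConfig.builder("localhost:9092", "my-topic");

    // Fluent path: build() returns the immutable KafkaSpoutConfig.
    KafkaSpoutConfig<String, String> viaBuild = builder.build();

    // Constructor path: hand the configured builder to the public
    // constructor. This is what a Flux "ref:" to a builder declared as a
    // "component:" would do, so no build() call is needed from YAML.
    KafkaSpoutConfig<String, String> viaConstructor =
        new KafkaSpoutConfig<>(builder);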
2017-06-14 1:11 GMT+02:00 Priyank Shah <ps...@hortonworks.com>:

> Hi Stig,
>
> I think the KafkaSpoutConfig constructor is private, and it's throwing
> errors while using the approach that you mentioned. Making it public
> defeats the purpose of the builder. Can you give it a shot and confirm at
> your end if it's possible?
>
> Thanks
> Priyank
>
> Sent from my iPhone
>
> On Jun 13, 2017, at 9:36 AM, Stig Døssing <generalbas....@gmail.com>
> wrote:
>
> > Hi Priyank,
> >
> > I changed my mind since those mails were sent. I don't think
> > setKey/Value are very useful. They couldn't be used with the default
> > Kafka deserializers, only for deserializers implemented by the user,
> > and then only if they were declared to implement
> > SerializableDeserializer. I agree that we should remove them, and I'm
> > not going to undo anything currently in the PR (unless there are
> > objections on the PR, of course).
> >
> > With regard to getting rid of the builder pattern, I think it is a
> > pretty nice pattern for Java. It looks to me like it should be possible
> > to declare and configure the builder with "component:", and then pass
> > it to the KafkaSpoutConfig constructor with "ref:" afterwards (which
> > lets you avoid calling build()). Doesn't this work?
> >
> > 2017-06-12 23:32 GMT+02:00 Priyank Shah <ps...@hortonworks.com>:
> >
> >> Hi Stig,
> >>
> >> I think the PR https://github.com/apache/storm/pull/2155/files you
> >> created gets rid of setKey and setValue. I am fine with it, and in
> >> fact that's what I was suggesting in the first place. However, your
> >> last two email replies suggest something else. Just making sure you
> >> are not going to undo anything in the PR and that we are on the same
> >> page about this change, i.e. no setKey or setValue, either for
> >> SerializableDeserializer implementations or the Kafka Deserializer
> >> interface. Only a string FQCN as a property.
> >>
> >> The other thing I propose is to get rid of the builder class. The
> >> reason is that constructing an object with the builder pattern
> >> requires builder.build(), and that does not work well with Flux YAML.
> >> I think we should be careful about implementing new connectors and
> >> make sure they work with YAML as well. I have commented on the PR as
> >> well. Unless someone else has a different opinion, can you address
> >> that as well?
> >>
> >> On 6/10/17, 2:05 AM, "Stig Døssing" <generalbas....@gmail.com> wrote:
> >>
> >> Priyank, I was a bit too hasty in the last response. The setKey/Value
> >> functions are necessary when users want to set only the key or the
> >> value deserializer. I think we should keep those. It may be possible
> >> to deduplicate the functionality on the API by removing the Builder
> >> constructors that take deserializers, and by getting rid of the
> >> setKey/Value functions that take Class instances, since those seem
> >> like a duplication of the consumer config functionality. This should
> >> get rid of a lot of the overloads.
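As a side note, the duplication described above can be sketched directly:
the deserializer classes are already settable through the standard consumer
config keys, so Builder setters taking Class instances are a second way to
do the same thing. The keys below are the standard Kafka ones; the snippet
itself is only illustrative:

    Map<String, Object> props = new HashMap<>(); // java.util.*
    // Deserializers given as plain FQCN strings - the "only a string FQCN
    // as a property" idea from earlier in the thread.
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");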
> >> 2017-06-10 10:20 GMT+02:00 Stig Døssing <generalbas....@gmail.com>:
> >>
> >>> Harsha,
> >>>
> >>> +1 for simplifying away those methods that are just setting consumer
> >>> config. The properties I think we should keep are all the spout
> >>> configuration (timeouts, retry handling, tuple construction). Maybe
> >>> we deprecate the consumer config functions on 1.x and remove them on
> >>> master?
> >>>
> >>> Priyank,
> >>>
> >>> When the spout is declared, it takes type parameters to define the
> >>> key and value types of the consumer. We are able to check at compile
> >>> time that the deserializers match those expected types, e.g.
> >>>
> >>> SerializableDeserializer<Integer> des = ...
> >>>
> >>> KafkaSpoutConfig<Integer, String> config =
> >>>     KafkaSpoutConfig.builder("dummy", "dummy")
> >>>         .setKey(des)
> >>>         .build();
> >>>
> >>> KafkaSpout<String, String> wrongTypeSpout = new KafkaSpout<>(config);
> >>>
> >>> will not compile, while
> >>>
> >>> SerializableDeserializer<String> des = ...
> >>>
> >>> KafkaSpoutConfig<String, String> config =
> >>>     KafkaSpoutConfig.builder("dummy", "dummy")
> >>>         .setKey(des)
> >>>         .build();
> >>>
> >>> KafkaSpout<String, String> spout = new KafkaSpout<>(config);
> >>>
> >>> will. If we want to simplify the API, maybe we should just mirror the
> >>> KafkaConsumer API more closely and remove the Builder setKey/Value
> >>> methods. I can't think of a reason why a user should need to create a
> >>> Builder of one type, and then change the type later with
> >>> setKey/Value. The deserializers can just go in through the Builder
> >>> constructor.
> >>>
> >>> About KafkaTuple, I think it was done this way originally since
> >>> requiring users to subclass KafkaTuple would be a breaking change. If
> >>> we want to do it, it should go in 2.x only. I'm not necessarily
> >>> opposed to doing it, but I don't really have a strong opinion on it.
> >>>
> >>> Hugo,
> >>>
> >>> I appreciate that the subscribe API is a major new feature of the 0.9
> >>> consumer, but I can't come up with any reason to use it in Storm. I
> >>> don't think we should support it just because it is there. As
> >>> mentioned upthread, the features offered by that API are already
> >>> covered by Storm, so I'm not seeing the value in having it. If we
> >>> can't come up with a use case for it, I don't see a reason to allow
> >>> users to configure it, especially given the non-obvious problems
> >>> users who choose it are likely to run into.
> >>>
> >>> 2017-06-10 6:03 GMT+02:00 Harsha <st...@harsha.io>:
> >>>
> >>>> Dynamic assignment is what is causing all the issues that we see
> >>>> now:
> >>>> 1. Duplicates at the start of the KafkaSpout and upon any rebalance.
> >>>> 2. The Trident Kafka spout not holding the transactional batches.
> >>>> Many corner cases can easily produce duplicates.
> >>>>
> >>>> There is no point in keeping dynamic assignment given all the issues
> >>>> that are showing up. Here is an excerpt from the Kafka consumer
> >>>> docs,
> >>>> https://www-us.apache.org/dist/kafka/0.10.0.1/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html:
> >>>> "If the process itself is highly available and will be restarted if
> >>>> it fails (perhaps using a cluster management framework like YARN,
> >>>> Mesos, or AWS facilities, or as part of a stream processing
> >>>> framework). In this case there is no need for Kafka to detect the
> >>>> failure and reassign the partition since the consuming process will
> >>>> be restarted on another machine."
> >>>>
> >>>> Manual assignment is the right way to go.
> >>>>
> >>>> -Harsha
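The difference being argued here can be sketched with the plain
KafkaConsumer API (illustrative consumer code, not the spout internals;
the topic name is a placeholder):

    // Assumes props carries bootstrap.servers and deserializer config.
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

    // Dynamic assignment: join a consumer group and let the broker
    // rebalance partitions across members.
    // consumer.subscribe(Collections.singletonList("my-topic"));

    // Manual assignment: pin the partitions explicitly. No group
    // rebalancing occurs, so a restarted worker simply resumes its
    // partitions, per the consumer docs quoted above.
    List<TopicPartition> partitions = new ArrayList<>();
    for (PartitionInfo info : consumer.partitionsFor("my-topic")) {
        partitions.add(new TopicPartition(info.topic(), info.partition()));
    }
    consumer.assign(partitions);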
> >>>> On Jun 9, 2017, 4:09 PM -0700, Hugo Da Cruz Louro
> >>>> <hlo...@hortonworks.com>, wrote:
> >>>>
> >>>> +1 for simplifying KafkaSpoutConfig. Too many constructors and too
> >>>> many methods. I am not sure it's justifiable to have any methods
> >>>> that simply set KafkaConsumer properties. All of these properties
> >>>> should just go in a Map<String, Object>, which is what KafkaConsumer
> >>>> receives, and what was supported in the initial implementation. The
> >>>> names of the properties can be retrieved from
> >>>> org.apache.kafka.clients.consumer.ConsumerConfig. At this point we
> >>>> may have to keep backwards compatibility in mind.
> >>>>
> >>>> Not sure we should completely discontinue dynamic partition
> >>>> assignment, as it is one of the primary features of the new Storm
> >>>> Kafka Client API. With this said, manual partition assignment should
> >>>> be supported and would solve a lot of potential problems arising
> >>>> from dynamic partition assignment.
> >>>>
> >>>> Hugo
> >>>>
> >>>> On Jun 9, 2017, at 3:33 PM, Harsha <st...@harsha.io> wrote:
> >>>>
> >>>> I think the question is why we need all those settings when a user
> >>>> can pass them via Properties with the consumer properties defined,
> >>>> or via a Map conf object. Having methods on top of the consumer
> >>>> config means that every time a Kafka consumer property is added or
> >>>> changed, one needs to add a builder method. We need to get out of
> >>>> the way and let users configure it like they do a typical Kafka
> >>>> consumer; instead we have tens of methods that set properties for
> >>>> ConsumerConfig.
> >>>>
> >>>> Examples:
> >>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L317
> >>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L309
> >>>> etc. All of these are specific to KafkaConsumer config; users should
> >>>> be able to pass all of them via Properties.
> >>>>
> >>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L327
> >>>>
> >>>> What's the benefit of adding that method? And we are forcing the
> >>>> protocol to "SSL" in this method:
> >>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L318
> >>>>
> >>>> Users can set the SSL properties and then select the
> >>>> securityProtocol "SASL_SSL", which requires both Kerberos and SSL
> >>>> configs to be set. In the above case, calling setSSLTruststore
> >>>> changes the security.protocol to "SSL". This could easily run into
> >>>> issues if the user first sets securityProtocol to "SASL_SSL" and
> >>>> then later calls setSSLTruststore, which changes it to "SSL".
> >>>>
> >>>> We are taking over these settings instead of letting users figure
> >>>> them out from the Kafka consumer config page.
> >>>>
> >>>> In contrast, on the producer side we have KafkaBolt, which does
> >>>> this:
> >>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/bolt/KafkaBolt.java#L121
> >>>> I would add a Properties object instead of deriving it from
> >>>> topologyConf, but this is much easier for users to understand. The
> >>>> contract here is: put whatever producer configs you want in the conf
> >>>> object, and we create a producer out of that config.
> >>>>
> >>>> Honestly, these interfaces need to be simple and let the user have
> >>>> control instead of adding our interpretation.
> >>>>
> >>>> Thanks,
> >>>> Harsha
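To illustrate the conflict being described: with plain Properties the user
states the protocol once, and nothing rewrites it behind their back. The
keys are standard Kafka consumer settings; the values are placeholders:

    Properties props = new Properties(); // java.util.Properties
    props.put("security.protocol", "SASL_SSL"); // Kerberos + SSL together
    props.put("ssl.truststore.location", "/etc/ssl/kafka-truststore.jks");
    props.put("ssl.truststore.password", "changeit");
    props.put("sasl.kerberos.service.name", "kafka");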
> >>>> On Jun 9, 2017, 2:08 PM -0700, Stig Døssing
> >>>> <generalbas....@gmail.com>, wrote:
> >>>>
> >>>> I'd be happy with a simpler KafkaSpoutConfig, but I think most of
> >>>> the configuration parameters have good reasons for being there. Any
> >>>> examples of parameters you think we should remove?
> >>>>
> >>>> 2017-06-09 21:34 GMT+02:00 Harsha <st...@harsha.io>:
> >>>>
> >>>> +1 on using manual assignment for the reasons specified below. We
> >>>> will see duplicates even in stable conditions, which is not good. I
> >>>> don't see any reason not to switch to manual assignment. While we
> >>>> are at it, we should refactor the KafkaConfig part. It should be as
> >>>> simple as accepting the Kafka consumer config or properties file and
> >>>> forwarding it to KafkaConsumer. We made it overly complex, and
> >>>> unnecessarily so.
> >>>>
> >>>> Thanks,
> >>>> Harsha
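For what it's worth, the simplification proposed in the last message could
look something like the sketch below. This is hypothetical: the class and
field names are invented for illustration and are not the actual Storm API.

    import java.io.Serializable;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // Hypothetical: spout-level settings stay on the config object, while
    // all consumer settings pass through untouched as Properties.
    public class SimpleKafkaSpoutConfig implements Serializable {
        private final Properties consumerProps;   // forwarded verbatim
        private final long offsetCommitPeriodMs;  // example spout setting

        public SimpleKafkaSpoutConfig(Properties consumerProps,
                                      long offsetCommitPeriodMs) {
            this.consumerProps = consumerProps;
            this.offsetCommitPeriodMs = offsetCommitPeriodMs;
        }

        public long getOffsetCommitPeriodMs() {
            return offsetCommitPeriodMs;
        }

        // The spout builds its consumer straight from the user's config,
        // so any current or future consumer property just works.
        public <K, V> KafkaConsumer<K, V> createConsumer() {
            return new KafkaConsumer<>(consumerProps);
        }
    }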