It looks public to me? https://github.com/apache/storm/blob/38e997ed96ce6627cabb4054224d7044fd2b40f9/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L461

I think it's good to be able to avoid telescoping constructors, while at the
same time not having a bunch of setters on the KafkaSpoutConfig. That's the
main purpose I think the builder has: allowing KafkaSpoutConfig to be
immutable. I'd be happy to fiddle with it if you have an example to work
from?
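To make the two construction paths concrete, here is a minimal sketch. It
assumes the KafkaSpoutConfig(Builder) constructor linked above really is
public; the builder() call mirrors the examples further down this thread,
and the broker address and topic name are placeholders:

    KafkaSpoutConfig.Builder<String, String> builder =
        KafkaSpoutConfig.builder("localhost:9092", "my-topic");

    // Fluent path: build() returns the immutable KafkaSpoutConfig.
    KafkaSpoutConfig<String, String> viaBuild = builder.build();

    // Constructor path: hand the configured builder to the public
    // constructor. This is what a Flux "ref:" to a builder declared as a
    // "component:" would do, so no build() call is needed from YAML.
    KafkaSpoutConfig<String, String> viaConstructor =
        new KafkaSpoutConfig<>(builder);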
2017-06-14 1:11 GMT+02:00 Priyank Shah <ps...@hortonworks.com>:

> Hi Stig,
>
> I think the KafkaSpoutConfig constructor is private, and it's throwing
> errors while using the approach that you mentioned. Making it public
> defeats the purpose of the builder. Can you give it a shot and confirm at
> your end if it's possible?
>
> Thanks
> Priyank
>
> Sent from my iPhone
>
> On Jun 13, 2017, at 9:36 AM, Stig Døssing <generalbas....@gmail.com>
> wrote:
>
> > Hi Priyank,
> >
> > I changed my mind since those mails were sent. I don't think
> > setKey/Value are very useful. They couldn't be used with the default
> > Kafka deserializers, only for deserializers implemented by the user,
> > and then only if they were declared to implement
> > SerializableDeserializer. I agree that we should remove them, and I'm
> > not going to undo anything currently in the PR (unless there are
> > objections on the PR, of course).
> >
> > With regard to getting rid of the builder pattern, I think it is a
> > pretty nice pattern for Java. It looks to me like it should be possible
> > to declare and configure the builder with "component:", and then pass
> > it to the KafkaSpoutConfig constructor with "ref:" afterwards (which
> > lets you avoid calling build()). Doesn't this work?
> >
> > 2017-06-12 23:32 GMT+02:00 Priyank Shah <ps...@hortonworks.com>:
> >
> >> Hi Stig,
> >>
> >> I think the PR https://github.com/apache/storm/pull/2155/files you
> >> created gets rid of setKey and setValue. I am fine with it, and in
> >> fact that's what I was suggesting in the first place. However, your
> >> last two email replies suggest something else. Just making sure you
> >> are not going to undo anything in the PR and that we are on the same
> >> page about this change, i.e. no setKey or setValue, either for
> >> SerializableDeserializer implementations or the Kafka Deserializer
> >> interface. Only a string FQCN as a property.
> >>
> >> The other thing I propose is to get rid of the builder class. The
> >> reason is that constructing an object with the builder pattern
> >> requires builder.build(), and that does not work well with Flux YAML.
> >> I think we should be careful about implementing new connectors and
> >> make sure they work with YAML as well. I have commented on the PR as
> >> well. Unless someone else has a different opinion, can you address
> >> that as well?
> >>
> >> On 6/10/17, 2:05 AM, "Stig Døssing" <generalbas....@gmail.com> wrote:
> >>
> >> Priyank, I was a bit too hasty in the last response. The setKey/Value
> >> functions are necessary when users want to set only the key or the
> >> value deserializer. I think we should keep those. It may be possible
> >> to deduplicate the functionality on the API by removing the Builder
> >> constructors that take deserializers, and by getting rid of the
> >> setKey/Value functions that take Class instances, since those seem
> >> like a duplication of the consumer config functionality. This should
> >> get rid of a lot of the overloads.
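As a side note, the duplication described above can be sketched directly:
the deserializer classes are already settable through the standard consumer
config keys, so Builder setters taking Class instances are a second way to
do the same thing. The keys below are the standard Kafka ones; the snippet
itself is only illustrative:

    Map<String, Object> props = new HashMap<>(); // java.util.*
    // Deserializers given as plain FQCN strings - the "only a string FQCN
    // as a property" idea from earlier in the thread.
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");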
> >> 2017-06-10 10:20 GMT+02:00 Stig Døssing <generalbas....@gmail.com>:
> >>
> >>> Harsha,
> >>>
> >>> +1 for simplifying away those methods that are just setting consumer
> >>> config. The properties I think we should keep are all the spout
> >>> configuration (timeouts, retry handling, tuple construction). Maybe
> >>> we deprecate the consumer config functions on 1.x and remove them on
> >>> master?
> >>>
> >>> Priyank,
> >>>
> >>> When the spout is declared, it takes type parameters to define the
> >>> key and value types of the consumer. We are able to check at compile
> >>> time that the deserializers match those expected types, e.g.
> >>>
> >>> SerializableDeserializer<Integer> des = ...
> >>>
> >>> KafkaSpoutConfig<Integer, String> config =
> >>>     KafkaSpoutConfig.builder("dummy", "dummy")
> >>>         .setKey(des)
> >>>         .build();
> >>>
> >>> KafkaSpout<String, String> wrongTypeSpout = new KafkaSpout<>(config);
> >>>
> >>> will not compile, while
> >>>
> >>> SerializableDeserializer<String> des = ...
> >>>
> >>> KafkaSpoutConfig<String, String> config =
> >>>     KafkaSpoutConfig.builder("dummy", "dummy")
> >>>         .setKey(des)
> >>>         .build();
> >>>
> >>> KafkaSpout<String, String> spout = new KafkaSpout<>(config);
> >>>
> >>> will. If we want to simplify the API, maybe we should just mirror the
> >>> KafkaConsumer API more closely and remove the Builder setKey/Value
> >>> methods. I can't think of a reason why a user should need to create a
> >>> Builder of one type, and then change the type later with
> >>> setKey/Value. The deserializers can just go in through the Builder
> >>> constructor.
> >>>
> >>> About KafkaTuple, I think it was done this way originally since
> >>> requiring users to subclass KafkaTuple would be a breaking change. If
> >>> we want to do it, it should go in 2.x only. I'm not necessarily
> >>> opposed to doing it, but I don't really have a strong opinion on it.
> >>>
> >>> Hugo,
> >>>
> >>> I appreciate that the subscribe API is a major new feature of the 0.9
> >>> consumer, but I can't come up with any reason to use it in Storm. I
> >>> don't think we should support it just because it is there. As
> >>> mentioned upthread, the features offered by that API are already
> >>> covered by Storm, so I'm not seeing the value in having it. If we
> >>> can't come up with a use case for it, I don't see a reason to allow
> >>> users to configure it, especially given the non-obvious problems
> >>> users who choose it are likely to run into.
> >>>
> >>> 2017-06-10 6:03 GMT+02:00 Harsha <st...@harsha.io>:
> >>>
> >>>> Dynamic assignment is what is causing all the issues that we see
> >>>> now:
> >>>> 1. Duplicates at the start of the KafkaSpout and upon any rebalance.
> >>>> 2. The Trident Kafka spout not holding the transactional batches.
> >>>> Many corner cases can easily produce duplicates.
> >>>>
> >>>> There is no point in keeping dynamic assignment given all the issues
> >>>> that are showing up. Here is an excerpt from the Kafka consumer
> >>>> docs,
> >>>> https://www-us.apache.org/dist/kafka/0.10.0.1/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html:
> >>>> "If the process itself is highly available and will be restarted if
> >>>> it fails (perhaps using a cluster management framework like YARN,
> >>>> Mesos, or AWS facilities, or as part of a stream processing
> >>>> framework). In this case there is no need for Kafka to detect the
> >>>> failure and reassign the partition since the consuming process will
> >>>> be restarted on another machine."
> >>>>
> >>>> Manual assignment is the right way to go.
> >>>>
> >>>> -Harsha
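The difference being argued here can be sketched with the plain
KafkaConsumer API (illustrative consumer code, not the spout internals;
the topic name is a placeholder):

    // Assumes props carries bootstrap.servers and deserializer config.
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

    // Dynamic assignment: join a consumer group and let the broker
    // rebalance partitions across members.
    // consumer.subscribe(Collections.singletonList("my-topic"));

    // Manual assignment: pin the partitions explicitly. No group
    // rebalancing occurs, so a restarted worker simply resumes its
    // partitions, per the consumer docs quoted above.
    List<TopicPartition> partitions = new ArrayList<>();
    for (PartitionInfo info : consumer.partitionsFor("my-topic")) {
        partitions.add(new TopicPartition(info.topic(), info.partition()));
    }
    consumer.assign(partitions);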
> >>>> On Jun 9, 2017, 4:09 PM -0700, Hugo Da Cruz Louro
> >>>> <hlo...@hortonworks.com>, wrote:
> >>>>
> >>>> +1 for simplifying KafkaSpoutConfig. Too many constructors and too
> >>>> many methods. I am not sure it's justifiable to have any methods
> >>>> that simply set KafkaConsumer properties. All of these properties
> >>>> should just go in a Map<String, Object>, which is what KafkaConsumer
> >>>> receives, and what was supported in the initial implementation. The
> >>>> names of the properties can be retrieved from
> >>>> org.apache.kafka.clients.consumer.ConsumerConfig. At this point we
> >>>> may have to keep backwards compatibility in mind.
> >>>>
> >>>> Not sure we should completely discontinue dynamic partition
> >>>> assignment, as it is one of the primary features of the new Storm
> >>>> Kafka Client API. With this said, manual partition assignment should
> >>>> be supported and would solve a lot of potential problems arising
> >>>> from dynamic partition assignment.
> >>>>
> >>>> Hugo
> >>>>
> >>>> On Jun 9, 2017, at 3:33 PM, Harsha <st...@harsha.io> wrote:
> >>>>
> >>>> I think the question is why we need all those settings when a user
> >>>> can pass them via Properties with the consumer properties defined,
> >>>> or via a Map conf object. Having methods on top of the consumer
> >>>> config means that every time a Kafka consumer property is added or
> >>>> changed, one needs to add a builder method. We need to get out of
> >>>> the way and let users configure it like they do a typical Kafka
> >>>> consumer; instead we have tens of methods that set properties for
> >>>> ConsumerConfig.
> >>>>
> >>>> Examples:
> >>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L317
> >>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L309
> >>>> etc. All of these are specific to KafkaConsumer config; users should
> >>>> be able to pass all of them via Properties.
> >>>>
> >>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L327
> >>>>
> >>>> What's the benefit of adding that method? And we are forcing the
> >>>> protocol to "SSL" in this method:
> >>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L318
> >>>>
> >>>> Users can set the SSL properties and then select the
> >>>> securityProtocol "SASL_SSL", which requires both Kerberos and SSL
> >>>> configs to be set. In the above case, calling setSSLTruststore
> >>>> changes the security.protocol to "SSL". This could easily run into
> >>>> issues if the user first sets securityProtocol to "SASL_SSL" and
> >>>> then later calls setSSLTruststore, which changes it to "SSL".
> >>>>
> >>>> We are taking over these settings instead of letting users figure
> >>>> them out from the Kafka consumer config page.
> >>>>
> >>>> In contrast, on the producer side we have KafkaBolt, which does
> >>>> this:
> >>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/bolt/KafkaBolt.java#L121
> >>>> I would add a Properties object instead of deriving it from
> >>>> topologyConf, but this is much easier for users to understand. The
> >>>> contract here is: put whatever producer configs you want in the conf
> >>>> object, and we create a producer out of that config.
> >>>>
> >>>> Honestly, these interfaces need to be simple and let the user have
> >>>> control instead of adding our interpretation.
> >>>>
> >>>> Thanks,
> >>>> Harsha
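To illustrate the conflict being described: with plain Properties the user
states the protocol once, and nothing rewrites it behind their back. The
keys are standard Kafka consumer settings; the values are placeholders:

    Properties props = new Properties(); // java.util.Properties
    props.put("security.protocol", "SASL_SSL"); // Kerberos + SSL together
    props.put("ssl.truststore.location", "/etc/ssl/kafka-truststore.jks");
    props.put("ssl.truststore.password", "changeit");
    props.put("sasl.kerberos.service.name", "kafka");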
> >>>> On Jun 9, 2017, 2:08 PM -0700, Stig Døssing
> >>>> <generalbas....@gmail.com>, wrote:
> >>>>
> >>>> I'd be happy with a simpler KafkaSpoutConfig, but I think most of
> >>>> the configuration parameters have good reasons for being there. Any
> >>>> examples of parameters you think we should remove?
> >>>>
> >>>> 2017-06-09 21:34 GMT+02:00 Harsha <st...@harsha.io>:
> >>>>
> >>>> +1 on using manual assignment for the reasons specified below. We
> >>>> will see duplicates even in stable conditions, which is not good. I
> >>>> don't see any reason not to switch to manual assignment. While we
> >>>> are at it, we should refactor the KafkaConfig part. It should be as
> >>>> simple as accepting the Kafka consumer config or properties file and
> >>>> forwarding it to KafkaConsumer. We made it overly complex, and
> >>>> unnecessarily so.
> >>>>
> >>>> Thanks,
> >>>> Harsha
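For what it's worth, the simplification proposed in the last message could
look something like the sketch below. This is hypothetical: the class and
field names are invented for illustration and are not the actual Storm API.

    import java.io.Serializable;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // Hypothetical: spout-level settings stay on the config object, while
    // all consumer settings pass through untouched as Properties.
    public class SimpleKafkaSpoutConfig implements Serializable {
        private final Properties consumerProps;   // forwarded verbatim
        private final long offsetCommitPeriodMs;  // example spout setting

        public SimpleKafkaSpoutConfig(Properties consumerProps,
                                      long offsetCommitPeriodMs) {
            this.consumerProps = consumerProps;
            this.offsetCommitPeriodMs = offsetCommitPeriodMs;
        }

        public long getOffsetCommitPeriodMs() {
            return offsetCommitPeriodMs;
        }

        // The spout builds its consumer straight from the user's config,
        // so any current or future consumer property just works.
        public <K, V> KafkaConsumer<K, V> createConsumer() {
            return new KafkaConsumer<>(consumerProps);
        }
    }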