Re: Anyone else mis-interpret the "KafkaConsumer" and "KafkaProducer" all the time?

Dale LaBossiere Thu, 22 Mar 2018 09:45:06 -0700

A bit of background…

The Kafka connector is two classes rather than a single KafkaStreams connector 
(with publish()/subscribe()) because, at least a while ago (I don’t know if 
this is still the case), Kafka had two completely separate classes for a 
“consumer” and a “producer”, each with very different config setup params. By 
comparison, MQTT has a single MqttClient class (with publish()/subscribe()).
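
To give a feel for how different the two setups were, here is a rough sketch 
using plain maps. The property names are the usual ones for Kafka's older 
ZooKeeper-based high-level consumer and for its producer; the addresses, the 
group id and the ConfigSketch class itself are placeholders made up for 
illustration, not anything from Edgent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ConfigSketch {
    // Config for the older high-level consumer, which located the cluster via ZooKeeper.
    static Map<String, Object> consumerConfig() {
        Map<String, Object> cfg = new HashMap<>();
        cfg.put("zookeeper.connect", "localhost:2181"); // placeholder address
        cfg.put("group.id", "demo-group");              // placeholder group id
        return cfg;
    }

    // Config for the producer, which is pointed at the brokers directly.
    static Map<String, Object> producerConfig() {
        Map<String, Object> cfg = new HashMap<>();
        cfg.put("bootstrap.servers", "localhost:9092"); // placeholder address
        return cfg;
    }

    // Keys that appear in both configs; empty for the two maps above.
    static Set<String> sharedKeys() {
        Set<String> shared = new HashSet<>(consumerConfig().keySet());
        shared.retainAll(producerConfig().keySet());
        return shared;
    }

    public static void main(String[] args) {
        System.out.println("shared keys: " + sharedKeys());
    }
}
```

With no keys in common, a single connector class would have had to accept 
two unrelated kinds of config, which is part of why two classes looked 
natural.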

At the time, the decision was to name the Edgent Kafka classes similar to the 
underlying Kafka API classes.  Hence KafkaConsumer (~wrapping Kafka’s 
ConsumerConnector) and KafkaProducer (~wrapping Kafka’s KafkaProducer).  While 
not exposed today, it’s conceivable that some day one could create an Edgent 
Kafka connector instance by providing a Kafka API class directly instead of 
just a config map - e.g., supplying a Kafka KafkaProducer as an arg to the 
Edgent KafkaProducer connector's constructor.  So having the names align seems 
like goodness.
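
A minimal sketch of that idea, with hypothetical stand-in types only: 
NativeProducer plays the role of the underlying system's producer class and 
ProducerConnector the role of the connector named after it. None of these 
names or constructors exist in Edgent or Kafka; the point is just the shape 
of "config map or ready-made client".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WrapperSketch {
    // Hypothetical stand-in for the underlying system's producer class.
    interface NativeProducer {
        void send(String topic, String value);
    }

    // Stand-in client that records what was sent instead of talking to a broker.
    static final class RecordingProducer implements NativeProducer {
        final List<String> sent = new ArrayList<>();

        @Override
        public void send(String topic, String value) {
            sent.add(topic + ":" + value);
        }
    }

    // Hypothetical connector, named after the class it wraps.
    static final class ProducerConnector {
        private final NativeProducer client;

        // Today's style: only a config map is supplied.
        // A real connector would build its client from the config; this sketch ignores it.
        ProducerConnector(Map<String, Object> config) {
            this(new RecordingProducer());
        }

        // The conceivable alternative: accept an already-built client directly.
        ProducerConnector(NativeProducer client) {
            this.client = client;
        }

        void publish(String topic, String value) {
            client.send(topic, value);
        }
    }

    // Publishes one value through a connector built from a ready-made client.
    static List<String> run() {
        RecordingProducer client = new RecordingProducer();
        new ProducerConnector(client).publish("demo-topic", "hello");
        return client.sent;
    }

    public static void main(String[] args) {
        new ProducerConnector(new HashMap<>()).publish("demo-topic", "ignored");
        System.out.println(run());
    }
}
```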

I don’t think the Edgent connectors should be trying to mask the underlying 
system’s API, or to make it unnecessary for a user to understand it… just make 
it usable, and easily usable for the simple/common cases, in an Edgent topology 
context (worrying about when to make the actual external connection, recovering 
from broken connections / reconnecting, handling common tuple types).

As for the specific suggestions, I think simply switching the names of Edgent’s 
KafkaConsumer and KafkaProducer is a bad idea :-)

Offering KafkaSource and KafkaSink is OK I guess (though probably retaining the 
current names for a release or three).  Though I’ll note the Edgent API uses 
“source” and “sink” as verbs, which take a Supplier and a Consumer fn as args 
respectively.  Note that “Consumer” is used there in the context of sink.
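
A self-contained sketch of that verb usage, to show where the word Consumer 
lands. MiniStream and SourceSinkSketch are hypothetical stand-ins, and the 
functional interfaces are the java.util.function ones rather than Edgent's 
own; only the general shape (source takes a Supplier, sink takes a Consumer) 
is taken from the Edgent API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class SourceSinkSketch {
    // Hypothetical miniature stream; NOT the real Edgent TStream.
    static final class MiniStream<T> {
        private final Iterable<T> tuples;

        MiniStream(Iterable<T> tuples) {
            this.tuples = tuples;
        }

        // "sink" as a verb: end the stream by handing every tuple to a Consumer fn.
        void sink(Consumer<T> sinker) {
            for (T tuple : tuples) {
                sinker.accept(tuple);
            }
        }
    }

    // "source" as a verb: start a stream from a Supplier fn.
    static <T> MiniStream<T> source(Supplier<Iterable<T>> data) {
        return new MiniStream<>(data.get());
    }

    // Runs a three-tuple stream from source to sink and returns what the sink received.
    static List<String> run() {
        List<String> input = Arrays.asList("a", "b", "c");
        List<String> exported = new ArrayList<>();
        MiniStream<String> stream = source(() -> input);
        stream.sink(exported::add); // the Consumer sits on the sink (export) side
        return exported;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

So in this vocabulary a "Consumer" is the function at the sink end of the 
topology, which is the opposite end from where Edgent's KafkaConsumer sits.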

Alternatively there’s KafkaSubscriber and KafkaPublisher.  While clearer than 
Consumer/Producer, I don’t know if they’re any better than Source/Sink.

In the end I guess I don’t feel strongly about it all… though wonder if it’s 
really worth the effort in changing.  At least the Edgent connector’s javadoc 
is pretty good / clear for the classes and their use... I think :-)

— Dale

> On Mar 20, 2018, at 9:59 PM, vino yang <[email protected]> wrote:
> 
> Hi Chris,
> 
> Any data processing framework can be thought of as a *pipeline*. From
> Edgent's point of view, there are two endpoints:
> 
> 
>   - source : means data injection;
>   - sink : means data export;
> 
> Many frameworks use this conventional naming rule, such as Apache Flume,
> Apache Flink, and Apache Spark (Structured Streaming).
> 
> I think "KafkaConsumer" could be replaced with "KafkaSource" and
> "KafkaProducer" could be named "KafkaSink".
> 
> The middle of the pipeline is the transformation of the data; there are
> many operators to transform data, such as map, flatMap, filter, reduce,
> and so on.
> 
> Vino yang.
> Thanks.
> 
> 2018-03-20 20:51 GMT+08:00 Christofer Dutz <[email protected]>:
> 
>> Hi,
>> 
>> I have been using the Kafka integration quite often in the past, and
>> there is one thing I always have to explain when demonstrating code, and
>> which seems to confuse everyone seeing the code:
>> 
>> I would expect a KafkaConsumer to consume Edgent messages and publish them
>> to Kafka and would expect a KafkaProducer to produce Edgent events.
>> 
>> Unfortunately it seems to be the other way around. This seems a little
>> unintuitive. Judging from the continued confusion when demonstrating code,
>> it is perhaps worth considering renaming these (swapping their names).
>> Perhaps even rename them to “KafkaSource” (Edgent Source that consumes
>> Kafka messages and produces Edgent events) and “KafkaConsumer” (consumes
>> Edgent events and produces Kafka messages). After all, the classes are in
>> the Edgent namespace and come from the Edgent libs, so the fixed point when
>> inspecting these should be clear. Also I bet no one would be confused if we
>> called something that produces Kafka messages a consumer as there should
>> never be code that handles this from a Kafka point of view AND uses Edgent
>> at the same time.
>> 
>> Chris
>> 
>> 
>>
