[ https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896948#comment-15896948 ]
Raghu Angadi commented on BEAM-1573:
------------------------------------

@peay, There are two levels of solutions for deserializers (and serializers):

# Reasonable ways to use custom Kafka deserializers & serializers.
* This is very feasible now, including the case where you are reading from multiple topics.
# Updating the KafkaIO API to pass Kafka (de)serializers directly to the Kafka consumer.
* We might end up doing this, though not exactly as you proposed: rather, replacing coders with Kafka (de)serializers. There is no need to support both, I think. (A hypothetical sketch of such an API appears at the end of this message.)
* There is a discussion on the Beam mailing lists about removing the direct use of coders in sources and other places, and that might be the right time to add this support. (cc [~jkff])

Are you more interested in (1) or (2)?

One way to use any Kafka deserializer (for (1)) is to read raw bytes and deserialize them in a DoFn. Note that KafkaRecord includes the topic name, partition, etc.:

{code}
PCollection<KafkaRecord<byte[], byte[]>> kafkaRecords = pipeline
    .apply(KafkaIO.<byte[], byte[]>read()
        .withBootstrapServers("broker_1:9092,broker_2:9092")
        .withTopics(ImmutableList.of("topic_a")));

kafkaRecords.apply(
    ParDo.of(new DoFn<KafkaRecord<byte[], byte[]>, MyAvroRecord>() {
      // Kafka deserializer config; must be a serializable map.
      private final Map<String, Object> config = ImmutableMap.of();

      private transient Deserializer<MyAvroRecord> kafkaDeserializer;

      @Setup
      public void setup() {
        kafkaDeserializer = new MyDeserializer();
        kafkaDeserializer.configure(config, false); // isKey = false
      }

      @ProcessElement
      public void processElement(ProcessContext context) {
        MyAvroRecord record = kafkaDeserializer.deserialize(
            context.element().getTopic(), context.element().getValue());
        context.output(record);
      }

      @TearDown
      public void tearDown() {
        kafkaDeserializer.close();
      }
    }));
{code}

> KafkaIO does not allow using Kafka serializers and deserializers
> ----------------------------------------------------------------
>
>                  Key: BEAM-1573
>                  URL: https://issues.apache.org/jira/browse/BEAM-1573
>              Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>    Affects Versions: 0.4.0, 0.5.0
>             Reporter: peay
>             Assignee: Raghu Angadi
>             Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings
> of the Kafka consumers and producers it uses internally. Instead, it allows
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper
> class that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent
> with the rest of the system. However, is there a reason to completely
> disallow using custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for
> instance, which requires custom serializers. One can write a `Coder` that
> wraps a custom Kafka serializer, but that means two levels of unnecessary
> wrapping. In addition, the `Coder` abstraction is not equivalent to Kafka's
> `Serializer`, which receives the topic name as input. Using a `Coder`
> wrapper would require duplicating the output topic setting in the argument
> to `KafkaIO` and when building the wrapper, which is inelegant and
> error-prone.
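To make the quoted complaint concrete, here is a minimal sketch (not from the original thread) of such a Coder wrapper, assuming recent Beam CustomCoder signatures; MyAvroRecord, MySerializer, and MyDeserializer are hypothetical schema-registry-aware classes. The topic has to be supplied twice, once to KafkaIO and once here:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.beam.sdk.coders.ByteArrayCoder;
import org.apache.beam.sdk.coders.CustomCoder;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

// Sketch: a Beam Coder wrapping Kafka (de)serializers.
class AvroRegistryCoder extends CustomCoder<MyAvroRecord> {
  private static final ByteArrayCoder BYTES = ByteArrayCoder.of();

  // Duplicates the topic already configured on KafkaIO -- the "error-prone"
  // part of the complaint, since Coder methods never see the record's topic.
  private final String topic;

  private transient Serializer<MyAvroRecord> serializer;
  private transient Deserializer<MyAvroRecord> deserializer;

  AvroRegistryCoder(String topic) {
    this.topic = topic;
  }

  @Override
  public void encode(MyAvroRecord value, OutputStream out) throws IOException {
    if (serializer == null) {
      serializer = new MySerializer(); // configure(...) omitted for brevity
    }
    // Wrapping level 1: a Coder around a Kafka Serializer.
    BYTES.encode(serializer.serialize(topic, value), out);
  }

  @Override
  public MyAvroRecord decode(InputStream in) throws IOException {
    if (deserializer == null) {
      deserializer = new MyDeserializer(); // configure(...) omitted for brevity
    }
    return deserializer.deserialize(topic, BYTES.decode(in));
  }
}
{code}

KafkaIO then wraps this coder back into its own internal Kafka Deserializer, which gives the two levels of wrapping the reporter describes.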
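For comparison, here is a hypothetical sketch of what option (2) might look like: KafkaIO accepting Kafka deserializer classes directly instead of coders. The method names withKeyDeserializer/withValueDeserializer are illustrative, not an existing API as of this comment:

{code}
// Hypothetical option (2): pass Kafka deserializers to KafkaIO directly,
// replacing the Coder-based configuration. Method names are illustrative.
PCollection<KafkaRecord<byte[], MyAvroRecord>> records = pipeline
    .apply(KafkaIO.<byte[], MyAvroRecord>read()
        .withBootstrapServers("broker_1:9092,broker_2:9092")
        .withTopics(ImmutableList.of("topic_a"))
        .withKeyDeserializer(ByteArrayDeserializer.class)
        .withValueDeserializer(MyDeserializer.class)); // topic-aware, no wrapper
{code}

Because KafkaIO would hand the configured topic to the deserializer itself, the topic duplication described in the issue goes away.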