[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896948#comment-15896948
 ] 

Raghu Angadi edited comment on BEAM-1573 at 3/6/17 6:40 PM:
------------------------------------------------------------

@peay, 
There are two levels of solutions to deserializer (and serializer): 
  # Reasonable ways to use of custom Kafka deserializers & serializers
     ** This is very feasible now, including the case when you are reading from 
multiple topics.
  # Update to KafkaIO API to pass Kafka serializers directly to the Kafka 
consumer.
      ** We might end up doing this, not exactly how you proposed, but rather 
replacing coders with Kafka (de)serializers. There is no need to include both I 
think. 
      ** There is a discussion on Beam mailing lists about removing use of 
coders directly in sources and other places and that might be right time to add 
this support. (cc [~jkff])

Are you more interested 1 or 2? 

One way to use any Kafka serializer (for (1)): 
{code}
PCollection<KafkaRecord<byte[], byte[]> kafkaRecords = // Note that KafkaRecord 
include topic name, partition etc.
 pipeline
    .apply(KafkaIO.<byte[] >read()
    .withBootstrapServers("broker_1:9092,broker_2:9092")
    .withTopics(ImmutableList.of("topic_a")));

kafkaRecords.apply( ParDo.of(new DoFn<KafkaRecord<byte[], byte[]>, 
MyAvroRecord>) {
 
   private final Map<String, Object> config = // config 
   private transient Deserializer kafkaDeserializer;
   @Setup
   public void setup() {
      kafkaDeserializer = new MyDeserializer();
     kafkaDeserializer.configure(config) // kafka config (serializable map)
    }

   @ProcessElement
    public void procesElement(Context context) {
       MyAvroRecord record = 
kafkaDeserializer.deserialize(context.element().getTopic(), 
context.element().getValue())
       context.output(record);
   }
 
   @TearDown
   public void tearDown() {
     kafkaDeserializer.close();
   }
}))

{code}
    
   


was (Author: rangadi):
@peay, 
There are two levels of solutions to deserializer (and serializer): 
  # Reasonable ways to use of custom Kafka deserializers & serializers
    * This is very feasible now, including the case when you are reading from 
multiple topics.
  # Update to KafkaIO API to pass Kafka serializers directly to the Kafka 
consumer.
     * We might end up doing this, not exactly how you proposed, but rather 
replacing coders with Kafka (de)serializers. There is no need to include both I 
think. 
     * There is a discussion on Beam mailing lists about removing use of coders 
directly in sources and other places and that might be right time to add this 
support. (cc [~jkff])

Are you more interested 1 or 2? 

One way to use any Kafka serializer (for (1)): 
{code}
PCollection<KafkaRecord<byte[], byte[]> kafkaRecords = // Note that KafkaRecord 
include topic name, partition etc.
 pipeline
    .apply(KafkaIO.<byte[] >read()
    .withBootstrapServers("broker_1:9092,broker_2:9092")
    .withTopics(ImmutableList.of("topic_a")));

kafkaRecords.apply( ParDo.of(new DoFn<KafkaRecord<byte[], byte[]>, 
MyAvroRecord) {
 
   private final Map<String, Object> config = // config 
   private transient Deserializer kafkaDeserializer;
   @Setup
   public void setup() {
      kafkaDeserializer = new MyDeserializer();
     kafkaDeserializer.configure(config) // kafka config (serializable map)
    }

   @ProcessElement
    public void procesElement(Context context) {
       MyAvroRecord record = 
kafkaDeserializer.deserialize(context.element().getTopic(), 
context.element().getValue())
       context.output(record);
   }
 
   @TearDown
   public void tearDown() {
     kafkaDeserializer.close();
   }
}))

{code}
    
   

> KafkaIO does not allow using Kafka serializers and deserializers
> ----------------------------------------------------------------
>
>                 Key: BEAM-1573
>                 URL: https://issues.apache.org/jira/browse/BEAM-1573
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>    Affects Versions: 0.4.0, 0.5.0
>            Reporter: peay
>            Assignee: Raghu Angadi
>            Priority: Minor
>
> KafkaIO does not allow to override the serializer and deserializer settings 
> of the Kafka consumer and producers it uses internally. Instead, it allows to 
> set a `Coder`, and has a simple Kafka serializer/deserializer wrapper class 
> that calls the coder.
> I appreciate that allowing to use Beam coders is good and consistent with the 
> rest of the system. However, is there a reason to completely disallow to use 
> custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry for instance, 
> which requires custom serializers. One can write a `Coder` that wraps a 
> custom Kafka serializer, but that means two levels of un-necessary wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer` which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting in the argument to 
> `KafkaIO` and when building the wrapper, which is not elegant and error prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to