[
https://issues.apache.org/jira/browse/APEXMALHAR-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325540#comment-15325540
]
ASF GitHub Bot commented on APEXMALHAR-2086:
--------------------------------------------
Github user devtagare commented on a diff in the pull request:
https://github.com/apache/apex-malhar/pull/298#discussion_r66696282
--- Diff:
kafka/src/main/java/org/apache/apex/malhar/kafka/AbstractExactlyOnceKafkaOutputOperator.java
---
@@ -0,0 +1,366 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.apex.malhar.kafka;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.concurrent.ExecutionException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.apex.malhar.lib.wal.FSWindowDataManager;
+import org.apache.apex.malhar.lib.wal.WindowDataManager;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.clients.producer.ProducerRecord;
+import org.apache.kafka.common.PartitionInfo;
+import org.apache.kafka.common.TopicPartition;
+
+import com.datatorrent.api.Context;
+import com.datatorrent.api.Operator;
+
+import static
org.apache.kafka.clients.consumer.ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG;
+import static
org.apache.kafka.clients.consumer.ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG;
+import static
org.apache.kafka.clients.consumer.ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG;
+import static org.apache.kafka.clients.producer.ProducerConfig.ACKS_CONFIG;
+
+/**
+ * Kafka output operator with exactly once processing semantics under
certain conditions.,
+ *
+ * Requirement for Exactly Once:
+ * Every message within the Window is unique
+ *
+ * This operator uses *Key* to distinguish the messages written by
particular instance of the Output operator.
+ * Operator users can only use *value* for storing the data.
+ *
+ * @displayName Abstract Exactly Once Kafka Output(0.9.0)
+ * @category Messaging
+ * @tags output operator
+ *
+ * @since 3.5
+ */
[email protected]
+public abstract class AbstractExactlyOnceKafkaOutputOperator<T> extends
AbstractKafkaOutputOperator<String, T>
+ implements Operator.CheckpointNotificationListener
+{
+ private transient String key;
+ private transient String appId;
+ private transient Integer operatorId;
+ private transient Long windowId;
+ private transient Map<T, Integer> partialWindowTuples = new HashMap<>();
+ private transient KafkaConsumer consumer;
+
+ private WindowDataManager windowDataManager = new FSWindowDataManager();
+ private final int KAFKA_CONNECT_ATTEMPT = 10;
+ private final String KEY_SEPARATOR = "#";
+ private final String KEY_SERIALIZER =
"org.apache.kafka.common.serialization.StringDeserializer";
+ private final String VALUE_SERIALIZER =
"org.apache.kafka.common.serialization.StringDeserializer";
+
+ @Override
+ public void setup(Context.OperatorContext context)
+ {
+ super.setup(context);
+
+ this.operatorId = context.getId();
+ this.windowDataManager.setup(context);
+ this.appId = context.getValue(Context.DAGContext.APPLICATION_ID);
+ this.key = appId + KEY_SEPARATOR + (new Integer(operatorId));
+ this.consumer = KafkaConsumerInit();
--- End diff --
this.key = (new Integer(operatorId)) + partitionId (kafkaPartition) can be
a more deterministic key.Since you want to peek into only the partitions to
which this operator had written.Also APP_ID is a constant and does not add to
arriving at uniqueness
> Kafka Output Operator with Kafka 0.9 API
> ----------------------------------------
>
> Key: APEXMALHAR-2086
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2086
> Project: Apache Apex Malhar
> Issue Type: New Feature
> Reporter: Sandesh
> Assignee: Sandesh
>
> Goal : 2 Operartors for Kafka Output
> 1. Simple Kafka Output Operator
> - Supports Atleast Once
> - Expose most used producer properties as class properties
> 2. Exactly Once Kafka Output ( Not possible in all the cases, will be
> documented later )
>
> Design for Exactly Once
> Window Data Manager - Stores the Kafka partitions offsets.
> Kafka Key - Used by the operator = AppID#OperatorId
> During recovery. Partially written window is re-created using the following
> approach:
> Tuples between the largest recovery offsets and the current offset are
> checked. Based on the key, tuples written by the other entities are
> discarded.
> Only tuples which are not in the recovered set are emitted.
> Tuples needs to be unique within the window.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)