[
https://issues.apache.org/jira/browse/APEXMALHAR-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325498#comment-15325498
]
ASF GitHub Bot commented on APEXMALHAR-2086:
--------------------------------------------
Github user chinmaykolhatkar commented on a diff in the pull request:
https://github.com/apache/apex-malhar/pull/298#discussion_r66694101
--- Diff:
kafka/src/main/java/org/apache/apex/malhar/kafka/AbstractExactlyOnceKafkaOutputOperator.java
---
@@ -0,0 +1,366 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.apex.malhar.kafka;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.concurrent.ExecutionException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.apex.malhar.lib.wal.FSWindowDataManager;
+import org.apache.apex.malhar.lib.wal.WindowDataManager;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.clients.producer.ProducerRecord;
+import org.apache.kafka.common.PartitionInfo;
+import org.apache.kafka.common.TopicPartition;
+
+import com.datatorrent.api.Context;
+import com.datatorrent.api.Operator;
+
+import static org.apache.kafka.clients.consumer.ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG;
+import static org.apache.kafka.clients.consumer.ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG;
+import static org.apache.kafka.clients.consumer.ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG;
+import static org.apache.kafka.clients.producer.ProducerConfig.ACKS_CONFIG;
+/**
+ * Kafka output operator with exactly-once processing semantics under
+ * certain conditions.
+ *
+ * Requirement for exactly once:
+ * every message within the window must be unique.
+ *
+ * This operator uses the *key* to distinguish the messages written by a
+ * particular instance of the output operator.
+ * Operator users can only use the *value* for storing the data.
+ *
+ * @displayName Abstract Exactly Once Kafka Output(0.9.0)
+ * @category Messaging
+ * @tags output operator
+ *
+ * @since 3.5
+ */
[email protected]
+public abstract class AbstractExactlyOnceKafkaOutputOperator<T>
+  extends AbstractKafkaOutputOperator<String, T>
+  implements Operator.CheckpointNotificationListener
--- End diff --
I see this class is extended by KafkaSinglePortExactlyOnceOutputOperator.
The subclass does the following 2 things:
1. Adds a port
2. Calls the sendTuple method of the parent class
I see no need to keep this one as an abstract operator. It can
become concrete, with the port added here itself.
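The refactoring suggested above can be illustrated with a dependency-free sketch (the `InputPort` interface and class names below are simplified stand-ins for Apex's `DefaultInputPort` and the real operator, not the actual Malhar API): the port is declared directly on a concrete operator and delegates to sendTuple, removing the need for a subclass.

```java
import java.util.ArrayList;
import java.util.List;

public class ConcreteOperatorSketch {
  // Stand-in for Apex's DefaultInputPort (assumed, simplified shape)
  interface InputPort<T> { void process(T tuple); }

  // Concrete operator: the port lives here, so no subclass is needed
  static class ExactlyOnceKafkaOutputOperator<T> {
    final List<T> sent = new ArrayList<>();

    // Port declared directly on the operator, delegating to sendTuple
    final InputPort<T> input = this::sendTuple;

    void sendTuple(T tuple) {
      sent.add(tuple); // the real operator would write to Kafka here
    }
  }

  public static void main(String[] args) {
    ExactlyOnceKafkaOutputOperator<String> op = new ExactlyOnceKafkaOutputOperator<>();
    op.input.process("hello");
    System.out.println(op.sent); // [hello]
  }
}
```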
> Kafka Output Operator with Kafka 0.9 API
> ----------------------------------------
>
> Key: APEXMALHAR-2086
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2086
> Project: Apache Apex Malhar
> Issue Type: New Feature
> Reporter: Sandesh
> Assignee: Sandesh
>
> Goal: 2 Operators for Kafka Output
> 1. Simple Kafka Output Operator
> - Supports At-least Once
> - Exposes the most used producer properties as class properties
> 2. Exactly Once Kafka Output (not possible in all cases; will be
> documented later)
>
> Design for Exactly Once
> Window Data Manager - stores the Kafka partition offsets.
> Kafka key - used by the operator = AppID#OperatorId
> During recovery, the partially written window is re-created using the
> following approach:
> Tuples between the largest recovery offsets and the current offset are
> checked. Based on the key, tuples written by other entities are
> discarded.
> Only tuples which are not in the recovered set are emitted.
> Tuples need to be unique within the window.
>
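The recovery approach in the quoted design can be sketched without any Kafka dependency (the key format `AppID#OperatorId` comes from the issue; the message data and helper names below are hypothetical). Messages read back from the partition are filtered by this operator's key to build the recovered set, and only tuples absent from that set are re-emitted:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExactlyOnceRecoverySketch {
  public static void main(String[] args) {
    String operatorKey = "app1#op1"; // key = AppID#OperatorId (per the design)

    // (key, value) pairs read back between the recovery and current offsets
    List<String[]> readBack = Arrays.asList(
        new String[]{"app1#op1", "t1"},
        new String[]{"app2#op9", "x1"}, // written by another entity: discard
        new String[]{"app1#op1", "t2"});

    // Recovered set: tuples this operator already wrote in the failed window
    Set<String> recovered = new HashSet<>();
    for (String[] kv : readBack) {
      if (operatorKey.equals(kv[0])) {
        recovered.add(kv[1]);
      }
    }

    // Replayed window: emit only tuples not already written to Kafka
    List<String> replayed = Arrays.asList("t1", "t2", "t3");
    List<String> toSend = new ArrayList<>();
    for (String t : replayed) {
      if (!recovered.contains(t)) {
        toSend.add(t);
      }
    }
    System.out.println(toSend); // [t3]
  }
}
```

This also shows why the design requires tuples to be unique within a window: the recovered set is keyed by value, so a duplicate value would be wrongly skipped on replay.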
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)