[GitHub] [druid] FrankChen021 commented on a change in pull request #10551: Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better

2020-11-18 Thread GitBox


FrankChen021 commented on a change in pull request #10551:
URL: https://github.com/apache/druid/pull/10551#discussion_r526550007



##
File path: 
extensions-core/kafka-indexing-service/src/test/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisorTest.java
##
@@ -2445,7 +2446,9 @@ public void testCheckpointForInactiveTaskGroup()
 final TaskLocation location2 = new TaskLocation("testHost2", 145, -1);
 Collection workItems = new ArrayList<>();
 workItems.add(new TestTaskRunnerWorkItem(id1, null, location1));
+

Review comment:
   please revert this unnecessary change

##
File path: 
extensions-core/kafka-indexing-service/src/test/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisorTest.java
##
@@ -2445,7 +2446,9 @@ public void testCheckpointForInactiveTaskGroup()
 final TaskLocation location2 = new TaskLocation("testHost2", 145, -1);
 Collection workItems = new ArrayList<>();
 workItems.add(new TestTaskRunnerWorkItem(id1, null, location1));
+
 workItems.add(new TestTaskRunnerWorkItem(id2, null, location2));
+

Review comment:
   please revert this unnecessary change





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] FrankChen021 commented on a change in pull request #10551: Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better

2020-11-18 Thread GitBox


FrankChen021 commented on a change in pull request #10551:
URL: https://github.com/apache/druid/pull/10551#discussion_r526549768



##
File path: 
extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/KafkaRecordSupplier.java
##
@@ -37,6 +37,7 @@
 import org.apache.kafka.common.serialization.Deserializer;
 
 import javax.annotation.Nonnull;
+

Review comment:
   please revert this unnecessary change





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] FrankChen021 commented on a change in pull request #10551: Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better

2020-11-17 Thread GitBox


FrankChen021 commented on a change in pull request #10551:
URL: https://github.com/apache/druid/pull/10551#discussion_r524960762



##
File path: docs/development/extensions-core/kafka-ingestion.md
##
@@ -144,6 +144,7 @@ A sample supervisor spec is shown below:
 |`lateMessageRejectionStartDateTime`|ISO8601 DateTime|Configure tasks to 
reject messages with timestamps earlier than this date time; for example if 
this is set to `2016-01-01T11:00Z` and the supervisor creates a task at 
*2016-01-01T12:00Z*, messages with timestamps earlier than *2016-01-01T11:00Z* 
will be dropped. This may help prevent concurrency issues if your data stream 
has late messages and you have multiple pipelines that need to operate on the 
same segments (e.g. a realtime and a nightly batch ingestion pipeline).|no 
(default == none)|
 |`lateMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject 
messages with timestamps earlier than this period before the task was created; 
for example if this is set to `PT1H` and the supervisor creates a task at 
*2016-01-01T12:00Z*, messages with timestamps earlier than *2016-01-01T11:00Z* 
will be dropped. This may help prevent concurrency issues if your data stream 
has late messages and you have multiple pipelines that need to operate on the 
same segments (e.g. a realtime and a nightly batch ingestion pipeline). Please 
note that only one of `lateMessageRejectionPeriod` or 
`lateMessageRejectionStartDateTime` can be specified.|no (default == none)|
 |`earlyMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject 
messages with timestamps later than this period after the task reached its 
taskDuration; for example if this is set to `PT1H`, the taskDuration is set to 
`PT1H` and the supervisor creates a task at *2016-01-01T12:00Z*, messages with 
timestamps later than *2016-01-01T14:00Z* will be dropped. **Note:** Tasks 
sometimes run past their task duration, for example, in cases of supervisor 
failover. Setting earlyMessageRejectionPeriod too low may cause messages to be 
dropped unexpectedly whenever a task runs past its originally configured task 
duration.|no (default == none)|
+|`consumeTransactionally`|Boolean|Set `consumeTransactionally` false here can 
disable druid to consume Kafka in a transactional way. And druid could consume 
lower version of Kafka now, such as 0.10.2.1 |no (default == true)|
 

Review comment:
   At the beginning of this doc, there's paragraph that describes the 
version of kafka that are supported. Since this PR
provides a way to support those kafka clusters lower than 0.11, we should 
also make some changes to the doc
   
   > The Kafka indexing service supports transactional topics which were 
introduced in Kafka 0.11.x. These changes make the
   > Kafka consumer that Druid uses incompatible with older brokers. Ensure 
that your Kafka brokers are version 0.11.x or   > Kafka consumer that Druid 
uses incompatible with older brokers. Ensure that your Kafka brokers are 
version 0.11.x or
   > better before using this functionality. Refer [Kafka upgrade 
guide](https://kafka.apache.org/documentation/#upgrade)   > better before 
using this functionality. Refer [Kafka upgrade 
guide](https://kafka.apache.org/documentation/#upgrade)
   > if you are using older version of Kafka brokers.   > if you are using 
older version of Kafka brokers.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] FrankChen021 commented on a change in pull request #10551: Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better

2020-11-11 Thread GitBox


FrankChen021 commented on a change in pull request #10551:
URL: https://github.com/apache/druid/pull/10551#discussion_r521258336



##
File path: 
extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/KafkaConsumerConfigs.java
##
@@ -30,15 +30,23 @@
  */
 public class KafkaConsumerConfigs
 {
-
-  public static Map getConsumerProperties()
+  public static Map getConsumerProperties(Map 
customerConsumerProperties)
   {
+final Object value = 
customerConsumerProperties.remove("consumeTransactionally");

Review comment:
   Putting this property in `KafkaIndexTaskIOConfig.consumerProperties` 
might cause some confusion. `consumerProperties` is designed to store kafka 
official properties. It's better to put the new property in 
`KafkaIndexTaskIOConfig` directly and put the code related to this property in 
`KafkaRecordSupplier.addConsumerPropertiesFromConfig`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] FrankChen021 commented on a change in pull request #10551: Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better

2020-11-08 Thread GitBox


FrankChen021 commented on a change in pull request #10551:
URL: https://github.com/apache/druid/pull/10551#discussion_r519519629



##
File path: 
extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/KafkaConsumerConfigs.java
##
@@ -38,7 +38,6 @@
 props.put("group.id", StringUtils.format("kafka-supervisor-%s", 
IdUtils.getRandomId()));
 props.put("auto.offset.reset", "none");
 props.put("enable.auto.commit", "false");
-props.put("isolation.level", "read_committed");

Review comment:
   We don't know whether or not the users have enabled transactional 
message in their clusters, so changing the current behavior may cause backward 
compatibility problem. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] FrankChen021 commented on a change in pull request #10551: Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better

2020-11-06 Thread GitBox


FrankChen021 commented on a change in pull request #10551:
URL: https://github.com/apache/druid/pull/10551#discussion_r518654741



##
File path: 
extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/KafkaConsumerConfigs.java
##
@@ -38,7 +38,6 @@
 props.put("group.id", StringUtils.format("kafka-supervisor-%s", 
IdUtils.getRandomId()));
 props.put("auto.offset.reset", "none");
 props.put("enable.auto.commit", "false");
-props.put("isolation.level", "read_committed");

Review comment:
   I think to remove `isolation.level` here and leaves the configuration to 
users changes the current behavior. 
   
   Currently users care nothing about this kafka configuration when using a 
higher version of kafka such as 0.11. By removing it, they need to set this 
property in supervisor spec,  or the default value, which is 
`read_uncommitted`, will be applied, which may be not what they expect.
   
   If this is the root cause that limits the Druid to use Kafka lower than 
0.11, I think maybe we can introduce another property, as the same way as 
`pollTimeout` property does, then we can unset `isolation.level` property 
according to the value of this new property. Only those who want to use Druid 
with older kafka  need to set this new property.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] FrankChen021 commented on a change in pull request #10551: Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better

2020-11-06 Thread GitBox


FrankChen021 commented on a change in pull request #10551:
URL: https://github.com/apache/druid/pull/10551#discussion_r518654741



##
File path: 
extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/KafkaConsumerConfigs.java
##
@@ -38,7 +38,6 @@
 props.put("group.id", StringUtils.format("kafka-supervisor-%s", 
IdUtils.getRandomId()));
 props.put("auto.offset.reset", "none");
 props.put("enable.auto.commit", "false");
-props.put("isolation.level", "read_committed");

Review comment:
   I think to remove `isolation.level` here and leaves the configuration to 
users changes the current behavior. 
   
   Currently no user cares nothing about this configuration when using a higher 
version of kafka such as 0.11. By removing it, they need to set this property 
in supervisor spec,  or the default value, which is `read_uncommitted`, will be 
applied, which may be not what they expect.
   
   If this is the root cause that limits the Druid to use Kafka lower than 
0.11, I think maybe we can introduce another property, as the same way as 
`pollTimeout` property does, then we can unset `isolation.level` property 
according to the value of this new property. Only those who want to use Druid 
with older kafka  need to set this new property.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org