[GitHub] jihoonson opened a new issue #6001: Segment publishing order should be preserved in kafka indexing service
jihoonson opened a new issue #6001: Segment publishing order should be preserved in kafka indexing service
URL: https://github.com/apache/incubator-druid/issues/6001

In the Kafka indexing service, the overlord performs a sanity check that the start offsets of the partitions of the segments currently being published are the same as the ones stored in the metastore, which guarantees that all segments are published in order. Because of this check, some tasks might fail in the following scenario.

1. The supervisor created a task (`T1`) with a start offset `O1`.
2. Somehow, the supervisor couldn't send an endOffset `O2` to `T1` within `taskDuration`. Instead, it sent an endOffset `O3` to `T1` after `taskDuration * 10`. (In our case, the supervisor couldn't send it because of too frequent HTTP connection refused errors.)
3. `T1` started to merge, push, and publish segments.
4. The supervisor created a new task, `T2`, with a start offset `O3`.
5. After `taskDuration`, it sent an endOffset `O4` to `T2`.
6. `T2` started to merge, push, and publish segments.
7. Since `T1` had run for a much longer time, it had many more segments to publish than `T2`. As a result, `T2` tried to publish before `T1` completed publishing.
8. `T2` failed to publish because of the sanity check when updating the metastore.

So, I think the supervisor should be able to guarantee segment publishing order across all running tasks, like below.

```
T1: indexing ===> pushing ===> publishing ===> handoff
T2: indexing ===> pushing ===> publishing ===> handoff
T3: indexing ===> pushing ===> publishing ===> handoff
...
```

To do so, I suppose the supervisor should be able to send pushing signals to Kafka tasks as well as publishing signals.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org For additional commands, e-mail: dev-h...@druid.apache.org
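The failure described in issue #6001 can be illustrated with a minimal sketch, assuming the sanity check behaves like a compare-and-swap on the committed offset. The class `PublishOrderSketch`, its nested `Metastore`, and the numeric offsets are hypothetical stand-ins for illustration only; they are not Druid's actual classes or metastore logic.

```java
import java.util.concurrent.atomic.AtomicLong;

public class PublishOrderSketch {

    /** Hypothetical stand-in for the end offset recorded in the metastore. */
    static final class Metastore {
        private final AtomicLong committedOffset;

        Metastore(long initialOffset) {
            this.committedOffset = new AtomicLong(initialOffset);
        }

        /**
         * Assumed form of the sanity check: a publish succeeds only when the
         * task's start offset equals the offset currently committed.
         */
        boolean publish(long startOffset, long endOffset) {
            return committedOffset.compareAndSet(startOffset, endOffset);
        }

        long committed() {
            return committedOffset.get();
        }
    }

    public static void main(String[] args) {
        // Illustrative values for O1, O3, and O4 from the scenario.
        long o1 = 0L;
        long o3 = 300L;
        long o4 = 400L;

        Metastore store = new Metastore(o1);

        // T2 (start O3) finishes pushing first and tries to publish before T1.
        boolean t2Early = store.publish(o3, o4);
        // T1 (start O1) publishes afterwards.
        boolean t1 = store.publish(o1, o3);
        // A publish from O3 is only accepted once T1 has committed.
        boolean t2Later = store.publish(o3, o4);

        System.out.println("T2 before T1: " + t2Early);
        System.out.println("T1: " + t1);
        System.out.println("T2 after T1: " + t2Later);
    }
}
```

In this sketch the early attempt by `T2` is rejected purely because of ordering, which is why the issue proposes having the supervisor sequence the publishing step across tasks.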
[GitHub] nishantmonu51 commented on issue #5859: [DISCUSS] Dropping the task audit log table from metastore
nishantmonu51 commented on issue #5859: [DISCUSS] Dropping the task audit log table from metastore
URL: https://github.com/apache/incubator-druid/issues/5859#issuecomment-404721450

+1 on dropping task action audit table. Never seen anyone using that.
[GitHub] himanshug commented on issue #5859: [DISCUSS] Dropping the task audit log table from metastore
himanshug commented on issue #5859: [DISCUSS] Dropping the task audit log table from metastore
URL: https://github.com/apache/incubator-druid/issues/5859#issuecomment-404691078

I don't remember needing to use that table ever since I have used Druid. So, from my side I think it is ok to remove it.
[GitHub] himanshug commented on issue #6000: Fix versioned interval timeline to add same version data
himanshug commented on issue #6000: Fix versioned interval timeline to add same version data
URL: https://github.com/apache/incubator-druid/pull/6000#issuecomment-404689894

> The expectation is that new data for the same version should overwrite previous data of the same version.

Not really, that would break the immutability contract of a segment. Two segments with the same name and version MUST have the same content.
[GitHub] niketh opened a new pull request #6000: Fix versioned interval timeline to add same version data
niketh opened a new pull request #6000: Fix versioned interval timeline to add same version data
URL: https://github.com/apache/incubator-druid/pull/6000

@gianm VersionedIntervalTimeline doesn't add data for the same version again. The expectation is that new data for the same version should overwrite previous data of the same version. Also, I think there were a few unit tests that weren't correct; I fixed them. Can you take a look?
[GitHub] niketh opened a new pull request #5999: Fix versionedinterval timeline to handle new data for the same version
niketh opened a new pull request #5999: Fix versionedinterval timeline to handle new data for the same version
URL: https://github.com/apache/incubator-druid/pull/5999

@gianm VersionedIntervalTimeline doesn't add data for the same version again. The expectation is that new data for the same version should overwrite previous data of the same version.
[GitHub] jon-wei commented on a change in pull request #5996: Fix NPE while handling CheckpointNotice in KafkaSupervisor
jon-wei commented on a change in pull request #5996: Fix NPE while handling CheckpointNotice in KafkaSupervisor
URL: https://github.com/apache/incubator-druid/pull/5996#discussion_r202168514

## File path: extensions-core/kafka-indexing-service/src/main/java/io/druid/indexing/kafka/supervisor/KafkaSupervisor.java
## @@ -1087,23 +1088,18 @@ public Boolean apply(KafkaIndexTask.Status status)
 }
 return false;
 } else {
-final TaskGroup taskGroup = new TaskGroup(
-ImmutableMap.copyOf(
-kafkaTask.getIOConfig()
- .getStartPartitions()
- .getPartitionOffsetMap()
-), kafkaTask.getIOConfig().getMinimumMessageTime(),
-kafkaTask.getIOConfig().getMaximumMessageTime()
-);
-if (taskGroups.putIfAbsent(
+final TaskGroup taskGroup = taskGroups.computeIfAbsent(
 taskGroupId,
-taskGroup
-) == null) {
- sequenceTaskGroup.put(generateSequenceName(taskGroup), taskGroups.get(taskGroupId));
- log.info("Created new task group [%d]", taskGroupId);

Review comment:
Looks like this removes the logging event for new task groups, can we preserve that?
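One way the review comment on #5996 could be addressed is to log inside the mapping function passed to `computeIfAbsent`, which runs only when the key is absent. The following is a minimal sketch under stated assumptions: `TaskGroupLoggingSketch`, its simplified `TaskGroup`, and the `logLines` list (a stand-in for `log.info`) are hypothetical and are not the code in `KafkaSupervisor`; only `ConcurrentHashMap.computeIfAbsent` is a real JDK call.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class TaskGroupLoggingSketch {

    /** Simplified, hypothetical stand-in for the supervisor's TaskGroup. */
    static final class TaskGroup {
        final int id;

        TaskGroup(int id) {
            this.id = id;
        }
    }

    // Assumed to be a ConcurrentHashMap keyed by task group id.
    static final ConcurrentHashMap<Integer, TaskGroup> taskGroups = new ConcurrentHashMap<>();

    // Stand-in for log.info(...), kept in memory so the behavior can be inspected.
    static final List<String> logLines = new CopyOnWriteArrayList<>();

    static TaskGroup getOrCreate(int taskGroupId) {
        // The mapping function is invoked only when no group exists for the key,
        // so the "created" message is emitted at most once per group.
        return taskGroups.computeIfAbsent(taskGroupId, id -> {
            logLines.add(String.format("Created new task group [%d]", id));
            return new TaskGroup(id);
        });
    }

    public static void main(String[] args) {
        getOrCreate(7);
        getOrCreate(7); // second call reuses the existing group and logs nothing
        logLines.forEach(System.out::println);
    }
}
```

Keeping the log call inside the mapping function preserves the original message without reintroducing the separate `putIfAbsent(...) == null` check.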
[GitHub] surekhasaharan opened a new pull request #5998: Add support to filter on datasource for active tasks
surekhasaharan opened a new pull request #5998: Add support to filter on datasource for active tasks
URL: https://github.com/apache/incubator-druid/pull/5998

* Added datasource filter to sql query for active tasks
* Fixed unit tests
Re: Regarding becoming a contributor
Hi Himanshu,

Awesome that you are interested in helping out! We have a community page here that describes how you can get started: http://druid.io/community/

The basics are:

1) Subscribe to the dev list.
2) If you are using Druid today and have an itch, scratching that itch is the best way to get started -- ask around to see if there's a way to implement a feature you want or fix a bug that's bugging you.
3) If you want some ideas, check out open issues, especially the ones labeled "easy" (they tend to make good starter issues), like https://github.com/apache/incubator-druid/issues/5869 or https://github.com/apache/incubator-druid/issues/5644.

Happy Druiding!

On Thu, Jul 12, 2018 at 9:56 AM Himanshu Pandey wrote:
> Hey,
>
> I recently came across Druid and would like to contribute to it.
>
> Where/how can I start? I was checking the list of open issues, but it looks like most of
> them are already being worked on or fixed.
>
> Thanks!
>
> *Thanks & Regards,*
> *Himanshu Pandey*
> *Cell: +1 (408) 644 - 8765*
>
Regarding becoming a contributor
Hey,

I recently came across Druid and would like to contribute to it.

Where/how can I start? I was checking the list of open issues, but it looks like most of them are already being worked on or fixed.

Thanks!

*Thanks & Regards,*
*Himanshu Pandey*
*Cell: +1 (408) 644 - 8765*
[GitHub] chirpy2291 commented on issue #3770: global cached lookups / java.sql.SQLException: No suitable driver found for jdbc
chirpy2291 commented on issue #3770: global cached lookups / java.sql.SQLException: No suitable driver found for jdbc
URL: https://github.com/apache/incubator-druid/issues/3770#issuecomment-404579606

Please update the documentation for lookups regarding JDBC lookups! It would help others :)