[GitHub] jihoonson opened a new issue #6001: Segment publishing order should be preserved in kafka indexing service

2018-07-12 Thread GitBox
jihoonson opened a new issue #6001: Segment publishing order should be 
preserved in kafka indexing service
URL: https://github.com/apache/incubator-druid/issues/6001
 
 
   In the Kafka indexing service, the overlord performs a sanity check that the 
start offsets of the partitions of the segments being published are the same as 
the ones stored in the metastore, which guarantees that all segments are 
published in order. Because of this check, some tasks might fail in the 
following scenario.
   
   1. The supervisor created a task (`T1`) with a start offset `O1`.
   2. Somehow, the supervisor couldn't send an endOffset `O2` to `T1` within 
`taskDuration`. Instead, it sent an endOffset `O3` to `T1` after 
`taskDuration * 10`. (In our case, the supervisor couldn't send it because of 
too-frequent HTTP "connection refused" errors.)
   3. `T1` started to merge, push, and publish segments.
   4. The supervisor created a new task, `T2`, with a start offset `O3`.
   5. After `taskDuration`, it sent an endOffset `O4` to `T2`. 
   6. `T2` started to merge, push, and publish segments.
   7. Since `T1` had run for a much longer time, it had many more segments to 
publish than `T2`. As a result, `T2` tried to publish before `T1` completed 
publishing.
   8. `T2` failed to publish because of the sanity check when updating 
metastore.
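
A minimal sketch of the sanity check and of the failure in steps 7 and 8, assuming a single partition. The class `OffsetCheckSketch`, its `publish` method, and the numeric offsets are hypothetical stand-ins for illustration, not Druid's actual metastore code:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the offsets stored in the metastore (partition -> next offset).
final class OffsetCheckSketch {
    private final Map<Integer, Long> committed = new HashMap<>();

    public OffsetCheckSketch(Map<Integer, Long> initial) {
        committed.putAll(initial);
    }

    // Publishing succeeds only if the task's start offsets equal the committed offsets;
    // on success, the task's end offsets become the new committed offsets.
    public synchronized boolean publish(Map<Integer, Long> start, Map<Integer, Long> end) {
        if (!committed.equals(start)) {
            return false; // out-of-order publish is rejected
        }
        committed.clear();
        committed.putAll(end);
        return true;
    }

    public static void main(String[] args) {
        long o1 = 0L, o3 = 300L, o4 = 400L; // illustrative values only
        OffsetCheckSketch store = new OffsetCheckSketch(Collections.singletonMap(0, o1));

        // T2 (O3 -> O4) tries to publish before T1 (O1 -> O3) has finished.
        System.out.println(store.publish(
            Collections.singletonMap(0, o3), Collections.singletonMap(0, o4))); // false
        // T1 publishes, which moves the committed offset to O3.
        System.out.println(store.publish(
            Collections.singletonMap(0, o1), Collections.singletonMap(0, o3))); // true
        // Only now would a publish from T2 pass the check.
        System.out.println(store.publish(
            Collections.singletonMap(0, o3), Collections.singletonMap(0, o4))); // true
    }
}
```

In this model `T2` would succeed if it published again after `T1`, but in the scenario above `T2` has already failed by then.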
   
   So, I think the supervisor should be able to guarantee segment publishing 
order across all running tasks like below.
   
   ```
   T1: indexing ===> pushing ===> publishing ===> handoff
   T2:               indexing ===> pushing ===> publishing ===> handoff
   T3:                             indexing ===> pushing ===> publishing ===> handoff
   ...
   ```
   
   To do so, I suppose the supervisor should be able to send pushing signals to 
Kafka tasks as well as publishing signals.
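
One way to picture the proposed ordering is a gate kept by the supervisor that lets a task publish only once every earlier task has published. This is only a sketch under that assumption. `PublishOrderGate` and its methods are hypothetical names, not an existing Druid API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical supervisor-side gate that releases tasks for publishing in creation order.
final class PublishOrderGate {
    private final Deque<String> pending = new ArrayDeque<>();

    // Called when the supervisor creates a task; creation order defines publishing order.
    public synchronized void register(String taskId) {
        pending.addLast(taskId);
    }

    // A task may publish only when it is the oldest task that has not yet published.
    public synchronized boolean mayPublish(String taskId) {
        return taskId.equals(pending.peekFirst());
    }

    // Called after a successful publish so that the next task becomes eligible.
    public synchronized void markPublished(String taskId) {
        if (!mayPublish(taskId)) {
            throw new IllegalStateException("Task " + taskId + " published out of order");
        }
        pending.removeFirst();
    }
}
```

With `T1` and `T2` registered in that order, `mayPublish("T2")` stays false until `markPublished("T1")` has been called, so `T2` would wait instead of failing the offset check.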


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org



[GitHub] nishantmonu51 commented on issue #5859: [DISCUSS] Dropping the task audit log table from metastore

2018-07-12 Thread GitBox
nishantmonu51 commented on issue #5859: [DISCUSS] Dropping the task audit log 
table from metastore
URL: 
https://github.com/apache/incubator-druid/issues/5859#issuecomment-404721450
 
 
   +1 on dropping task action audit table. Never seen anyone using that. 





[GitHub] himanshug commented on issue #5859: [DISCUSS] Dropping the task audit log table from metastore

2018-07-12 Thread GitBox
himanshug commented on issue #5859: [DISCUSS] Dropping the task audit log table 
from metastore
URL: 
https://github.com/apache/incubator-druid/issues/5859#issuecomment-404691078
 
 
   I don't remember ever needing that table since I started using Druid. So, 
from my side, I think it is OK to remove it.





[GitHub] himanshug commented on issue #6000: Fix versioned interval timeline to add same version data

2018-07-12 Thread GitBox
himanshug commented on issue #6000: Fix versioned interval timeline to add same 
version data
URL: https://github.com/apache/incubator-druid/pull/6000#issuecomment-404689894
 
 
   > The expectation is that new data for the same version should overwrite 
previous data of the same version.
   
   Not really; that would break the immutability contract of a segment. Two 
segments with the same name and version MUST have the same content.
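
A simplified sketch of that contract, assuming segments are keyed by a single identifier string. `SegmentRegistrySketch` is a hypothetical class for illustration and is not how `VersionedIntervalTimeline` is implemented:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry that treats a segment identifier (name + version) as immutable.
final class SegmentRegistrySketch {
    private final Map<String, String> contentById = new HashMap<>();

    // Returns true if the segment was newly added and false if an identical copy already existed.
    // Adding different content under an existing identifier is rejected instead of overwriting.
    public boolean add(String segmentId, String content) {
        String existing = contentById.putIfAbsent(segmentId, content);
        if (existing == null) {
            return true;
        }
        if (!existing.equals(content)) {
            throw new IllegalStateException("Segment " + segmentId + " already exists with different content");
        }
        return false;
    }

    public String get(String segmentId) {
        return contentById.get(segmentId);
    }
}
```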





[GitHub] niketh opened a new pull request #6000: Fix versioned interval timeline to add same version data

2018-07-12 Thread GitBox
niketh opened a new pull request #6000: Fix versioned interval timeline to add 
same version data
URL: https://github.com/apache/incubator-druid/pull/6000
 
 
   @gianm VersionedIntervalTimeline doesn't add data for the same version 
again. The expectation is that new data for the same version should overwrite 
previous data of the same version.
   
   Also, I think a few unit tests weren't correct; I fixed them. Can you take 
a look?





[GitHub] niketh opened a new pull request #5999: Fix versionedinterval timeline to handle new data for the same version

2018-07-12 Thread GitBox
niketh opened a new pull request #5999: Fix versionedinterval timeline to 
handle new data for the same version
URL: https://github.com/apache/incubator-druid/pull/5999
 
 
   @gianm  VersionedIntervalTimeline doesn't add data for the same version 
again. The expectation is that new data for the same version should overwrite 
previous data of the same version.
   





[GitHub] jon-wei commented on a change in pull request #5996: Fix NPE while handling CheckpointNotice in KafkaSupervisor

2018-07-12 Thread GitBox
jon-wei commented on a change in pull request #5996: Fix NPE while handling 
CheckpointNotice in KafkaSupervisor
URL: https://github.com/apache/incubator-druid/pull/5996#discussion_r202168514
 
 

 ##
 File path: extensions-core/kafka-indexing-service/src/main/java/io/druid/indexing/kafka/supervisor/KafkaSupervisor.java
 ##
 @@ -1087,23 +1088,18 @@ public Boolean apply(KafkaIndexTask.Status status)
 }
 return false;
   } else {
-final TaskGroup taskGroup = new TaskGroup(
-ImmutableMap.copyOf(
-kafkaTask.getIOConfig()
- .getStartPartitions()
- .getPartitionOffsetMap()
-),
-kafkaTask.getIOConfig().getMinimumMessageTime(),
-kafkaTask.getIOConfig().getMaximumMessageTime()
-);
-if (taskGroups.putIfAbsent(
+final TaskGroup taskGroup = taskGroups.computeIfAbsent(
 taskGroupId,
-taskGroup
-) == null) {
-  sequenceTaskGroup.put(generateSequenceName(taskGroup), taskGroups.get(taskGroupId));
-  log.info("Created new task group [%d]", taskGroupId);
 
 Review comment:
   Looks like this removes the logging event for new task groups, can we 
preserve that?
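
One possible way to keep the log line, sketched under the assumption that the message can be emitted from inside the mapping function passed to `computeIfAbsent`, which runs only when no entry exists for the key. The class below is a self-contained illustration with a hypothetical, stripped-down `TaskGroup`; it uses `System.out` in place of the real logger and is not the code from the pull request:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class TaskGroupLoggingSketch {
    // Hypothetical stand-in for the supervisor's TaskGroup.
    public static final class TaskGroup {
        public final int id;

        TaskGroup(int id) {
            this.id = id;
        }
    }

    private final ConcurrentMap<Integer, TaskGroup> taskGroups = new ConcurrentHashMap<>();
    // Counts how often the creation branch ran, so the behavior is observable.
    private final AtomicInteger createdCount = new AtomicInteger();

    public TaskGroup getOrCreate(int taskGroupId) {
        return taskGroups.computeIfAbsent(taskGroupId, id -> {
            // This lambda runs only when the key is absent, so the message is
            // emitted once per newly created task group.
            System.out.printf("Created new task group [%d]%n", id);
            createdCount.incrementAndGet();
            return new TaskGroup(id);
        });
    }

    public int getCreatedCount() {
        return createdCount.get();
    }
}
```

The mapping function should stay short and must not modify other entries of the same map, which is a documented restriction of `ConcurrentHashMap.computeIfAbsent`.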





[GitHub] surekhasaharan opened a new pull request #5998: Add support to filter on datasource for active tasks

2018-07-12 Thread GitBox
surekhasaharan opened a new pull request #5998: Add support to filter on 
datasource for active tasks
URL: https://github.com/apache/incubator-druid/pull/5998
 
 
   * Added datasource filter to sql query for active tasks
   * Fixed unit tests





Re: Regarding becoming a contributor

2018-07-12 Thread Gian Merlino
Hi Himanshu,

Awesome that you are interested in helping out! We have a community page
here that describes how you can get started: http://druid.io/community/

The basics are:

1) Subscribe to the dev list.
2) If you are using Druid today and have an itch, scratching that itch is
the best way to get started -- ask around to see if there's a way to
implement a feature you want or fix a bug that's bugging you.
3) If you want some ideas, check out open issues, especially the ones
labeled "easy" (they tend to make good starter issues), like
https://github.com/apache/incubator-druid/issues/5869 or
https://github.com/apache/incubator-druid/issues/5644.

Happy Druiding!

On Thu, Jul 12, 2018 at 9:56 AM Himanshu Pandey wrote:

> Hey,
>
> I recently came across Druid and would like to contribute to it.
>
> Where/how can I start? I was checking the list of open issues, but it looks
> like most of them are already being worked on or fixed.
>
> Thanks!
>
> *Thanks & Regards,*
> *Himanshu Pandey*
> *Cell: +1 (408) 644 - 8765*
>


Regarding becoming a contributor

2018-07-12 Thread Himanshu Pandey
Hey,

I recently came across Druid and would like to contribute to it.

Where/how can I start? I was checking the list of open issues, but it looks
like most of them are already being worked on or fixed.

Thanks!

*Thanks & Regards,*
*Himanshu Pandey*
*Cell: +1 (408) 644 - 8765*


[GitHub] chirpy2291 commented on issue #3770: global cached lookups / java.sql.SQLException: No suitable driver found for jdbc

2018-07-12 Thread GitBox
chirpy2291 commented on issue #3770: global cached lookups /  
java.sql.SQLException: No suitable driver found for jdbc
URL: 
https://github.com/apache/incubator-druid/issues/3770#issuecomment-404579606
 
 
   Please update the documentation for lookups regarding JDBC lookups! It would 
help others :)

