[jira] [Commented] (DRILL-5977) predicate pushdown support kafkaMsgOffset

ASF GitHub Bot (JIRA) Wed, 23 May 2018 07:41:31 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487375#comment-16487375
 ]


ASF GitHub Bot commented on DRILL-5977:
---------------------------------------

akumarb2010 commented on issue #1272: DRILL-5977: Filter Pushdown in 
Drill-Kafka plugin
URL: https://github.com/apache/drill/pull/1272#issuecomment-391372423
 
 
   @aravi5  Sorry for the delay in the review, and thanks for implementing this 
nice feature.
   
   Before starting the code review, I have a few comments on the pushdown design.
   
   1. *kafkaMsgOffset* predicates have to be partition specific, right? We should 
not apply these predicates globally across the partitions. For example, given 
p1[startOffset=1000, endOffset=2000] and p2[1500,5000], it is always 
better to consider the offsets per partition. But in the test cases, I am not 
seeing partition-specific predicates.
   
   2. Good to see that you have considered multiple scenarios for 
*kafkaMsgTimestamp* predicates. The point above also applies to 
*kafkaMsgTimestamp*: we might need to consider partition-specific 
*kafkaMsgTimestamp* predicates. But this might cause issues, as the 
*offsetsForTimes* method is blocking and can block indefinitely if the user 
provides a wrong partition.
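
   As a rough illustration of the blocking concern in point 2, one way to guard 
a timestamp lookup is to drop unknown partitions first and then bound the wait. 
This is only a sketch, not the plugin's code: `safe_offsets_for_times`, the 
`lookup` callable (a stand-in for a blocking call such as *offsetsForTimes*), 
`known_partitions`, and `timeout_s` are all hypothetical names.

```python
import threading
from typing import Callable, Dict, Optional, Set

# Hypothetical stand-in for a blocking timestamp-to-offset lookup:
# maps {partition id: timestamp} to {partition id: offset}.
Lookup = Callable[[Dict[int, int]], Dict[int, int]]


def safe_offsets_for_times(
    lookup: Lookup,
    timestamps: Dict[int, int],
    known_partitions: Set[int],
    timeout_s: float,
) -> Optional[Dict[int, int]]:
    """Return offsets for valid partitions, or None if the lookup did not finish."""
    # Drop partitions the topic does not have, so a wrong partition id
    # never reaches the blocking call.
    valid = {p: ts for p, ts in timestamps.items() if p in known_partitions}
    if not valid:
        return {}

    holder: Dict[str, Dict[int, int]] = {}

    def worker() -> None:
        holder["offsets"] = lookup(valid)

    # Daemon thread: a lookup that never returns is abandoned, not cancelled.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        return None  # timed out
    # Also None if the lookup raised inside the worker thread.
    return holder.get("offsets")
```

   Validating partitions up front avoids the bad call entirely; the timeout is 
only a fallback for cases the validation cannot catch.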
   
   Can you please clarify the above two comments?
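
   To make the per-partition handling from point 1 concrete, here is a minimal 
sketch of clamping each partition's own offset range against a pushed-down 
predicate. It assumes an exclusive end offset and a conjunctive predicate of the 
form `kafkaMsgOffset >= min_offset AND kafkaMsgOffset < max_offset`, optionally 
combined with a partition id; `push_down_offsets` and its parameters are 
hypothetical and not taken from the PR.

```python
from typing import Dict, Optional, Tuple

# (start_offset, end_offset); the end offset is treated as exclusive (assumption).
OffsetRange = Tuple[int, int]


def push_down_offsets(
    partitions: Dict[int, OffsetRange],
    min_offset: Optional[int] = None,
    max_offset: Optional[int] = None,
    partition_id: Optional[int] = None,
) -> Dict[int, OffsetRange]:
    """Intersect the predicate range with each partition's own offset range."""
    scan: Dict[int, OffsetRange] = {}
    for pid, (start, end) in partitions.items():
        # A partition-specific predicate excludes every other partition.
        if partition_id is not None and pid != partition_id:
            continue
        new_start = start if min_offset is None else max(start, min_offset)
        new_end = end if max_offset is None else min(end, max_offset)
        # Keep the partition only if something is left to scan.
        if new_start < new_end:
            scan[pid] = (new_start, new_end)
    return scan


# Offset ranges from the example in point 1.
partitions = {1: (1000, 2000), 2: (1500, 5000)}
print(push_down_offsets(partitions, min_offset=1800))
print(push_down_offsets(partitions, min_offset=1800, partition_id=1))
```

   With the ranges from point 1, a global `min_offset=1800` trims both 
partitions, while adding `partition_id=1` limits the scan to p1 only.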
      
       

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> predicate pushdown support kafkaMsgOffset
> -----------------------------------------
>
>                 Key: DRILL-5977
>                 URL: https://issues.apache.org/jira/browse/DRILL-5977
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: B Anil Kumar
>            Assignee: Abhishek Ravi
>            Priority: Major
>             Fix For: 1.14.0
>
>
> As part of the Kafka storage plugin review, Paul made the suggestion below.
> {noformat}
> Does it make sense to provide a way to select a range of messages: a starting 
> point or a count? Perhaps I want to run my query every five minutes, scanning 
> only those messages since the previous scan. Or, I want to limit my take to, 
> say, the next 1000 messages. Could we use a pseudo-column such as 
> "kafkaMsgOffset" for that purpose? Maybe
> SELECT * FROM <some topic> WHERE kafkaMsgOffset > 12345
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
