[ https://issues.apache.org/jira/browse/SPARK-26841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843509#comment-16843509 ]
Richard Yu edited comment on SPARK-26841 at 5/19/19 5:53 PM:
-------------------------------------------------------------

[~Bartalos] Could you explain what use cases there are for supporting these queries? It would be helpful if you could provide an example of how a "historic snapshot" of the Kafka table might be useful to the client.


was (Author: yohan123):
[~Bartalos] Could you explain what use cases there are for supporting these queries? It would be helpful if you could provide an example where a "historic snapshot" might be necessary.

> Timestamp pushdown on Kafka table
> ---------------------------------
>
>                 Key: SPARK-26841
>                 URL: https://issues.apache.org/jira/browse/SPARK-26841
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 2.4.0
>            Reporter: Tomas Bartalos
>            Priority: Major
>              Labels: Kafka, pushdown, timestamp
>
> As a Spark user I'd like to have fast queries on a Kafka table restricted by
> timestamp.
> I'd like to have quick answers to questions like:
>  * What was inserted to Kafka in the past x minutes
>  * What was inserted to Kafka in a specified time range
> Example:
> {quote}select * from kafka_table where timestamp >
> from_unixtime(unix_timestamp() - 5 * 60, "YYYY-MM-dd HH:mm:ss")
> select * from kafka_table where timestamp > $from_time and timestamp <
> $end_time
> {quote}
> Currently timestamp restrictions are not pushed down to KafkaRelation, and
> querying by timestamp on a large Kafka topic takes forever to complete.
> *Technical solution*
> Technically it's possible to retrieve Kafka's offsets for a provided timestamp
> with the org.apache.kafka.clients.consumer.Consumer#offsetsForTimes(..) method.
> Afterwards we can query the Kafka topic by the retrieved timestamp ranges.
> Querying by timestamp range is already implemented, so this change should have
> minor impact.
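
The lookup described under *Technical solution* can be illustrated with a small, self-contained sketch. It does not call Kafka. It models a single partition as a list of record timestamps (epoch milliseconds) whose list index stands in for the offset, and it mimics the documented behaviour of Consumer#offsetsForTimes, which returns the earliest offset whose timestamp is greater than or equal to the requested one, or nothing when no such record exists. The function names `offset_for_time` and `offset_range` are hypothetical, and the sketch assumes timestamps are non-decreasing within the partition, which a real topic does not guarantee.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def offset_for_time(timestamps: List[int], target_ms: int) -> Optional[int]:
    """Earliest offset whose timestamp is >= target_ms, or None if there is none.

    Hypothetical stand-in for a per-partition offsetsForTimes lookup; assumes
    `timestamps` is sorted in non-decreasing order.
    """
    idx = bisect_left(timestamps, target_ms)
    return idx if idx < len(timestamps) else None


def offset_range(timestamps: List[int], from_ms: int, end_ms: int) -> Tuple[int, int]:
    """Half-open offset range [start, end) for from_ms < timestamp < end_ms."""
    log_end = len(timestamps)
    # Strictly greater than from_ms is the same as >= from_ms + 1 for integer millis.
    start = offset_for_time(timestamps, from_ms + 1)
    # The first record at or after end_ms is excluded, so it is the exclusive end.
    end = offset_for_time(timestamps, end_ms)
    start = log_end if start is None else start
    end = log_end if end is None else end
    # Guard against an empty or inverted time window.
    return start, max(start, end)


if __name__ == "__main__":
    partition = [1000, 2000, 2000, 3000, 4000]
    print(offset_range(partition, 1000, 4000))
```

With a range like this, only the records between the two offsets would need to be read, instead of scanning the whole topic and filtering on the timestamp column afterwards.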