[jira] [Commented] (SPARK-18386) Batch mode SQL source for Kafka
[ https://issues.apache.org/jira/browse/SPARK-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654308#comment-15654308 ]

Cody Koeninger commented on SPARK-18386:

That should work. There may be dependency conflicts in trying to put a 0.10.1 jar in the same job as a 0.10.0 jar, though.

> Batch mode SQL source for Kafka
> -------------------------------
>
>          Key: SPARK-18386
>          URL: https://issues.apache.org/jira/browse/SPARK-18386
>      Project: Spark
>   Issue Type: Improvement
>   Components: SQL
>     Reporter: Cody Koeninger
>
> An SQL equivalent to the DStream KafkaUtils.createRDD would be useful for
> querying over a defined batch of offsets.
> The possibility of Kafka 0.10.1 time indexing (e.g. a batch from timestamp X
> to timestamp Y) should be taken into account, even if not available in the
> initial implementation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18386) Batch mode SQL source for Kafka
[ https://issues.apache.org/jira/browse/SPARK-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653821#comment-15653821 ]

Ofir Manor commented on SPARK-18386:

BTW [~c...@koeninger.org] - I think that filtering by timestamp can be done today "the hard way" if the Kafka broker is 0.10.1. The user could use the 0.10.1 client to get a list of offsets for the requested timestamp, then submit a job to Spark using those explicit offsets with Spark's 0.10.0 client (quite ugly, but it should work).
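The "hard way" described above can be sketched roughly as follows. This is an illustrative sketch, not code from the ticket: it uses the kafka-python client as a stand-in for a 0.10.1-capable consumer (its `offsets_for_times` call maps timestamps to offsets, like the Java client's `offsetsForTimes`), and it assumes the JSON offsets shape documented for the streaming Kafka source's `startingOffsets` option. The broker-dependent part requires a live cluster; the JSON helper is pure.

```python
import json


def starting_offsets_json(offsets):
    """Format a {(topic, partition): offset} map as the JSON string shape
    used by Spark's Kafka source 'startingOffsets' option,
    e.g. {"clicks": {"0": 23, "1": 45}}. Pure helper, assumption-level."""
    out = {}
    for (topic, partition), offset in offsets.items():
        out.setdefault(topic, {})[str(partition)] = offset
    return json.dumps(out, sort_keys=True)


def offsets_for_timestamp(bootstrap_servers, topic, ts_ms):
    """Resolve a timestamp (ms) to per-partition offsets with a
    0.10.1-capable client. Requires a live broker, so this part is
    illustrative only. Partitions with no message at/after ts_ms
    come back as None and are skipped."""
    from kafka import KafkaConsumer, TopicPartition  # kafka-python
    consumer = KafkaConsumer(bootstrap_servers=bootstrap_servers)
    parts = [TopicPartition(topic, p)
             for p in consumer.partitions_for_topic(topic)]
    resolved = consumer.offsets_for_times({tp: ts_ms for tp in parts})
    return {(tp.topic, tp.partition): oat.offset
            for tp, oat in resolved.items() if oat is not None}
```

The resulting JSON string could then be handed to whatever explicit-offset mechanism the batch job uses, keeping the 0.10.1 client lookup in a separate step from the Spark job itself, which sidesteps the jar conflict Cody mentions.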
[jira] [Commented] (SPARK-18386) Batch mode SQL source for Kafka
[ https://issues.apache.org/jira/browse/SPARK-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653388#comment-15653388 ]

Ofir Manor commented on SPARK-18386:

This would be really useful for me right now! We have many bounded Kafka topics, for example, metrics of the last couple of months, web clicks of the last 7 days, etc. It would be great to be able to just query them with Spark (I'd use "earliest" starting offsets in my case).

There is also an interesting interaction between structured streaming and regular queries, where each streaming batch recomputes the regular queries it depends on. It works with the file sources; I'd like to use that with the Kafka source as well in some cases.

It would also be great if the external API were as close to the current one ({{spark.readStream.format("kafka").option(...)}}) as possible (same options etc.), maybe just with {{spark.read.kafka...}}?