[jira] [Commented] (SPARK-18386) Batch mode SQL source for Kafka

2016-11-10 Thread Cody Koeninger (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654308#comment-15654308 ]

Cody Koeninger commented on SPARK-18386:


That should work.  There may be dependency conflicts trying to put a 0.10.1 jar 
in the same job as a 0.10.0 jar, though.

> Batch mode SQL source for Kafka
> ---
>
> Key: SPARK-18386
> URL: https://issues.apache.org/jira/browse/SPARK-18386
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Cody Koeninger
>
> An SQL equivalent to the DStream KafkaUtils.createRDD would be useful for 
> querying over a defined batch of offsets.
> The possibility of Kafka 0.10.1 time indexing (e.g. a batch from timestamp X 
> to timestamp Y) should be taken into account, even if not available in the 
> initial implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18386) Batch mode SQL source for Kafka

2016-11-10 Thread Ofir Manor (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653821#comment-15653821 ]

Ofir Manor commented on SPARK-18386:


BTW [~c...@koeninger.org] - I think that filtering by timestamp can be done 
today "the hard way" if the Kafka broker is 0.10.1.
The user could use the 0.10.1 client to get the list of offsets for the 
requested timestamps, then submit a Spark job with those explicit offsets to be 
used by Spark's 0.10.0 client (quite ugly, but it should work).
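The two-step workaround described above can be sketched as follows. This is an 
illustrative stand-in in plain Python, not real Kafka or Spark code: the 
per-partition offset maps represent the results of two timestamp lookups (what 
a 0.10.1 client's {{offsetsForTimes}} would return for timestamps X and Y), and 
{{OffsetRange}} mimics Spark's OffsetRange. Nothing here talks to a broker.

```python
from typing import Dict, List, NamedTuple

class OffsetRange(NamedTuple):
    """Stand-in for Spark's OffsetRange (topic, partition, from, until)."""
    topic: str
    partition: int
    from_offset: int   # first offset to read (inclusive)
    until_offset: int  # first offset NOT to read (exclusive)

def ranges_for_window(
    topic: str,
    offsets_at_start: Dict[int, int],  # partition -> first offset at/after timestamp X
    offsets_at_end: Dict[int, int],    # partition -> first offset at/after timestamp Y
) -> List[OffsetRange]:
    """Step 2 of the workaround: turn the two timestamp->offset lookups
    (done with a 0.10.1 client) into explicit ranges that can be handed
    to a Spark job driving the 0.10.0 client."""
    return [
        OffsetRange(topic, p, offsets_at_start[p], offsets_at_end[p])
        for p in sorted(offsets_at_start)
        if offsets_at_end[p] > offsets_at_start[p]  # skip empty windows
    ]

# Example with made-up offsets for a two-partition topic:
ranges = ranges_for_window("clicks", {0: 100, 1: 250}, {0: 180, 1: 250})
# partition 1 has no data in the window, so only partition 0 yields a range
```

The ugliness Ofir mentions is visible here: the timestamp resolution and the 
actual read happen in two different client versions, glued together by the user.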




[jira] [Commented] (SPARK-18386) Batch mode SQL source for Kafka

2016-11-10 Thread Ofir Manor (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653388#comment-15653388 ]

Ofir Manor commented on SPARK-18386:


This would be really useful for me right now!
We have many bounded Kafka topics - for example, metrics from the last couple 
of months, web clicks from the last 7 days, etc. It would be great to be able 
to just query them with Spark (I'd use "earliest" starting offsets in my case).
There is also an interesting interaction between structured streaming and 
regular queries, where each streaming batch recomputes the regular queries it 
depends on. That works with the file sources, and I'd like to use it with the 
Kafka source as well in some cases.
It would also be great if the external API were as close to the current one 
({{spark.readStream.format("kafka").option(...)}}) as possible (same options 
etc.), maybe just with {{spark.read.kafka...}}?
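The API suggestion above (a batch read reusing the streaming source's option 
names) can be sketched as a plain option map. Everything below is hypothetical 
illustration - the option keys, broker address, and topic name are assumptions 
for the sake of the example, not a confirmed Spark API, and no real API is 
called.

```python
# Options as one would pass to spark.readStream.format("kafka") today
# (key names assumed for illustration only):
streaming_options = {
    "kafka.bootstrap.servers": "broker:9092",  # hypothetical broker address
    "subscribe": "clicks",                     # hypothetical topic name
    "startingOffsets": "earliest",
}

# A batch read could accept the same options, plus an explicit end bound
# so the query covers a defined range of offsets rather than being unbounded:
batch_options = {**streaming_options, "endingOffsets": "latest"}
```

Sharing the option vocabulary this way would let users move a query between 
streaming and batch mode by changing only the entry point and the end bound.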


