[ https://issues.apache.org/jira/browse/KAFKA-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722635#comment-16722635 ]

Richard Yu edited comment on KAFKA-7432 at 12/17/18 1:59 AM:
-------------------------------------------------------------

Hi, just want to point out something here.

What Kafka Streams supports today is continuous, record-at-a-time processing, 
which Spark Structured Streaming only recently added as its continuous 
processing mode. In contrast, what this ticket suggests is microbatch 
processing, in which records are buffered and handed to the application in 
chunks. In many data streaming circles, continuous processing is considered the 
better model for low-latency pipelines; microbatching is the older technique.

I don't know if we need to implement this particular option, especially since 
end-to-end latency for microbatching is generally higher than for continuous 
processing.

Spark is moving from microbatching toward continuous processing largely because 
of that latency improvement. So with what Kafka has right now, this ticket 
probably wouldn't be necessary.
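
That said, the concrete use case in the ticket (calling an external API that 
batches for efficiency) is already expressible with the existing Processor API. 
Below is a rough sketch, not real or proposed Kafka Streams API: the class 
name, the 5-second flush interval, and the {{externalBatchedApiCall}} function 
parameter are all illustrative. Note the buffer here is plain memory, so under 
at-least-once semantics a crash or rebalance can drop a partial batch; a 
production version would back it with a state store.

{code:scala}
import java.time.Duration

import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.Transformer
import org.apache.kafka.streams.processor.{ProcessorContext, PunctuationType, Punctuator}

import scala.collection.mutable.ArrayBuffer

// Buffers records and hands them to a batched external call, chunkSize at a time.
class BatchingTransformer(chunkSize: Int,
                          externalBatchedApiCall: Seq[(String, String)] => Seq[String])
    extends Transformer[String, String, KeyValue[String, String]] {

  private var context: ProcessorContext = _
  private val buffer = ArrayBuffer.empty[(String, String)]

  override def init(ctx: ProcessorContext): Unit = {
    context = ctx
    // Also flush on wall-clock time, so a quiet input topic can't hold a partial batch forever.
    context.schedule(Duration.ofSeconds(5), PunctuationType.WALL_CLOCK_TIME, new Punctuator {
      override def punctuate(timestamp: Long): Unit = flush()
    })
  }

  override def transform(key: String, value: String): KeyValue[String, String] = {
    buffer += ((key, value))
    if (buffer.size >= chunkSize) flush()
    null // results are emitted via context.forward() inside flush()
  }

  private def flush(): Unit = {
    if (buffer.nonEmpty) {
      // One external call per chunk instead of one per record.
      val results = externalBatchedApiCall(buffer.toList)
      buffer.zip(results).foreach { case ((key, _), result) => context.forward(key, result) }
      buffer.clear()
    }
  }

  override def close(): Unit = flush()
}

// Wiring it into the DSL (Scala 2.12+ turns the lambda into a TransformerSupplier):
//   builder.stream[String, String]("my-input-topic")
//     .transform(() => new BatchingTransformer(2000, myBatchedCall))
//     .to("my-output-topic")
{code}

So the chunking itself doesn't need new API surface; the open question is only 
whether a first-class {{batched(chunkSize)}} operator is worth adding on top.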


> API Method on Kafka Streams for processing chunks/batches of data
> -----------------------------------------------------------------
>
>                 Key: KAFKA-7432
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7432
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: sam
>            Priority: Major
>
> For many situations in Big Data it is preferable to work with a small buffer 
> of records at a go, rather than one record at a time.
> The natural example is calling some external API that supports batching for 
> efficiency.
> How can we do this in Kafka Streams? I cannot find anything in the API that 
> looks like what I want.
> So far I have:
> {{builder.stream[String, String]("my-input-topic") 
> .mapValues(externalApiCall).to("my-output-topic")}}
> What I want is:
> {{builder.stream[String, String]("my-input-topic") .batched(chunkSize = 
> 2000).map(externalBatchedApiCall).to("my-output-topic")}}
> In Scala and Akka Streams the function is called {{grouped}} or {{batch}}. In 
> Spark Structured Streaming we can do 
> {{mapPartitions(_.grouped(2000).map(externalBatchedApiCall))}}.
>  
>  
> https://stackoverflow.com/questions/52366623/how-to-process-data-in-chunks-batches-with-kafka-streams


