[jira] [Commented] (KAFKA-7432) API Method on Kafka Streams for processing chunks/batches of data

Richard Yu (JIRA) Sun, 16 Dec 2018 18:04:48 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722635#comment-16722635
 ]


Richard Yu commented on KAFKA-7432:
-----------------------------------

Hi, I just want to point out something here.

What Kafka Streams currently supports is continuous, record-at-a-time 
processing, a mode that Spark Structured Streaming added only recently. In 
contrast, this ticket proposes microbatch processing, in which records are 
collected and handed downstream in batches. In some data streaming circles, 
continuous processing is considered the better option, and microbatching is 
seen as the older technique. 

I don't know if we need to implement this particular option, especially since 
overall latency for microbatching is higher than for continuous processing: a 
record has to wait until its batch is complete before it is processed.
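
To make the trade-off concrete, here is a minimal sketch in plain Scala, with 
no Kafka Streams dependency. The names MicroBatchSketch, perRecord, batched and 
the call parameter are hypothetical and only illustrate the two models. In 
particular, batched is not an existing Kafka Streams method; it only mimics 
what the requested API might do, using the standard library's Iterator.grouped.

```scala
// Illustrative only: plain Scala collections, not the Kafka Streams API.
object MicroBatchSketch {

  // Continuous model: one external call per record, so each result is
  // available as soon as its record has been read.
  def perRecord(records: Iterator[String],
                call: Seq[String] => Seq[String]): Iterator[String] =
    records.flatMap(r => call(Seq(r)))

  // Microbatch model: records are buffered into chunks of `chunkSize`.
  // A record's result is only produced once its chunk is full or the
  // input ends, which is where the extra latency comes from.
  def batched(records: Iterator[String],
              chunkSize: Int,
              call: Seq[String] => Seq[String]): Iterator[String] =
    records.grouped(chunkSize).flatMap(chunk => call(chunk))
}
```

With five input records and chunkSize = 2, batched makes three external calls 
(two full chunks plus one partial chunk), whereas perRecord makes five. Fewer 
calls is the efficiency the reporter is after; the wait for a chunk to fill is 
the latency cost mentioned above.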

> API Method on Kafka Streams for processing chunks/batches of data
> -----------------------------------------------------------------
>
>                 Key: KAFKA-7432
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7432
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: sam
>            Priority: Major
>
> For many situations in Big Data it is preferable to work with a small buffer 
> of records at once, rather than one record at a time.
> The natural example is calling some external API that supports batching for 
> efficiency.
> How can we do this in Kafka Streams? I cannot find anything in the API that 
> looks like what I want.
> So far I have:
> {{builder.stream[String, String]("my-input-topic")
> .mapValues(externalApiCall).to("my-output-topic")}}
> What I want is:
> {{builder.stream[String, String]("my-input-topic")
> .batched(chunkSize = 2000).map(externalBatchedApiCall).to("my-output-topic")}}
> In Scala and Akka Streams the function is called {{grouped}} or {{batch}}. In 
> Spark Structured Streaming we can do 
> {{mapPartitions.map(_.grouped(2000).map(externalBatchedApiCall))}}.
>  
>  
> https://stackoverflow.com/questions/52366623/how-to-process-data-in-chunks-batches-with-kafka-streams



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
