Re: Kafka Streams app process records until certain date

2021-12-08 Thread Matthias J. Sax

Hard to achieve.

I guess a naive approach would be to use a `flatTransform()` to 
implement a filter that drops all records that are not in the desired 
time range.
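
For example, here is a rough (untested) sketch of such a filter, 
assuming String-typed keys/values and a hard-coded cutoff -- in practice 
the cutoff would come from the external system you mentioned:

  import java.time.Instant;
  import java.util.Collections;
  import org.apache.kafka.streams.KeyValue;
  import org.apache.kafka.streams.kstream.Transformer;
  import org.apache.kafka.streams.processor.ProcessorContext;

  // Emits a record unchanged if its timestamp is at or before the cutoff,
  // and drops it otherwise.
  public class TimeRangeFilter
      implements Transformer<String, String, Iterable<KeyValue<String, String>>> {

      // hypothetical cutoff; replace with the value from your external system
      private static final long CUTOFF_MS =
              Instant.parse("2020-03-01T00:00:00Z").toEpochMilli();

      private ProcessorContext context;

      @Override
      public void init(final ProcessorContext context) {
          this.context = context;
      }

      @Override
      public Iterable<KeyValue<String, String>> transform(final String key, final String value) {
          // context.timestamp() is the record timestamp provided by the timestamp extractor
          if (context.timestamp() <= CUTOFF_MS) {
              return Collections.singletonList(KeyValue.pair(key, value));
          }
          return Collections.emptyList(); // out of range -> emit nothing
      }

      @Override
      public void close() {}
  }

  // wiring it into the topology, e.g.:
  //   KStream<String, String> filtered = input.flatTransform(TimeRangeFilter::new);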


pause() and resume() are not available in Kafka Streams, but only on the 
KafkaConsumer (the Spring docs you cite are also about the consumer, not 
Kafka Streams).
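
Just for reference (an illustration of the consumer API, not something 
you can do from the Streams DSL), with a plain KafkaConsumer it would 
look roughly like this:

  import java.time.Duration;
  import java.util.Collections;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.TopicPartition;

  // Sketch only: `consumer` is an already-subscribed KafkaConsumer,
  // `cutoffMs` the end of the time range to process.
  static void pollAndPause(final KafkaConsumer<String, String> consumer, final long cutoffMs) {
      final ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
      for (final ConsumerRecord<String, String> record : records) {
          if (record.timestamp() > cutoffMs) {
              final TopicPartition tp = new TopicPartition(record.topic(), record.partition());
              consumer.pause(Collections.singleton(tp));  // stop fetching from this partition
              // later: consumer.resume(Collections.singleton(tp));
          } else {
              // process(record);
          }
      }
  }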



-Matthias

On 11/24/21 11:05 AM, Miguel González wrote:

Hello

For my use case I need to work with a chunk of records at a time, let's say
one month's worth. We have over two years of data, and we are testing whether
we can deploy this to production, so we need to test in small batches.

I have built a Kafka Streams app that processes two input topics and outputs
to one topic.

I would like to process the first two months of data. Is that possible?

- I have tried blocking the consumer thread using .map on the two
KStreams I have, comparing the timestamp on each message against a
timestamp I get from another system that tells me until what time I
should process. I also increased MAX_POLL_INTERVAL_MS_CONFIG, but I have
noticed that the messages that are in range do not get processed and sent
to the output topic.
- I have also seen that a Spring Cloud library apparently offers a
pause-resume feature:

https://docs.spring.io/spring-cloud-stream-binder-kafka/docs/3.1.5/reference/html/spring-cloud-stream-binder-kafka.html#_binding_visualization_and_control_in_kafka_streams_binder
- I have also seen that implementing a transformer or processor could
work, but in that case the state store could possibly end up holding
close to two years of data. That is something I would like to avoid.


Any help is appreciated.

regards
- Miguel


