Hi
I have been trying out the Apache Beam Go SDK for running aggregations on
logs from Kafka. I wrote a Kafka reader using ParDoFns and aggregate the
result using WindowInto and GroupByKey. However, the `GroupByKey` only emits
its output once my Kafka consumption DoFn finishes emitting messages. I am
using the direct runner for testing this out. It looks like a bug in the
runner/GroupByKey implementation, as it seems to assume that the input will
be exhausted even in the case of windowed input.

Pipeline representation:
Impulse -> Kafka consumption ParDo (consumes Kafka messages in a loop and
emits metrics with beam.EventTime) -> add timestamp ParDo ->
WindowInto (fixed windows) -> add window max timestamp as key ParDo ->
*GroupByKey*
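
Roughly, the code looks like the sketch below (simplified: segmentio/kafka-go
is just a stand-in for my Kafka client, the broker/topic names are
placeholders, and the timestamp assignment is folded into the consumption
DoFn here):

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
	kafka "github.com/segmentio/kafka-go"
)

func init() {
	// Register DoFns so they can be serialized by portable runners.
	beam.RegisterFunction(consumeKafka)
	beam.RegisterFunction(keyByWindowMax)
	beam.RegisterFunction(countPerWindow)
}

// consumeKafka loops over Kafka messages indefinitely and emits each value
// with an explicit event time via the beam.EventTime emitter parameter.
func consumeKafka(ctx context.Context, _ []byte, emit func(beam.EventTime, string)) error {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"}, // placeholder broker
		Topic:   "logs",                     // placeholder topic
		GroupID: "beam-agg",
	})
	defer r.Close()
	for {
		m, err := r.ReadMessage(ctx)
		if err != nil {
			return err
		}
		emit(mtime.FromTime(m.Time), string(m.Value))
	}
}

// keyByWindowMax keys each element by its window's max timestamp so that
// GroupByKey gathers everything that landed in the same fixed window.
func keyByWindowMax(w beam.Window, v string) (int64, string) {
	return int64(w.MaxTimestamp()), v
}

// countPerWindow counts the grouped values and prints one line per window.
func countPerWindow(key int64, values func(*string) bool) {
	var v string
	n := 0
	for values(&v) {
		n++
	}
	fmt.Printf("window ending %v: %d messages\n", mtime.Time(key), n)
}

func main() {
	beam.Init()
	p, s := beam.NewPipelineWithRoot()

	imp := beam.Impulse(s)
	msgs := beam.ParDo(s, consumeKafka, imp)
	windowed := beam.WindowInto(s, window.NewFixedWindows(time.Minute), msgs)
	keyed := beam.ParDo(s, keyByWindowMax, windowed)
	grouped := beam.GroupByKey(s, keyed)
	beam.ParDo0(s, countPerWindow, grouped)

	if err := beamx.Run(context.Background(), p); err != nil {
		panic(err)
	}
}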

The *GroupByKey* produces nothing until the Kafka consumption ParDo exits,
which doesn't seem like the desired behaviour. I haven't been able to
pinpoint the exact line of code where this gets stuck. Can someone help me
out with this?

Thanks for your help.

Best
Anand Singh Kunwar
