Ning Kang created BEAM-11906:
--------------------------------

             Summary: No trigger early repeatedly for session windows
                 Key: BEAM-11906
                 URL: https://issues.apache.org/jira/browse/BEAM-11906
             Project: Beam
          Issue Type: Improvement
          Components: runner-dataflow
    Affects Versions: 2.28.0, 2.23.0
            Reporter: Ning Kang


Originated from: 
https://stackoverflow.com/questions/66381608/apache-beam-does-not-trigger-early-repeatedly-for-session-windows-on-google-data

The following pipeline fires an early pane after each element when run 
locally with the DirectRunner, but produces no early firings when run on 
Google Cloud Dataflow. On Dataflow, it fires only after the session window 
has closed.

{code:python}
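import json
import logging

import apache_beam as beam
from apache_beam.transforms import trigger, window
from apache_beam.transforms.trigger import AccumulationMode

# `p` is a beam.Pipeline created elsewhere with streaming enabled.
(p
 | 'read'   >> beam.io.ReadFromPubSub(
       subscription='projects/xxx/subscriptions/xxx-sub')
 | 'json'   >> beam.Map(lambda x: json.loads(x.decode('utf-8')))
 | 'kv'     >> beam.Map(lambda x: (x['id'], x['amount']))
 | 'window' >> beam.WindowInto(
       window.Sessions(15 * 60),
       trigger=trigger.Repeatedly(trigger.AfterCount(1)),
       accumulation_mode=AccumulationMode.ACCUMULATING)
 | 'group'  >> beam.GroupByKey()
 | 'log'    >> beam.Map(lambda x: logging.info(x))
)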
{code}

Apache Beam versions tried: 2.23 and 2.28.
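
To make the expected behavior concrete, here is a rough sketch in plain Python (no Beam dependency) of the panes one would expect from session windows with Repeatedly(AfterCount(1)) in accumulating mode. It is a simplified model and not Beam's implementation: it assumes elements are processed one at a time in arrival order, ignores watermarks and lateness, and the function name `expected_panes` and the sample events are hypothetical.

```python
from collections import defaultdict


def expected_panes(events, gap):
    """Model the panes fired when every element triggers an accumulating pane.

    events: iterable of (key, timestamp_seconds, value) in arrival order.
    gap: session gap in seconds.
    Returns a list of (key, (window_start, window_end), values) tuples,
    one per incoming element.
    """
    sessions = defaultdict(list)  # key -> list of (start, end, values)
    panes = []
    for key, ts, value in events:
        # Each element starts out in its own proto-window [ts, ts + gap).
        new_start, new_end = ts, ts + gap
        start, end, values = new_start, new_end, [value]
        kept = []
        for s_start, s_end, s_values in sessions[key]:
            if s_start < new_end and new_start < s_end:
                # Overlapping sessions merge; accumulating mode keeps
                # everything seen so far in the merged session.
                start = min(start, s_start)
                end = max(end, s_end)
                values = s_values + values
            else:
                kept.append((s_start, s_end, s_values))
        kept.append((start, end, values))
        sessions[key] = kept
        # AfterCount(1), repeated: fire a pane for every element.
        panes.append((key, (start, end), list(values)))
    return panes


# Hypothetical input: three elements for key 'a', one for key 'b'.
sample_events = [('a', 0, 10), ('a', 60, 20), ('b', 30, 5), ('a', 2000, 7)]
sample_panes = expected_panes(sample_events, gap=15 * 60)
```

Under this model every element produces one pane, which matches what the DirectRunner is reported to do above. The reported Dataflow behavior would instead yield a single pane per session once the window closes.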



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
