Kenneth Knowles created BEAM-5690: ------------------------------------- Summary: Issue with GroupByKey in BeamSql using SparkRunner Key: BEAM-5690 URL: https://issues.apache.org/jira/browse/BEAM-5690 Project: Beam Issue Type: Task Components: runner-spark Reporter: Kenneth Knowles Assignee: Amit Sela
Reported on user@ {quote}We are trying to setup a pipeline with using BeamSql and the trigger used is default (AfterWatermark crosses the window). Below is the pipeline: KafkaSource (KafkaIO) ---> Windowing (FixedWindow 1min) ---> BeamSql ---> KafkaSink (KafkaIO) We are using Spark Runner for this. The BeamSql query is: {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} We are grouping by Col3 which is a string. It can hold values string[0-9]. The records are getting emitted out at 1 min to kafka sink, but the output record in kafka is not as expected. Below is the output observed: (WST and WET are indicators for window start time and window end time) {code} {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"} {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"} {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"} {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"} {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0000 +0000"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 +0000","WET":"2018-10-09 09-56-00 0} {code} {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)