[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643682#comment-16643682 ]
Kenneth Knowles commented on BEAM-5690: --------------------------------------- CC [~kedin] [~apilloud] [~xumingming] [~mingmxu] [~amaliujia] Since it is not reproduced in the Flink runner or Direct runner, the SQL implementation of GROUP BY is probably triggering some latent bug in the Spark runner's streaming mode. > Issue with GroupByKey in BeamSql using SparkRunner > -------------------------------------------------- > > Key: BEAM-5690 > URL: https://issues.apache.org/jira/browse/BEAM-5690 > Project: Beam > Issue Type: Task > Components: runner-spark > Reporter: Kenneth Knowles > Priority: Major > > Reported on user@ > {quote}We are trying to setup a pipeline with using BeamSql and the trigger > used is default (AfterWatermark crosses the window). > Below is the pipeline: > > KafkaSource (KafkaIO) > ---> Windowing (FixedWindow 1min) > ---> BeamSql > ---> KafkaSink (KafkaIO) > > We are using Spark Runner for this. > The BeamSql query is: > {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} > We are grouping by Col3 which is a string. It can hold values string[0-9]. > > The records are getting emitted out at 1 min to kafka sink, but the output > record in kafka is not as expected. > Below is the output observed: (WST and WET are indicators for window start > time and window end time) > {code} > {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0000 +0000"} > {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0000 +0000"} > {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0000 +0000"} > {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0000 +0000"} > {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0000 +0000"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0000 +0000"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0000 +0000"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0000 +0000"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0000 +0000"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0000 +0000"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 0000 > +0000","WET":"2018-10-09 09-56-00 0} > {code} > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)