Hi,
+1 to what Kenn asked: your pipeline is in streaming mode and GIB
preserves windowing, the elements are buffered until one of these
conditions are true: batchsize reached or end of window. I your case I
think it is the second one.
Best
Etienne
On 28/02/2020 19:15, Kenneth Knowles wrote:
What are the timestamps on the elements?
On Fri, Feb 28, 2020 at 8:36 AM Vasu Gupta <dev.vasugu...@gmail.com
<mailto:dev.vasugu...@gmail.com>> wrote:
Edit: Issue is on Direct Runner(Not Direction Runner - mistyped)
Issue Details:
Input data: 7 key-value Packets like: a-1, a-4, b-3, c-5, d-1,
e-4, e-5
Batch Size: 5
Expected output: a-1,4, b-3, c-5, d-1, e-4,5
Getting Packets with irregular size like a-1, b-5, e-4,5 OR a-1,4,
c-5 etc
But i always got correct number of packets with BATCH_SIZE = 1
On 2020/02/27 20:40:16, Kenneth Knowles <k...@apache.org
<mailto:k...@apache.org>> wrote:
> Can you share some more details? What is the expected output and
what
> output are you seeing?
>
> On Thu, Feb 27, 2020 at 9:39 AM Vasu Gupta
<dev.vasugu...@gmail.com <mailto:dev.vasugu...@gmail.com>> wrote:
>
> > Hey folks, I am using Apache beam Framework in Java with
Direction Runner
> > for local testing purposes. When using GroupIntoBatches with
batch size 1
> > it works perfectly fine i.e. the output of the transform is
consistent and
> > as expected. But when using with batch size > 1 the output
Pcollection has
> > less data than it should be.
> >
> > Pipeline flow:
> > 1. A Transform for reading from pubsub
> > 2. Transform for making a KV out of the data
> > 3. A Fixed Window transform of 1 second
> > 4. Applying GroupIntoBatches transform
> > 5. And last, Logging the resulting Iterables.
> >
> > Weird thing is that it batch_size > 1 works great when running on
> > DataflowRunner but not with DirectRunner. I think the issue
might be with
> > Timer Expiry since GroupIntoBatches uses BagState internally.
> >
> > Any help will be much appreciated.
> >
>