Hi,
Agreed with the others that this does not sound like a good fit...
But to explore ideas... One possible (complicated and error prone) way this
could be done, ...
Beam does not support cycles, but you could use an external unbounded
source as a way of sending impulse out and then back into th
Interesting! I agree with Luke that it seems not a great fit for Beam in
the most rigorous sense. There are many considerations:
1. We assume ParDo has side effects by default. So the model actual
*requires* eager evaluation, not lazy, in order to make all the side
effects happen. But for your cas
You shouldn't need to call it before running the pipeline as you are doing
(you can if you want but its not necessary).
Have you created a service META-INF entry for the JvmInitializer you have
created or are using @AutoService?
This is the relevant bit of the documentation[1]. Here is some good d
Hello, all. I still have not been given the tasking to convert my work
project to use Beam, but it is still something that I am looking to do in
the fairly near future. Our data workflow consists of ingest and
transformation, and I was hoping that there are ETL frameworks that work
well with Beam
Looking at the naive solution
PCollectionView agg = input
.apply(Windows.sliding(10mins, 1sec
hops).trigger(Repeatedly.forever(AfterPane.elementCountAtLeast(1
.apply(Mean.globally())
.apply(View.asSingleton());
PCollection output = input
.apply(ParDo.of(new Joiner().withSi
" input elements can pass through the Joiner DoFn before the sideInput
corresponding to that element is present"
I don't think this is correct. Runners will evaluate a DoFn with side
inputs on elements in a given window only after all side inputs are ready
(have triggered at least once) in this wi
With Stateful DoFn, each instance of DoFn will have elements which belong
to the same window and have the same key. So, the parallelism is limited by
[no. of keys * no. of Windows]
In batch mode, as all the elements belong to the same window, i.e. Global
Window, the parallelism will be limited by t
This doesn't seem like a good fit for Apache Beam but have you tried:
* using a StatefulDoFn that performs all the joining and signals the
service powering the sources to stop sending data once your criteria is met
(most services powering these sources won't have a way to be controlled
this way)?
*
Hello,
We have a use case and it's not clear how it can be solved/implemented with
Beam. I count on community help with this, maybe I miss something that lays on
the surface.
Let’s say, there are two different bounded sources and one join transform (say
GBK) downstream. This Join transform is
Hi Rahul,
Thanks for the response.
I did consider State, but actually I was tentative because of a different
requirement that I didn't specify - the same pipeline should work for batch
and stream modes. I'm not sure how Stateful DoFn's behave in the batch
world: can you get Beam to pass the eleme
Hi Sam,
(Assuming all the tuples have the same key) One solution could be to use
ParDo with State(to calculate mean) => For each element as they occur,
calculate the Mean(store the sum and count as the state) and emit the tuple
with the new average value.
But it will limit the parallelism count.
My team and I have been puzzling for a while how to solve a specific
problem.
Say you have an input stream of tuples:
And you want to output a stream containing:
Where the average is an aggregation over a 10 minute sliding window of the
"value" field.
There are a couple of extra require
12 matches
Mail list logo