Re: Is it safe to cache the value of a singleton view (with a global window) in a DoFn?

2019-05-07 Thread Lukasz Cwik
Keep your code simple and rely on the runner caching the value locally so
it should be very cheap to access. If you have a performance issue due to a
runner lacking caching, it would be good to hear about it so we could file
a JIRA about it.

On Mon, May 6, 2019 at 4:24 PM Kenneth Knowles  wrote:

> A singleton view in the global window and no triggering does have just a
> single immutable value. (It really ought to have an updated value in the
> presence of triggers, but I believe instead you will receive a crash. I
> haven't tested.)
>
> In general, a side input yields one value per window. Dataflow in batch
> will already do what you describe, but works with all windows. Dataflow in
> streaming has some caching but if you see a problem that is interesting
> information.
>
> Kenn
>
> On Sat, May 4, 2019 at 9:19 AM Steve Niemitz  wrote:
>
>> I have a singleton view in a global window that is read from a DoFn.  I'm
>> curious if its "correct" to cache that value from the view, or if I need to
>> read it every time.
>>
>> As a (simplified) example, if I were to generate the view as such:
>>
>> input.getPipeline
>>   .apply(Create.of(Collections.singleton[Void](null)))
>>   .apply(MapElements.via(new SimpleFunction[Void, JLong]() {
>> override def apply(input: Void): JLong = {
>>   Instant.now().getMillis
>> }
>>   })).apply(View.asSingleton[JLong]())
>>
>> and then read it from a DoFn (using context.sideInput), is it guaranteed
>> that:
>> - every instance of the DoFn will read the same value?
>> - The value will never change?
>>
>> If so it seems like it'd be safe to cache the value inside the DoFn.  It
>> seems like this would be the case, but I've also seen cases in dataflow
>> where the UI indicates that the MapElements step above produced more than
>> one element, so I'm curious what people have to say.
>>
>> Thanks!
>>
>


Re: Is it safe to cache the value of a singleton view (with a global window) in a DoFn?

2019-05-06 Thread Kenneth Knowles
A singleton view in the global window and no triggering does have just a
single immutable value. (It really ought to have an updated value in the
presence of triggers, but I believe instead you will receive a crash. I
haven't tested.)

In general, a side input yields one value per window. Dataflow in batch
will already do what you describe, but works with all windows. Dataflow in
streaming has some caching but if you see a problem that is interesting
information.

Kenn

On Sat, May 4, 2019 at 9:19 AM Steve Niemitz  wrote:

> I have a singleton view in a global window that is read from a DoFn.  I'm
> curious if its "correct" to cache that value from the view, or if I need to
> read it every time.
>
> As a (simplified) example, if I were to generate the view as such:
>
> input.getPipeline
>   .apply(Create.of(Collections.singleton[Void](null)))
>   .apply(MapElements.via(new SimpleFunction[Void, JLong]() {
> override def apply(input: Void): JLong = {
>   Instant.now().getMillis
> }
>   })).apply(View.asSingleton[JLong]())
>
> and then read it from a DoFn (using context.sideInput), is it guaranteed
> that:
> - every instance of the DoFn will read the same value?
> - The value will never change?
>
> If so it seems like it'd be safe to cache the value inside the DoFn.  It
> seems like this would be the case, but I've also seen cases in dataflow
> where the UI indicates that the MapElements step above produced more than
> one element, so I'm curious what people have to say.
>
> Thanks!
>


Is it safe to cache the value of a singleton view (with a global window) in a DoFn?

2019-05-04 Thread Steve Niemitz
I have a singleton view in a global window that is read from a DoFn.  I'm
curious if its "correct" to cache that value from the view, or if I need to
read it every time.

As a (simplified) example, if I were to generate the view as such:

input.getPipeline
  .apply(Create.of(Collections.singleton[Void](null)))
  .apply(MapElements.via(new SimpleFunction[Void, JLong]() {
override def apply(input: Void): JLong = {
  Instant.now().getMillis
}
  })).apply(View.asSingleton[JLong]())

and then read it from a DoFn (using context.sideInput), is it guaranteed
that:
- every instance of the DoFn will read the same value?
- The value will never change?

If so it seems like it'd be safe to cache the value inside the DoFn.  It
seems like this would be the case, but I've also seen cases in dataflow
where the UI indicates that the MapElements step above produced more than
one element, so I'm curious what people have to say.

Thanks!