On Mon, Nov 26, 2018 at 1:32 PM Ruoyun Huang <[email protected]> wrote:

> Thanks Kenneth. Didn't look into subfolders, let me read a bit more.  And
> will look into the tests Luke pointed out as well.
>
> To make sure I understand your comment that "Side inputs _are_ different in
> streaming as *you* have to ...", are you saying either: 1) a user needs
> to use/treat the SideInput API differently when handling the streaming case, OR
> 2) Beam developers had to implement the underlying pieces differently?
>

2) Each runner has to do the execution differently,

so

1) users might need to know about this, since in streaming their main input
will wait for side input data or for the window to expire; in batch the
window is always ready, so there is no waiting.
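
To make the waiting concrete, here is a toy model in plain Java. It is not Beam code and not how any runner is actually implemented; the class and method names (SideInputPushbackSketch, onMainInput, onSideInputReady, onWindowExpired) and the "<empty>" placeholder are made up for illustration. Main-input elements are held back per window until that window's side input is ready or the window expires. If the side input is already there when the main input arrives, as in batch, nothing is held.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only, not Beam code: holds main-input elements until a window's side input is usable. */
final class SideInputPushbackSketch {
    /** Placeholder for a window that expired without side-input data (an assumption of this sketch). */
    static final String EMPTY = "<empty>";

    private final Map<String, String> sideInputByWindow = new HashMap<>();
    private final Map<String, List<String>> pushedBack = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    /** A main-input element arrives: process it now if the side input is ready, otherwise hold it. */
    void onMainInput(String window, String element) {
        String side = sideInputByWindow.get(window);
        if (side == null) {
            pushedBack.computeIfAbsent(window, w -> new ArrayList<>()).add(element);
        } else {
            output.add(element + "|" + side);
        }
    }

    /** The side input for a window becomes ready: release everything held for that window. */
    void onSideInputReady(String window, String value) {
        sideInputByWindow.put(window, value);
        release(window, value);
    }

    /** The window expires with no side-input data: release held elements against an empty side input. */
    void onWindowExpired(String window) {
        if (!sideInputByWindow.containsKey(window)) {
            sideInputByWindow.put(window, EMPTY);
            release(window, EMPTY);
        }
    }

    private void release(String window, String side) {
        List<String> held = pushedBack.remove(window);
        if (held == null) {
            return;
        }
        for (String element : held) {
            output.add(element + "|" + side);
        }
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        SideInputPushbackSketch sketch = new SideInputPushbackSketch();
        sketch.onMainInput("w1", "a");          // held: side input for w1 not ready yet
        sketch.onSideInputReady("w1", "S");     // releases "a"
        sketch.onMainInput("w1", "b");          // processed immediately
        sketch.onMainInput("w2", "c");          // held until w2 expires
        sketch.onWindowExpired("w2");
        System.out.println(sketch.output());
    }
}
```

Calling onSideInputReady before any onMainInput for a window gives the batch-like behavior: every element goes straight to the output.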

Kenn


>
>
>
> On Wed, Nov 21, 2018 at 7:50 PM Kenneth Knowles <[email protected]> wrote:
>
>> I like the idea of a/many very simple example(s) of side inputs. There
>> are existing examples that use side inputs:
>>
>> $ cd examples/java/src/main/java/org/apache/beam/examples
>> $ grep -r withSideInput .
>> ./complete/TfIdf.java:                  .withSideInputs(totalDocuments));
>> ./complete/game/GameStats.java:
>> .withSideInputs(globalMeanScore));
>> ./complete/game/GameStats.java:
>> .withSideInputs(spammersView))
>> ./cookbook/FilterExamples.java:
>> .withSideInputs(globalMeanTemp));
>>
>> From just this grep it looks like all but one are broadcast scalar
>> values. I have not looked at them to see if they are too complex or too
>> trivial.
>>
>> Side inputs _are_ different in streaming as you have to pause the main
>> input or push back elements until a side input is ready for a window.
>>
>> I would suggest multiple simple examples, each showing one way of using
>> side inputs. One particular thing to demonstrate might be a triggered
>> Combine.perKey(), with an explanation that it requires a View.asMultimap()
>> because triggers result in duplicate entries for a key.
>>
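
The point about duplicate entries can be sketched in plain Java. This is an analogy, not Beam API: the Pane class and the asMap/asMultimap helpers below are hypothetical stand-ins, and the assumption is that a single-valued map view expects exactly one value per key. A triggered per-key combine can emit several results for the same key, so only the multimap shape can hold them all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java analogy, not Beam code: repeated trigger firings produce several values per key. */
final class TriggeredPanesSketch {

    /** One (key, partial sum) result of a hypothetical triggered per-key combine. */
    static final class Pane {
        final String key;
        final int sum;

        Pane(String key, int sum) {
            this.key = key;
            this.sum = sum;
        }
    }

    /** Map-shaped view: one value per key, so a second entry for a key is rejected. */
    static Map<String, Integer> asMap(List<Pane> panes) {
        Map<String, Integer> view = new LinkedHashMap<>();
        for (Pane pane : panes) {
            if (view.putIfAbsent(pane.key, pane.sum) != null) {
                throw new IllegalStateException("Duplicate key: " + pane.key);
            }
        }
        return view;
    }

    /** Multimap-shaped view: every entry for a key is kept, in arrival order. */
    static Map<String, List<Integer>> asMultimap(List<Pane> panes) {
        Map<String, List<Integer>> view = new LinkedHashMap<>();
        for (Pane pane : panes) {
            view.computeIfAbsent(pane.key, k -> new ArrayList<>()).add(pane.sum);
        }
        return view;
    }

    public static void main(String[] args) {
        // The trigger fired twice for "cat", so "cat" appears twice.
        List<Pane> panes = Arrays.asList(new Pane("cat", 2), new Pane("dog", 1), new Pane("cat", 5));
        System.out.println(asMultimap(panes));
        try {
            asMap(panes);
        } catch (IllegalStateException e) {
            System.out.println("map view rejected: " + e.getMessage());
        }
    }
}
```

A consumer of the multimap-shaped view then has to decide what to do with several values per key, for example taking the latest one or summing them, depending on how the panes were accumulated.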
>
>> Kenn
>>
>> On Wed, Nov 21, 2018 at 4:40 PM Ruoyun Huang <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am working on side input support in the Java reference runner (ULR),
>>> BEAM-2928 [1].
>>> Although there are an inline code snippet [2] and unit tests [3], I did
>>> not find a good place showing a working example of SideInput (please
>>> correct me if I am wrong).
>>> I am thinking of creating one more WordCount example under the examples
>>> folder [4].
>>> In particular, this example would show variants of a) side inputs as a
>>> scalar AND as a multimap, b) side inputs from pipeline data or created
>>> within code, and c) [OPTIONAL?] streaming versus batch, if there are
>>> differences (this I am not sure of yet).
>>>
>>> In the meantime, BEAM-2928 can also easily rely on such an example to
>>> validate behaviors between portable/non-portable runners.
>>>
>>> I would like to double-check whether this is a reasonable idea.
>>>
>>> Even though SideInput is just one of our many features, my
>>> justification is that it is commonly used, so having a one-stop example
>>> makes it easier for new users.  That being said, is there a reason not to
>>> have yet another WordCount example? (Another idea is to extend existing
>>> WordCount.java, but that breaks its simplicity.)
>>>
>>> If it is a good change to have, any suggestions on what else to include?
>>>
>>> Thanks!
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-2928
>>> [2]
>>> sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L160
>>> [3]
>>> sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/MultimapSideInputTest.java
>>> [4] examples/java/src/main/java/org/apache/beam/examples
>>>
>>> --
>>> ================
>>> Ruoyun  Huang
>>>
>>>
>
> --
> ================
> Ruoyun  Huang
>
>
