Examples are good for showing users how to use certain concepts but we
should stick with ValidatesRunner tests for ensuring that runners / SDKs
implement concepts correctly. We have several ValidatesRunner side input
tests in ParDoTest.java[1], ViewTest.java[2], and sideinputs_test.py[3]
that could be used to validate the ULR.

1:
https://github.com/apache/beam/blob/658630d8f75281f50dc434f2e062e2819ebeff84/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L710
2:
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java
3:
https://github.com/apache/beam/blob/70259e13e83ce3e2bb8d9ee9731e09f9ff456b05/sdks/python/apache_beam/transforms/sideinputs_test.py

On Wed, Nov 21, 2018 at 7:50 PM Kenneth Knowles <[email protected]> wrote:

> I like the idea of a/many very simple example(s) of side inputs. There are
> existing examples that use side inputs:
>
> $ cd examples/java/src/main/java/org/apache/beam/examples
> $ grep -r withSideInput .
> ./complete/TfIdf.java:                  .withSideInputs(totalDocuments));
> ./complete/game/GameStats.java:
> .withSideInputs(globalMeanScore));
> ./complete/game/GameStats.java:
> .withSideInputs(spammersView))
> ./cookbook/FilterExamples.java:
> .withSideInputs(globalMeanTemp));
>
> From just this grep It looks like all but one are broadcast scalar values.
> I have not looked at them to see if they are too complex or too trivial.
>
> Side inputs _are_ different in streaming as you have to pause the main
> input or push back elements until a side input is ready for a window.
>
> I would suggest multiple simple examples each showing one way of using
> side inputs. A particular thing to demonstrated might be a triggered
> Combine.perKey() and tutorial that it requires a View.asMultimap() because
> triggers result in duplicate entries for a key.
>
> Kenn
>
> On Wed, Nov 21, 2018 at 4:40 PM Ruoyun Huang <[email protected]> wrote:
>
>> Hi,
>>
>> I am working on sideInput support in java reference runner (ULR)
>> JIRA-2928 [1].
>> Although there are inline code snippet example [2] and unit tests [3], I
>> did not find
>> a good place showing a working example of SideInput(please correct me if
>> I am wrong).
>> I am thinking of creating one more WordCount example under example folder
>> [2].
>> In particular, in this example we show variants of a) sideinputs as a
>> scalar AND multimap, b) from pipeline data or created within code and c)
>> [OPTIONAL?] Streaming versus batch, if there are differences (this I am not
>> sure yet).
>>
>> In the meanwhile, JIRA-2928 can also easily rely on such an example to
>> validate behaviors between portable/non-portable runners.
>>
>> Would like to double check if is this a reasonable idea.
>>
>> Even though SideInput is just one of our many many features, my
>> justification is that, it is commonly used, thus having a one-stop example
>> make it easier for new users.  That being said, is there a reason not to
>> have yet another WordCount example? (Another idea is to extend existing
>> WordCount.java, but that breaks its simplicity.)
>>
>> If it is a good change to have, any suggestion on what else to include?
>>
>> Thanks!
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-2928
>> [2]
>> sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L160
>> [3]
>> sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/MultimapSideInputTest.java
>> [4] examples/java/src/main/java/org/apache/beam/examples
>>
>> --
>> ================
>> Ruoyun  Huang
>>
>>

Reply via email to