Thanks Kenneth. I didn't look into the subfolders; let me read a bit more. I will also look into the tests Luke pointed out.
To make sure I understand your comment, "Side inputs _are_ different in streaming as *you* have to ...", are you saying either 1) a user needs to use the SideInput API differently in the streaming case, OR 2) Beam developers had to implement it differently under the hood?

On Wed, Nov 21, 2018 at 7:50 PM Kenneth Knowles wrote:

> I like the idea of a/many very simple example(s) of side inputs. There are
> existing examples that use side inputs:
>
> $ cd examples/java/src/main/java/org/apache/beam/examples
> $ grep -r withSideInput .
> ./complete/TfIdf.java: .withSideInputs(totalDocuments));
> ./complete/game/GameStats.java: .withSideInputs(globalMeanScore));
> ./complete/game/GameStats.java: .withSideInputs(spammersView))
> ./cookbook/FilterExamples.java: .withSideInputs(globalMeanTemp));
>
> From just this grep it looks like all but one are broadcast scalar values.
> I have not looked at them to see if they are too complex or too trivial.
>
> Side inputs _are_ different in streaming as you have to pause the main
> input or push back elements until a side input is ready for a window.
>
> I would suggest multiple simple examples, each showing one way of using
> side inputs. A particular thing to demonstrate might be a triggered
> Combine.perKey() and a tutorial explaining that it requires a
> View.asMultimap() because triggers result in duplicate entries for a key.
>
> Kenn
>
> On Wed, Nov 21, 2018 at 4:40 PM Ruoyun Huang wrote:
>
>> Hi,
>>
>> I am working on sideInput support in the Java reference runner (ULR),
>> JIRA-2928 [1].
>> Although there are an inline code snippet example [2] and unit tests [3],
>> I did not find a good place showing a working example of SideInput
>> (please correct me if I am wrong).
>> I am thinking of creating one more WordCount example under the examples
>> folder [4].
>> In particular, in this example we show variants of a) side inputs as a
>> scalar AND as a multimap, b) from pipeline data or created within code,
>> and c) [OPTIONAL?] streaming versus batch, if there are differences (this
>> I am not sure of yet).
>>
>> In the meantime, JIRA-2928 can also easily rely on such an example to
>> validate behaviors between portable/non-portable runners.
>>
>> I would like to double-check whether this is a reasonable idea.
>>
>> Even though SideInput is just one of our many, many features, my
>> justification is that it is commonly used, so having a one-stop example
>> makes it easier for new users. That being said, is there a reason not to
>> have yet another WordCount example? (Another idea is to extend the
>> existing WordCount.java, but that breaks its simplicity.)
>>
>> If it is a good change to have, any suggestions on what else to include?
>>
>> Thanks!
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-2928
>> [2] sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L160
>> [3] sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/MultimapSideInputTest.java
>> [4] examples/java/src/main/java/org/apache/beam/examples
>>
>> --
>> ================
>> Ruoyun Huang

--
================
Ruoyun Huang
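
P.S. To check my reading of Kenn's point that triggers result in duplicate entries for a key, here is a minimal plain-Java sketch. It is an analogy only and uses no Beam APIs: the class name, the method names, and the sample data are all made up for illustration. It assumes a map-style view allows one value per key, while a multimap-style view collects every value emitted for a key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TriggeredPanesSketch {

  // Hypothetical stand-in for the output of a triggered per-key combine:
  // each trigger firing emits a fresh aggregate, so a key can appear twice.
  static List<Map.Entry<String, Integer>> triggeredCounts() {
    List<Map.Entry<String, Integer>> kvs = new ArrayList<>();
    kvs.add(new SimpleEntry<>("beam", 1));   // early firing
    kvs.add(new SimpleEntry<>("beam", 3));   // later firing, same key
    kvs.add(new SimpleEntry<>("runner", 2));
    return kvs;
  }

  // Map-like view: exactly one value per key.
  // Collectors.toMap throws IllegalStateException on a duplicate key.
  static Map<String, Integer> asMapLike(List<Map.Entry<String, Integer>> kvs) {
    return kvs.stream()
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  // Multimap-like view: every value seen for a key is kept, in arrival order.
  static Map<String, List<Integer>> asMultimapLike(
      List<Map.Entry<String, Integer>> kvs) {
    return kvs.stream()
        .collect(Collectors.groupingBy(
            Map.Entry::getKey,
            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
  }

  public static void main(String[] args) {
    // Both firings for "beam" survive in the multimap-like view.
    System.out.println(asMultimapLike(triggeredCounts()).get("beam")); // prints [1, 3]

    try {
      asMapLike(triggeredCounts());
    } catch (IllegalStateException e) {
      // The repeated key "beam" cannot be represented as a single-valued map.
      System.out.println("map-like view rejected duplicate key");
    }
  }
}
```

If this matches the intended behavior, the proposed example could show the same contrast with a real triggered Combine.perKey() feeding View.asMultimap().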
