I like the idea of a/many very simple example(s) of side inputs. There are existing examples that use side inputs:

$ cd examples/java/src/main/java/org/apache/beam/examples
$ grep -r withSideInput .
./complete/TfIdf.java: .withSideInputs(totalDocuments));
./complete/game/GameStats.java: .withSideInputs(globalMeanScore));
./complete/game/GameStats.java: .withSideInputs(spammersView))
./cookbook/FilterExamples.java: .withSideInputs(globalMeanTemp));

From just this grep, it looks like all but one are broadcast scalar values.
I have not looked at them to see whether they are too complex or too trivial.

Side inputs _are_ different in streaming, as you have to pause the main
input or push back elements until a side input is ready for a window.

I would suggest multiple simple examples, each showing one way of using
side inputs. One particular thing to demonstrate might be a triggered
Combine.perKey(), with an explanation that it requires a View.asMultimap()
because triggers result in duplicate entries for a key.

Kenn

On Wed, Nov 21, 2018 at 4:40 PM Ruoyun Huang <[email protected]> wrote:

> Hi,
>
> I am working on sideInput support in java reference runner (ULR)
> JIRA-2928 [1]. Although there are inline code snippet example [2] and
> unit tests [3], I did not find a good place showing a working example of
> SideInput(please correct me if I am wrong). I am thinking of creating one
> more WordCount example under example folder [2]. In particular, in this
> example we show variants of a) sideinputs as a scalar AND multimap,
> b) from pipeline data or created within code and c) [OPTIONAL?] Streaming
> versus batch, if there are differences (this I am not sure yet).
>
> In the meanwhile, JIRA-2928 can also easily rely on such an example to
> validate behaviors between portable/non-portable runners.
>
> Would like to double check if is this a reasonable idea.
>
> Even though SideInput is just one of our many many features, my
> justification is that, it is commonly used, thus having a one-stop
> example make it easier for new users. That being said, is there a reason
> not to have yet another WordCount example? (Another idea is to extend
> existing WordCount.java, but that breaks its simplicity.)
>
> If it is a good change to have, any suggestion on what else to include?
>
> Thanks!
>
> [1] https://issues.apache.org/jira/browse/BEAM-2928
> [2] sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L160
> [3] sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/MultimapSideInputTest.java
> [4] examples/java/src/main/java/org/apache/beam/examples
>
> --
> ================
> Ruoyun Huang
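
To make the point above about a triggered Combine.perKey() and View.asMultimap() concrete, here is a minimal sketch in plain Java. It deliberately does not use the Beam API: the class TriggeredPanesSketch and its methods samplePanes, asMap and asMultimap are hypothetical names invented for illustration, and the sample data is made up. It only models the shape of the problem, assuming a trigger fires twice for key "a", so that the same key appears twice in the data feeding the side input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model only; none of these names come from Beam.
class TriggeredPanesSketch {

    // Hypothetical output of a triggered per-key combine: key "a" fires
    // twice (an early pane, then a later one), key "b" fires once.
    static List<Map.Entry<String, Integer>> samplePanes() {
        return List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
    }

    // Map-shaped view: exactly one value per key, so a repeated key
    // has nowhere to go and is rejected.
    static Map<String, Integer> asMap(List<Map.Entry<String, Integer>> panes) {
        Map<String, Integer> view = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pane : panes) {
            if (view.putIfAbsent(pane.getKey(), pane.getValue()) != null) {
                throw new IllegalStateException("duplicate key: " + pane.getKey());
            }
        }
        return view;
    }

    // Multimap-shaped view: every pane for a key is kept, in arrival order.
    static Map<String, List<Integer>> asMultimap(List<Map.Entry<String, Integer>> panes) {
        Map<String, List<Integer>> view = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pane : panes) {
            view.computeIfAbsent(pane.getKey(), k -> new ArrayList<>()).add(pane.getValue());
        }
        return view;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> panes = samplePanes();
        System.out.println(asMultimap(panes)); // prints {a=[1, 3], b=[2]}
        try {
            asMap(panes);
        } catch (IllegalStateException e) {
            // prints "map view rejected: duplicate key: a"
            System.out.println("map view rejected: " + e.getMessage());
        }
    }
}
```

A real example would build the view with View.asMultimap() and read it inside a ParDo registered via withSideInputs; the consumer then has to decide what to do with several values per key (for instance, take the latest).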
