I can see a few usability issues here. Totally agree w/ Luke, just noting:

 - The naming is slightly misleading because SortValues is actually
   already GBK+SortValues.
 - It also makes things look less supported when they are in the
   extensions/ folder. I'd say we should have a better place to put such
   a library if it is the official public implementation. The word
   "extensions" doesn't seem particularly accurate or meaningful to me.

Q: Does SortValues have a defined & documented URN yet?

Kenn

On Wed, May 30, 2018 at 7:52 AM Lukasz Cwik <[email protected]> wrote:

> Each runner can choose to override the SortValues PTransform with its
> own internal offering. For example, Spark overrides global combine[1]
> during pipeline translation. If Spark detected the SortValues PTransform
> during translation, it could override the offering with something that
> used repartitionAndSortWithinPartitions.
>
> GroupByKeyAndSortValuesOnly inside Dataflow exists to support a specific
> use case. Users should rely on SortValues as it is the public
> implementation for sorting.
>
> 1:
> https://github.com/apache/beam/blob/85dcab56268fbac923ffd5885489ee154f097fc5/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L200
>
> As a side note, it's uncommon that you need to sort all values; usually
> the top 100 suffices, and that can be implemented much more efficiently
> with a combiner than with sorting.
>
> On Wed, May 30, 2018 at 3:38 AM <[email protected]> wrote:
>
>> Hi,
>> I have a question. I am trying to do a translation in dsl-euphoria of
>> “GroupByKey with sorted values within key” to Beam. I am aware of the
>> Java SDK extension SortValues, but it doesn’t have sufficient
>> abstraction for runners.
>>
>> I noticed that in DataflowRunner there is a translation of batch
>> GroupByKey to GroupByKeyAndSortValuesOnly, but has it been considered
>> to have this in Beam core, so that, for example, SparkRunner could
>> translate “GroupByKey with sorted values within key” with its internals
>> such as repartitionAndSortWithinPartitions?
>>
>> Thank you.
>> Marek Simunek
>>
>
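To illustrate the side note in the quoted message about "top 100" being cheaper than a full sort, here is a minimal sketch in plain Java. It is not Beam API: the class TopNSketch and the method topN are hypothetical names made up for this example. It only shows the idea a top-N combiner would presumably rely on, which is keeping a heap bounded to n entries instead of sorting every value for a key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of Beam or any runner.
public class TopNSketch {

    /** Returns the n largest values, in descending order. */
    public static List<Integer> topN(Iterable<Integer> values, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap holding at most n values: the smallest retained value is at the head.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        for (Integer v : values) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                // v beats the smallest retained value, so replace it.
                heap.poll();
                heap.add(v);
            }
        }
        // Only the n retained values are sorted, never the full input.
        result.addAll(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topN(List.of(5, 1, 9, 3, 7, 8), 3)); // prints [9, 8, 7]
    }
}
```

Memory stays at n entries per key and each input value costs at most O(log n), whereas sorting all values for a key needs every value to be held (or spilled) and costs O(m log m) for m values.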
