SortValues does not have a defined and documented URN yet. One will be
defined once a runner provides such an override. To my knowledge, no runner
publicly provides one.

On Wed, May 30, 2018 at 8:08 AM Kenneth Knowles <k...@google.com> wrote:

> I can see a few usability issues here. Totally agree w/ Luke, just noting:
>
>  - The naming is slightly misleading because SortValues is actually
> already GBK+SortValues.
>  - It also makes things look less supported when they are in the
> extensions/ folder. I'd say we should have a better place to put such a
> library if it is the official public implementation. The word "extensions"
> doesn't seem particularly accurate or meaningful to me.
>
> Q: Does SortValues have a defined & documented URN yet?
>
> Kenn
>
> On Wed, May 30, 2018 at 7:52 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> Each runner can choose to override the SortValues PTransform with their
>> own internal offering. For example, Spark overrides global combine[1] during
>> pipeline translation. If Spark detected the SortValues PTransform during
>> translation, it could override the offering with something that used
>> repartitionAndSortWithinPartitions.
>>
>> GroupByKeyAndSortValuesOnly inside Dataflow exists to support a specific
>> use case. Users should rely on SortValues as it is the public
>> implementation for sorting.
>>
>> 1:
>> https://github.com/apache/beam/blob/85dcab56268fbac923ffd5885489ee154f097fc5/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L200
>>
>> As a side note, it's uncommon to need to sort all values. Usually the top
>> 100 suffices, which can be implemented much more efficiently with a
>> combiner than with a full sort.
>>
>> On Wed, May 30, 2018 at 3:38 AM <marek-simu...@seznam.cz> wrote:
>>
>>> Hi,
>>>  I have a question. I am working on the dsl-euphoria translation of
>>> “GroupByKey with sorted values within key” to Beam. I am aware of the Java
>>> SDK extension SortValues, but it doesn’t provide sufficient abstraction for
>>> runners.
>>>
>>> I noticed that DataflowRunner translates batch GroupByKey to
>>> GroupByKeyAndSortValuesOnly. Is it being considered to have this in Beam
>>> core, so that, for example, SparkRunner could translate “GroupByKey with
>>> sorted values within key” using its internals, such as
>>> repartitionAndSortWithinPartitions?
>>>
>>> Thank you.
>>> Marek Simunek
>>>
>>
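
For readers following the side note about "top 100" versus a full sort, here
is a minimal sketch in plain Java of why a combiner-style approach is cheaper.
It is not Beam code: the class TopNSketch and its methods addInput, merge and
extractOutput are hypothetical names chosen to mirror the general shape of a
combiner (add an input, merge partial results, extract the output). The
assumption is that each worker keeps a bounded min-heap of at most n values,
so memory stays O(n) per key instead of buffering every value for sorting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a combiner: keeps only the n largest values seen. */
public class TopNSketch {

  /** Bounded accumulator backed by a min-heap of at most n elements. */
  static final class Accumulator {
    private final int n;
    private final PriorityQueue<Integer> heap = new PriorityQueue<>();

    Accumulator(int n) {
      this.n = n;
    }

    /** Adds one value, evicting the smallest when the heap exceeds n. */
    void addInput(int value) {
      heap.add(value);
      if (heap.size() > n) {
        heap.poll();
      }
    }

    /** Folds another partial result (e.g. from another worker) into this one. */
    void merge(Accumulator other) {
      for (int value : other.heap) {
        addInput(value);
      }
    }

    /** Returns the retained values, largest first. */
    List<Integer> extractOutput() {
      List<Integer> out = new ArrayList<>(heap);
      out.sort(Collections.reverseOrder());
      return out;
    }
  }

  public static void main(String[] args) {
    // Two partial accumulators, as if built on separate workers.
    Accumulator left = new Accumulator(3);
    for (int v : new int[] {5, 1, 9, 3}) {
      left.addInput(v);
    }
    Accumulator right = new Accumulator(3);
    for (int v : new int[] {8, 2, 7}) {
      right.addInput(v);
    }
    left.merge(right);
    System.out.println(left.extractOutput()); // prints [9, 8, 7]
  }
}
```

Because partial accumulators can be merged in any order, this kind of
computation can be pre-aggregated before the shuffle, which is not possible
when every value has to be kept for a complete sort.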
