Hi Community,
I am trying to support ORDER BY in BeamSQL (currently in the global window
only, see BEAM-5064). To do so, I need to sort a PCollection.
The size of the dataset that ORDER BY works on is unknown. It might be up to
a TB-sized dataset if BeamSQL runs on some benchmarks. But in the most c
IMO, going with SortValues is the right way to go. The idea is that runners
can always replace the SortValues PTransform with their own optimized
variant. As you have already pointed out, the default in-memory
implementation has strict limitations.
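
To make the limitation concrete, here is a minimal sketch in plain Java collections (not the Beam API). It assumes that a global-window ORDER BY puts every row under a single key and sorts that key's values in one worker's memory. The class name, the row type, and the MAX_IN_MEMORY_ROWS cap are hypothetical stand-ins for illustration, not Beam's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OrderBySketch {
    // Hypothetical cap standing in for an in-memory sorter's memory budget.
    static final int MAX_IN_MEMORY_ROWS = 1_000;

    // Sorts the rows of one key by their sort key, entirely in memory.
    // Each row is modeled as (sortKey, payload).
    static List<Map.Entry<Integer, String>> sortInMemory(
            List<Map.Entry<Integer, String>> rows) {
        if (rows.size() > MAX_IN_MEMORY_ROWS) {
            // A runner-specific variant could instead spill to disk or
            // use a distributed sort here.
            throw new IllegalStateException(
                "Too many rows for the in-memory sort: " + rows.size());
        }
        // Copy first so the (possibly immutable) input is left untouched.
        List<Map.Entry<Integer, String>> sorted = new ArrayList<>(rows);
        sorted.sort(Map.Entry.comparingByKey());
        return sorted;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> rows = List.of(
            Map.entry(3, "c"), Map.entry(1, "a"), Map.entry(2, "b"));
        System.out.println(sortInMemory(rows));
    }
}
```

Because the whole input must fit under the cap, this approach suits small results only. A runner-provided replacement would keep the same contract and change only how the sort is carried out.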
I would suggest going with the in-memory version to
Thanks!
I will try to figure out why GSC is disallowed and share it if I find anything.
-Rui
On Tue, Aug 7, 2018 at 12:00 PM Lukasz Cwik wrote:
> IMO, going with SortValues is the right way to go. The idea is that
> runners can always replace the SortValues PTransform with their own
> optimized v