Hi everyone,

A graph is a great idea! I made one, but I'm not sure it's clear and I'm sure someone can make a better one. My attempt is here:
https://github.com/eyala/datafu/blob/blog/site/source/blog/collectnumberedordedelements.png
Ohad, what do you think?

Regarding the sort - in the original code in the post - getting the list of elements - the second sort isn't redundant, because the order is lost after the window function runs. But in the example whose runtime I compared - the count of the number of elements - the sort is redundant. In retrospect, I think maybe I should remove the count as an example and continue using the generation of the list as the main example and show runtimes for that. Does that make sense?

Eyal

On Sun, Jan 25, 2026 at 10:43 PM Ohad Raviv <[email protected]> wrote:

> Hi!
> Nice technical post.
> Similar trick we use in a few other functions as well, if I'm not mistaken
> (like count-distinct-up-to).
> I think there's a redundant sort in the window function example.
> Maybe a graph would show the data better than the table.
>
> Ohad.
>
> On Wed, Jan 21, 2026, 13:33 Eyal Allweil <[email protected]> wrote:
>
> > Alon, thank you for your comment, I've added it to the draft. I also added
> > a diagram of how the code runs - the latest version is in the same GitHub
> > link here:
> > https://github.com/eyala/datafu/blob/blog/site/source/blog/publish-date-here-collectNumberOrderedElements.markdown
> >
> > Question - do you think this sentence is good for the final paragraph?
> >
> > Even if it isn't useful to you today, the basic technique - using
> > DeclarativeAggregate to allow Spark to optimize more effectively - may be.
> > If you've done something similar, or created any useful general-purpose API
> > in Spark, don't hesitate to contribute it to DataFu! We are always glad to
> > review new contributions.
> >
> > Eyal
> >
> > On 2026/01/15 09:10:13 Alon Hartanu wrote:
> > > Hi everyone,
> > >
> > > I read the blog, it looks great.
> > >
> > > I think you can also add about possible memory overflow this function can
> > > help prevent, when using collect_list on large data.
> > >
> > > I have a use case for this function in one of my applications, I'll try it
> > > out in a few weeks and let you know how it goes.
> > >
> > > Thanks, Alon
