Hi! Nice technical post. If I'm not mistaken, we use a similar trick in a few other functions as well (like count-distinct-up-to). I think there's a redundant sort in the window function example. Also, a graph might show the data better than the table.
Ohad.

On Wed, Jan 21, 2026, 13:33 Eyal Allweil <[email protected]> wrote:

> Alon, thank you for your comment, I've added it to the draft. I also added
> a diagram of how the code runs - the latest version is in the same GitHub
> link here:
> https://github.com/eyala/datafu/blob/blog/site/source/blog/publish-date-here-collectNumberOrderedElements.markdown
>
> Question - do you think this sentence is good for the final paragraph?
>
> Even if it isn't useful to you today, the basic technique - using
> DeclarativeAggregate to allow Spark to optimize more effectively - may be.
> If you've done something similar, or created any useful general-purpose API
> in Spark, don't hesitate to contribute it to DataFu! We are always glad to
> review new contributions.
>
> Eyal
>
> On 2026/01/15 09:10:13 Alon Hartanu wrote:
> > Hi everyone,
> >
> > I read the blog, it looks great.
> >
> > I think you can also add about possible memory overflow this function can
> > help prevent, when using collect_list on large data.
> >
> > I have a use case for this function in one of my applications, I'll try it
> > out in a few weeks and let you know how it goes.
> >
> > Thanks, Alon
