Re: Time-based frequency table at scale

2020-03-11 Thread Nicolas Paris
Hi, did you try exploding the arrays, then doing the aggregation/count, and at the end applying a UDF to add the 0 values? In my experience, working on arrays is usually a bad idea.

sakag writes:
> Hi all,
>
> We have a rather interesting use case, and are struggling to come up with an
>
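
A minimal sketch of the explode / count / zero-fill idea suggested above, written in plain Python rather than Spark so it runs standalone. The record layout (a time bucket plus an array of numerical ids), the sample data, and the name `frequency_table` are assumptions for illustration; the original post's schema is not shown here. In Spark the same three steps would roughly correspond to `explode`, a `groupBy(...).count()`, and a final step that fills in the missing combinations.

```python
from collections import Counter
from itertools import product

# Hypothetical records: (time_bucket, array of numerical ids).
records = [
    (0, [1, 2, 2]),
    (1, [2]),
    (3, [1, 3]),
]

def frequency_table(records, buckets, ids):
    """Count ids per time bucket, including explicit zeros."""
    # Step 1: "explode" each array into one (bucket, id) row per element.
    exploded = ((bucket, i) for bucket, arr in records for i in arr)
    # Step 2: aggregate by counting identical (bucket, id) rows.
    counts = Counter(exploded)
    # Step 3: add a 0 for every (bucket, id) combination that never occurred.
    return {(b, i): counts.get((b, i), 0) for b, i in product(buckets, ids)}

table = frequency_table(records, buckets=range(4), ids=[1, 2, 3])
```

The zero-fill is done once at the end over the (bucket, id) grid, so the counting step only ever touches combinations that actually occur in the data.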

Re: Time-based frequency table at scale

2020-03-11 Thread Enrico Minack
An interesting puzzle indeed. What is your measure of "that scales"? Does not fail, does not spill, does not need a huge amount of memory / disk, is O(N), processes X records per second and core?

Enrico

On 11.03.20 at 16:59, sakag wrote:
> Hi all, We have a rather interesting use case,

Time-based frequency table at scale

2020-03-11 Thread sakag
Hi all, We have a rather interesting use case, and are struggling to come up with an approach that scales. Reaching out to seek your expert opinion/feedback and tips. What we are trying to do is to find the count of numerical ids over a sliding time window where each of our data records has