Jimexist commented on a change in pull request #375: URL: https://github.com/apache/arrow-datafusion/pull/375#discussion_r638359608
##########
File path: datafusion/src/physical_plan/mod.rs
##########
@@ -509,6 +540,43 @@ pub trait Accumulator: Send + Sync + Debug {
     fn evaluate(&self) -> Result<ScalarValue>;
 }
+/// A window accumulator represents a stateful object that lives throughout the evaluation of multiple
+/// rows and generically accumulates values.
+///
+/// An accumulator knows how to:
+/// * update its state from inputs via `update`
+/// * convert its internal state to a vector of scalar values
+/// * update its state from multiple accumulators' states via `merge`
+/// * compute the final value from its internal state via `evaluate`
+pub trait WindowAccumulator: Send + Sync + Debug {
+    /// scans the accumulator's state from a vector of scalars, similar to Accumulator it also
+    /// optionally generates values.
+    fn scan(&mut self, values: &[ScalarValue]) -> Result<Option<ScalarValue>>;

Review comment:
   @alamb good question! The word "window" in this trait's name only indicates that window functions will use it. It could have a better name, but...

   For the design, there are two complications:

   1. Multiple window functions, each with a different window frame, can be scanning batches at the same time, so I'd want each to create its own window accumulator and handle `push_back` and removing from the front on its own. This trait would not put that into the API; it just _scans_.
   2. Specifically for window functions that peek ahead: because batches arrive in an async stream, it is not feasible to _peek_, so my plan is to allow them to optionally execute a one-time shift upon finishing up.

   Due to 1 and 2, the best state container for a window accumulator is probably `VecDeque`.

   The name `scan` is because accumulators iterate the list and either optionally produce one value at a time (`max` with order by) or produce exactly one accumulated value upon finishing the scan (`max` with empty over), but not both.
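
   To make the two `scan` modes and the `VecDeque` idea concrete, here is a minimal, std-only sketch. It is not the code in this PR: `Value` is a hypothetical stand-in for `ScalarValue`, the trait is simplified (one value per row, no `Result`, no `Send + Sync + Debug` bounds), and `RunningMax`, `TotalMax`, `SlidingMax` and `run` are illustrative names only.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for DataFusion's `ScalarValue`, so the sketch needs only std.
type Value = i64;

/// Simplified form of the trait in the diff (assumption: one value per row, no `Result`).
trait WindowAccumulator {
    /// Consumes one input row and optionally emits a value for that row.
    fn scan(&mut self, value: Value) -> Option<Value>;
    /// Emits the single final value for accumulators that only produce one at the end.
    fn evaluate(&self) -> Option<Value>;
}

/// `max` with ORDER BY: one running value per scanned row, nothing at the end.
struct RunningMax {
    max: Option<Value>,
}

impl WindowAccumulator for RunningMax {
    fn scan(&mut self, value: Value) -> Option<Value> {
        self.max = Some(self.max.map_or(value, |m| m.max(value)));
        self.max
    }
    fn evaluate(&self) -> Option<Value> {
        None
    }
}

/// `max` with an empty OVER clause: nothing per row, exactly one value at the end.
struct TotalMax {
    max: Option<Value>,
}

impl WindowAccumulator for TotalMax {
    fn scan(&mut self, value: Value) -> Option<Value> {
        self.max = Some(self.max.map_or(value, |m| m.max(value)));
        None
    }
    fn evaluate(&self) -> Option<Value> {
        self.max
    }
}

/// A bounded frame of the last `size` rows. The accumulator owns its own
/// `VecDeque` and does the push_back / pop_front bookkeeping itself.
struct SlidingMax {
    frame: VecDeque<Value>,
    size: usize,
}

impl WindowAccumulator for SlidingMax {
    fn scan(&mut self, value: Value) -> Option<Value> {
        self.frame.push_back(value);
        if self.frame.len() > self.size {
            self.frame.pop_front();
        }
        self.frame.iter().copied().max()
    }
    fn evaluate(&self) -> Option<Value> {
        None
    }
}

/// Feeds rows one by one; returns the per-row outputs and the final value.
fn run(acc: &mut dyn WindowAccumulator, rows: &[Value]) -> (Vec<Option<Value>>, Option<Value>) {
    let per_row = rows.iter().map(|&v| acc.scan(v)).collect();
    (per_row, acc.evaluate())
}
```

   Because each accumulator owns its frame state, several window functions with different frames can scan the same batches independently, which is the point of complication 1. The one-time shift for peek-ahead functions is not sketched here, since its shape is not yet settled.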