Re: [DISCUSS][C++] Refactoring of Expression simplification passes
This seems like it could be a premature optimization. Do we know what fraction of important workloads is taken up by this operation?

On Wed, May 5, 2021 at 12:35 PM Benjamin Kietzman wrote:
> Sorry, yes: I meant 4 microseconds and not 4 milliseconds.
Re: [DISCUSS][C++] Refactoring of Expression simplification passes
Sorry, yes: I meant 4 microseconds and not 4 milliseconds.

On Wed, May 5, 2021 at 1:27 PM Antoine Pitrou wrote:
> On Wed, 5 May 2021 13:23:36 -0400
> Benjamin Kietzman wrote:
> > Rewriting is not extremely expensive (a microbenchmark estimate on
> > my machine shows that a simple case such as the above takes 4ms).
>
> 4ms for a single rewriting actually sounds quite large to me.
> (or did you mean 4µs?)
Re: [DISCUSS][C++] Refactoring of Expression simplification passes
On Wed, 5 May 2021 13:23:36 -0400
Benjamin Kietzman wrote:
> Currently, Expressions (used to specify dataset filters and projections)
> are simplified by direct rewriting: a filter such as `alpha == 2 and beta > 3`
> on a partition where we are guaranteed that `beta == 5` will be rewritten
> to `alpha == 2` before evaluation against scanned batches. This can
> potentially occur for each scanned batch: for example, Parquet's row group
> statistics are used in the same way to simplify filters.
>
> Rewriting is not extremely expensive (a microbenchmark estimate on
> my machine shows that a simple case such as the above takes 4ms).

4ms for a single rewriting actually sounds quite large to me.
(or did you mean 4µs?)