Re: [DISCUSS][C++] Refactoring of Expression simplification passes

2021-05-05 Thread Wes McKinney
This seems like it could be a premature optimization. Do we know what
fraction of important workloads is taken up by this operation?

On Wed, May 5, 2021 at 12:35 PM Benjamin Kietzman wrote:
>
> Sorry, yes: I meant 4 microseconds and not 4 milliseconds.
>
> On Wed, May 5, 2021 at 1:27 PM Antoine Pitrou wrote:
>
> > On Wed, 5 May 2021 13:23:36 -0400
> > Benjamin Kietzman wrote:
> > > Currently, Expressions (used to specify dataset filters and projections)
> > > are simplified by direct rewriting: a filter such as `alpha == 2 and beta > 3`
> > > on a partition where we are guaranteed that `beta == 5` will be rewritten
> > > to `alpha == 2` before evaluation against scanned batches. This can
> > > potentially occur for each scanned batch: for example, Parquet's row group
> > > statistics are used in the same way to simplify filters.
> > >
> > > Rewriting is not extremely expensive (a microbenchmark estimate on
> > > my machine shows that a simple case such as the above takes 4ms).
> >
> > 4ms for a single rewriting actually sounds quite large to me.
> > (or did you mean 4µs?)


Re: [DISCUSS][C++] Refactoring of Expression simplification passes

2021-05-05 Thread Benjamin Kietzman
Sorry, yes: I meant 4 microseconds and not 4 milliseconds.

On Wed, May 5, 2021 at 1:27 PM Antoine Pitrou wrote:

> On Wed, 5 May 2021 13:23:36 -0400
> Benjamin Kietzman wrote:
> > Currently, Expressions (used to specify dataset filters and projections)
> > are simplified by direct rewriting: a filter such as `alpha == 2 and beta > 3`
> > on a partition where we are guaranteed that `beta == 5` will be rewritten
> > to `alpha == 2` before evaluation against scanned batches. This can
> > potentially occur for each scanned batch: for example, Parquet's row group
> > statistics are used in the same way to simplify filters.
> >
> > Rewriting is not extremely expensive (a microbenchmark estimate on
> > my machine shows that a simple case such as the above takes 4ms).
>
> 4ms for a single rewriting actually sounds quite large to me.
> (or did you mean 4µs?)


Re: [DISCUSS][C++] Refactoring of Expression simplification passes

2021-05-05 Thread Antoine Pitrou
On Wed, 5 May 2021 13:23:36 -0400
Benjamin Kietzman wrote:
> Currently, Expressions (used to specify dataset filters and projections)
> are simplified by direct rewriting: a filter such as `alpha == 2 and beta > 3`
> on a partition where we are guaranteed that `beta == 5` will be rewritten
> to `alpha == 2` before evaluation against scanned batches. This can
> potentially occur for each scanned batch: for example, Parquet's row group
> statistics are used in the same way to simplify filters.
> 
> Rewriting is not extremely expensive (a microbenchmark estimate on
> my machine shows that a simple case such as the above takes 4ms).

4ms for a single rewriting actually sounds quite large to me.
(or did you mean 4µs?)
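
To make the rewriting discussed in this thread concrete, here is a minimal
C++ sketch of simplifying a filter against a partition guarantee. It is a toy
model and does not use Arrow's Expression API: the names Comparison,
Guarantee, Simplified, and SimplifyAgainstGuarantee are hypothetical, and it
assumes the filter is a plain conjunction of integer comparisons and that
guarantees always have the form `field == value`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model (an assumption for illustration, not Arrow's Expression type):
// a filter is a conjunction (AND) of comparisons between a field and a literal.
enum class Op { kEqual, kGreater, kLess };

struct Comparison {
  std::string field;
  Op op;
  long value;
};

using Conjunction = std::vector<Comparison>;
// Hypothetical guarantee representation: field name -> exact known value.
using Guarantee = std::map<std::string, long>;

struct Simplified {
  bool always_false = false;  // the guarantee contradicts the filter
  Conjunction remaining;      // comparisons still to be evaluated per batch
};

// Evaluate `lhs op rhs` for two known integers.
bool Evaluate(Op op, long lhs, long rhs) {
  switch (op) {
    case Op::kEqual:
      return lhs == rhs;
    case Op::kGreater:
      return lhs > rhs;
    case Op::kLess:
      return lhs < rhs;
  }
  return false;  // unreachable for valid Op values
}

// Drop every comparison that the guarantee already proves true. If the
// guarantee proves any comparison false, the whole conjunction is false.
Simplified SimplifyAgainstGuarantee(const Conjunction& filter,
                                    const Guarantee& known) {
  Simplified out;
  for (const Comparison& cmp : filter) {
    auto it = known.find(cmp.field);
    if (it == known.end()) {
      // Nothing is known about this field: keep the comparison.
      out.remaining.push_back(cmp);
      continue;
    }
    if (!Evaluate(cmp.op, it->second, cmp.value)) {
      out.always_false = true;
      out.remaining.clear();
      return out;
    }
    // Proven true by the guarantee: omit it from the simplified filter.
  }
  return out;
}
```

With the filter `alpha == 2 and beta > 3` and the guarantee `beta == 5`, only
the `alpha == 2` comparison remains, which matches the rewrite described in
the original message. A guarantee that contradicts the filter (for example
`beta < 3` against `beta == 5`) marks the result as always false, so in this
sketch nothing would need to be evaluated for that partition.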