That makes sense! I can see how masked reads are useful in that kind of approach too. Thanks for the explanation.
On Fri, Jun 2, 2023, 8:45 AM Will Jones <will.jones...@gmail.com> wrote: > > The main downside with using the mask (or any solution based on a filter > > node / filtering) is that it requires that the delete indices go into the > > plan itself. So you need to first read the delete files and then create > > the plan. And, if there are many deleted rows, this can be costly. > > Ah, I see. I was assuming you could load the indices within the fragment > scan, at the same time the page index was read. That's how I'm implementing > with Lance, and how I plan to implement with Delta Lake. But if you can't > do that, then filtering with an anti-join makes sense. You wouldn't want to > include those in a plan. > > On Fri, Jun 2, 2023 at 7:38 AM Weston Pace <weston.p...@gmail.com> wrote: > > > Also, for clarity, I do agree with Gang that these are both valuable > > features in their own right. A mask makes a lot of sense for page > indices. > > > > On Fri, Jun 2, 2023 at 7:36 AM Weston Pace <weston.p...@gmail.com> > wrote: > > > > > > then I think the incremental cost of adding the > > > > positional deletes to the mask is probably lower than the anti-join. > > > Do you mean developer cost? Then yes, I agree. Although there may be > > > some subtlety in the pushdown to connect a dataset filter to a parquet > > > reader filter. > > > > > > The main downside with using the mask (or any solution based on a > filter > > > node / filtering) is that it requires that the delete indices go into > the > > > plan itself. So you need to first read the delete files and then > create > > > the plan. And, if there are many deleted rows, this can be costly. > > > > > > On Thu, Jun 1, 2023 at 7:13 PM Will Jones <will.jones...@gmail.com> > > wrote: > > > > > >> That's a good point, Gang. To perform deletes, we definitely need the > > row > > >> index, so we'll want that regardless of whether it's used in scans. > > >> > > >> > I'm not sure a mask would be the ideal solution for Iceberg (though > it > > >> is > > >> a reasonable feature in its own right) because I think position-based > > >> deletes, in Iceberg, are still done using an anti-join and not a > filter. > > >> > > >> For just positional deletes in isolation, I agree the mask wouldn't be > > >> more > > >> optimal than the anti-join. But if they end up using the mask for > > >> filtering > > >> with the page index, then I think the incremental cost of adding the > > >> positional deletes to the mask is probably lower than the anti-join. > > >> > > >> On Thu, Jun 1, 2023 at 6:33 PM Gang Wu <ust...@gmail.com> wrote: > > >> > > >> > IMO, the adding a row_index column from the reader is orthogonal to > > >> > the mask implementation. Table formats (e.g. Apache Iceberg and > > >> > Delta) require the knowledge of row index to finalize row deletion. > It > > >> > would be trivial to natively support row index from the file reader. > > >> > > > >> > Best, > > >> > Gang > > >> > > > >> > On Fri, Jun 2, 2023 at 3:40 AM Weston Pace <weston.p...@gmail.com> > > >> wrote: > > >> > > > >> > > I agree that having a row_index is a good approach. I'm not sure > a > > >> mask > > >> > > would be the ideal solution for Iceberg (though it is a reasonable > > >> > feature > > >> > > in its own right) because I think position-based deletes, in > > Iceberg, > > >> are > > >> > > still done using an anti-join and not a filter. > > >> > > > > >> > > That being said, we probably also want to implement a streaming > > >> > merge-based > > >> > > anti-join because I believe delete files are ordered by row_index > > and > > >> so > > >> > a > > >> > > streaming approach is likely to be much more performant. > > >> > > > > >> > > On Mon, May 29, 2023 at 4:01 PM Will Jones < > will.jones...@gmail.com > > > > > >> > > wrote: > > >> > > > > >> > > > Hi Rusty, > > >> > > > > > >> > > > At first glance, I think adding a row_index column would make > > >> sense. To > > >> > > be > > >> > > > clear, this would be an index within a file / fragment, not > across > > >> > > multiple > > >> > > > files, which don't necessarily have a known ordering in Acero > > >> (IIUC). > > >> > > > > > >> > > > However, another approach would be to take a mask argument in > the > > >> > Parquet > > >> > > > reader. We may wish to do this anyways for support for using > > >> predicate > > >> > > > pushdown with Parquet's page index. While Arrow C++ hasn't yet > > >> > > implemented > > >> > > > predicate pushdown on page index (right now just supports row > > >> groups), > > >> > > > Arrow Rust has and provides an API to pass in a mask to support > > it. > > >> The > > >> > > > reason for this implementation is described in the blog post > > >> "Querying > > >> > > > Parquet with Millisecond Latency" [1], under "Page Pruning". The > > >> > > > RowSelection struct API is worth a look [2]. > > >> > > > > > >> > > > I'm not yet sure which would be preferable, but I think > adopting a > > >> > > similar > > >> > > > pattern to what the Rust community has done may be wise. It's > > >> possible > > >> > > that > > >> > > > row_index is easy to implement while the mask will take time, in > > >> which > > >> > > case > > >> > > > row_index makes sense as an interim solution. > > >> > > > > > >> > > > Best, > > >> > > > > > >> > > > Will Jones > > >> > > > > > >> > > > [1] > > >> > > > > > >> > > > > > >> > > > > >> > > > >> > > > https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/ > > >> > > > [2] > > >> > > > > > >> > > > > > >> > > > > >> > > > >> > > > https://docs.rs/parquet/40.0.0/parquet/arrow/arrow_reader/struct.RowSelection.html > > >> > > > > > >> > > > On Mon, May 29, 2023 at 2:12 PM Rusty Conover > > >> <ru...@conover.me.invalid > > >> > > > > >> > > > wrote: > > >> > > > > > >> > > > > Hi Arrow Team, > > >> > > > > > > >> > > > > I wanted to suggest an improvement regarding Acero's Scan > node. > > >> > > > > Currently, it provides useful information such as > > >> __fragment_index, > > >> > > > > __batch_index, __filename, and __last_in_fragment. However, it > > >> would > > >> > > > > be beneficial to have an additional column that returns an > > overall > > >> > > > > "row index" from the source. > > >> > > > > > > >> > > > > The row index would start from zero and increment for each row > > >> > > > > retrieved from the source, particularly in the case of Parquet > > >> files. > > >> > > > > Is it currently possible to obtain this row index or would > > >> expanding > > >> > > > > the Scan node's behavior be required? > > >> > > > > > > >> > > > > Having this row index column would be valuable in implementing > > >> > support > > >> > > > > for Iceberg's positional-based delete files, as outlined in > the > > >> > > > > following link: > > >> > > > > > > >> > > > > https://iceberg.apache.org/spec/#delete-formats > > >> > > > > > > >> > > > > While Iceberg's value-based deletes can already be performed > > using > > >> > the > > >> > > > > support for anti joins, using a projection node does not > > guarantee > > >> > the > > >> > > > > row ordering within an Acero graph. Hence, the inclusion of a > > >> > > > > dedicated row index column would provide a more reliable > > solution > > >> in > > >> > > > > this context. > > >> > > > > > > >> > > > > Thank you for considering this suggestion. > > >> > > > > > > >> > > > > Rusty > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > > > >