but it would be better to support (start_index, max_rows), so that I don't have
to save a row_index column
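For illustration, here is a rough sketch of what I mean, layered on top of
today's RecordBatchReader (the helper name ReadRange is made up): skip whole
batches until start_index, slice the partially overlapping ones, and stop after
max_rows rows.

#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>
#include "arrow/record_batch.h"
#include "arrow/status.h"

// Collect max_rows rows starting at start_index, skipping and slicing batches.
arrow::Status ReadRange(arrow::RecordBatchReader* reader,
                        int64_t start_index, int64_t max_rows,
                        std::vector<std::shared_ptr<arrow::RecordBatch>>* out) {
  int64_t to_skip = start_index;
  int64_t remaining = max_rows;
  std::shared_ptr<arrow::RecordBatch> batch;
  while (remaining > 0) {
    ARROW_RETURN_NOT_OK(reader->ReadNext(&batch));
    if (batch == nullptr) break;           // end of stream
    if (batch->num_rows() == 0) continue;  // e.g. a fully filtered-out batch
    if (to_skip >= batch->num_rows()) {    // batch lies entirely before the range
      to_skip -= batch->num_rows();
      continue;
    }
    const int64_t take = std::min(batch->num_rows() - to_skip, remaining);
    out->push_back(batch->Slice(to_skip, take));
    to_skip = 0;
    remaining -= take;
  }
  return arrow::Status::OK();
}

Note that skipping by offset like this assumes the batches arrive in a
deterministic order, which may not hold for a multi-threaded scan.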
--Original Message--
From:
"user"
It works for me, because I use multi-threaded reading. When the filter is not
set, it is fine to read the batches sequentially. After setting the filter,
though, the previous batch may yield fewer rows or no data at all. I then judged
whether the next batch was empty before I had finished reading, which led to the
problem.
Also, when I added the filter, my program hit an unexpected core dump, and I'm
now looking into why. I based my implementation on tfio's code.
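Based on the behavior described above, a minimal sketch of the loop that seems
to be needed (the helper name ReadNextNonEmpty is made up): with a filter
attached, a zero-row batch does not mean the stream has ended; only a null
batch does, so the empty check must keep reading instead of stopping.

#include <memory>
#include "arrow/record_batch.h"
#include "arrow/status.h"

// Keep reading past zero-row batches; only a null batch means end-of-stream.
arrow::Status ReadNextNonEmpty(arrow::RecordBatchReader* reader,
                               std::shared_ptr<arrow::RecordBatch>* out) {
  std::shared_ptr<arrow::RecordBatch> batch;
  do {
    ARROW_RETURN_NOT_OK(reader->ReadNext(&batch));
  } while (batch != nullptr && batch->num_rows() == 0);
  *out = batch;  // nullptr here means the stream is genuinely exhausted
  return arrow::Status::OK();
}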
--Original Message--
From:
I tried the method proposed by Aldrin, but when my offset exceeds a batch's
length, my ReadNext() fetches a batch with row count 0. That is, after I set the
filter, my call to ReadNext does not directly fetch the batch at the start of
the filtered range; I may need to call ReadNext n times in a row before I get a
non-empty batch.
We do not have the option to do this today. However, it is something
we could do a better job of as long as we aren't reading CSV.
Aldrin's workaround is pretty solid, especially if you are reading
Parquet and have a row_index column. Parquet statistics filtering
should ensure we are only reading the row groups that can contain
matching rows.
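For reference, a sketch of how such a row_index filter might look with the
Arrow C++ dataset API. The column name row_index comes from the thread;
everything else (the ScanRows helper, the start/max_rows bounds) is made up for
illustration. Because row_index increases monotonically, Parquet min/max
statistics let the scanner skip row groups entirely outside
[start, start + max_rows).

#include <memory>
#include "arrow/compute/api.h"
#include "arrow/dataset/api.h"
#include "arrow/result.h"

namespace cp = arrow::compute;
namespace ds = arrow::dataset;

// Build a scanner keeping only rows with start <= row_index < start + max_rows.
arrow::Result<std::shared_ptr<ds::Scanner>> ScanRows(
    const std::shared_ptr<ds::Dataset>& dataset,
    int64_t start, int64_t max_rows) {
  ARROW_ASSIGN_OR_RAISE(auto builder, dataset->NewScan());
  ARROW_RETURN_NOT_OK(builder->Filter(cp::and_(
      cp::greater_equal(cp::field_ref("row_index"), cp::literal(start)),
      cp::less(cp::field_ref("row_index"), cp::literal(start + max_rows)))));
  return builder->Finish();
}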