Hi!

I have noticed two things while using pyarrow.dataset.dataset with ADLFS
on parquet files, and I wonder whether they are worth opening a ticket
for.

1. The first read is always 65536 bytes, and it is followed by a read of
the size of the parquet.
Is there a way to set the size of the first read so that there is just
one read?
I know roughly how large the footer is in the parquet files I am getting,
and I would like to read it in a single request.
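
To make point 1 concrete, here is a minimal sketch of the behaviour I have in mind. It is not pyarrow's actual implementation; read_footer and first_read_size are names I made up for illustration. It relies only on the parquet layout, where a file ends with the footer metadata, a 4-byte little-endian footer length, and the magic bytes PAR1. If the first tail read is large enough, a single read is sufficient; otherwise a second read is needed.

```python
import io
import struct

MAGIC = b"PAR1"
TAIL_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


def read_footer(f, file_size, first_read_size=65536):
    """Return (footer_bytes, read_sizes) for a seekable file object.

    first_read_size is a hypothetical tuning knob for this sketch,
    not an existing pyarrow parameter.
    """
    read_sizes = []

    # Speculative read of the end of the file.
    n = min(file_size, max(first_read_size, TAIL_LEN))
    f.seek(file_size - n)
    tail = f.read(n)
    read_sizes.append(n)

    if tail[-4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    if footer_len + TAIL_LEN > file_size:
        raise ValueError("footer length exceeds file size")

    # The guess covered the whole footer: one read is enough.
    if footer_len + TAIL_LEN <= n:
        return tail[n - TAIL_LEN - footer_len : n - TAIL_LEN], read_sizes

    # The guess was too small: a second read fetches the footer.
    f.seek(file_size - TAIL_LEN - footer_len)
    footer = f.read(footer_len)
    read_sizes.append(footer_len)
    return footer, read_sizes


# Synthetic stand-in for a parquet file: data, 200-byte footer, length, magic.
blob = MAGIC + b"x" * 1000 + b"F" * 200 + struct.pack("<I", 200) + MAGIC
footer_small, sizes_small = read_footer(io.BytesIO(blob), len(blob), first_read_size=64)
footer_big, sizes_big = read_footer(io.BytesIO(blob), len(blob), first_read_size=256)
```

With a first read of 64 bytes the sketch needs two reads, and with 256 bytes it needs only one. That is the kind of control I am hoping to get.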

2. It looks like the parquet footer is read again on almost every
subsequent call. Is there a way to cache the parquet footer so that it is
not read every time?

Thanks in advance for your insights,

Jacek
