I agree with you that the amount of memory used depends on the user's behavior, but that is exactly the point: to keep in memory only what the user is actually using, and nothing more.
I also agree that, even with memory mapping, the disk load can still be present. For example, I noticed that with the code below, the read_all call with a memory-mapped file still reads the data from disk.
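Something along these lines (a minimal sketch, with a hypothetical file name, assuming a compressed Arrow IPC/Feather file):

    import pyarrow as pa

    # Map the file instead of reading it eagerly.
    with pa.memory_map("data.arrow", "r") as source:
        reader = pa.ipc.open_file(source)
        # read_all() materializes every record batch into a Table. If the
        # batches are compressed, each one is decompressed into freshly
        # allocated buffers, so the data is pulled from disk even though
        # the source is memory mapped.
        table = reader.read_all()

    print(table.num_rows)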
Thanks for the clarification, I understand your use case better now. You are right that memory mapping can be used in the way you describe.
> why does it decompress the data here? To me it is doing an unnecessary
> copy by transforming a compressed record batch into an uncompressed
> record batch
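A sketch of what that looks like in practice (not the thread's actual code; the file name and table contents are made up): writing an IPC file with compressed record batches and reading it back through a memory map still allocates new, decompressed buffers.

    import pyarrow as pa

    table = pa.table({"x": list(range(1_000_000))})

    # Write an IPC file whose record batches are LZ4-compressed.
    options = pa.ipc.IpcWriteOptions(compression="lz4")
    with pa.OSFile("compressed.arrow", "wb") as sink:
        with pa.ipc.new_file(sink, table.schema, options=options) as writer:
            writer.write_table(table)

    # Reading back through a memory map: the batches must still be
    # decompressed, which allocates new buffers in user space.
    before = pa.total_allocated_bytes()
    with pa.memory_map("compressed.arrow", "r") as source:
        result = pa.ipc.open_file(source).read_all()
    print(pa.total_allocated_bytes() - before)  # > 0: decompressed copies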
Hi,
Thank you very much for your answer; I am sorry if some of my sentences were confusing.
I did not know about the kernel-space/user-space distinction, nor that memory-mapped I/O is more general than just file memory mapping. I have a better understanding now.
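For instance, the mmap module exposes both flavors (a minimal illustration; data.bin is a hypothetical file):

    import mmap

    # Anonymous mapping: memory-mapped I/O with no file behind it at all.
    anon = mmap.mmap(-1, 4096)
    anon.write(b"hello")

    # File-backed mapping: the same mechanism, but the pages come from a file.
    with open("data.bin", "wb") as f:
        f.write(b"\x00" * 4096)
    with open("data.bin", "r+b") as f:
        filemap = mmap.mmap(f.fileno(), 0)
        print(filemap[:5])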
So I looked a bit deeper into memory mapping
Well, I suppose there are cases where you can map a file with memory-mapped I/O and then, if you are careful not to touch those buffers, they might not be loaded into memory. However, that is a very difficult thing to achieve. For example, when reading a file we need to access the metadata that describes its contents, so at least those pages end up being loaded.
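With an Arrow IPC file, for instance, the footer and schema must be parsed just to open it; a sketch (data.arrow is a hypothetical, uncompressed file):

    import pyarrow as pa

    with pa.memory_map("data.arrow", "r") as source:
        # Opening the file parses the footer and schema, so those
        # metadata pages are faulted in immediately.
        reader = pa.ipc.open_file(source)
        print(reader.num_record_batches)

        # The batch's buffers point into the mapping; the data pages are
        # only faulted in when the values are actually accessed.
        batch = reader.get_batch(0)
        print(batch.column(0)[0])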
I'm a little bit confused by the benchmark. The benchmark is labeled "open file" and yet read_table will read the entire file into memory. I don't think your other benchmarks are doing this (i.e. they are not reading data into memory).
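To make the distinction concrete (a sketch; data.arrow is a hypothetical Feather/IPC file):

    import pyarrow as pa
    import pyarrow.feather as feather

    # "Open file": parses only the footer and schema; no data is read.
    source = pa.memory_map("data.arrow", "r")
    reader = pa.ipc.open_file(source)

    # read_table: reads (and, if compressed, decompresses) the whole file.
    table = feather.read_table("data.arrow", memory_map=True)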
As for the questions on memory mapping, I have a few thoughts.
Hello everyone,
For several years I have been working with HDF5 files to store and load data, with pandas as the in-memory representation for analyzing it. Overall, the data varies in size (from a few MB to 10 GB). I use the dataframes inside interactive tools (with a GUI, where the data