mapleFU commented on issue #38275:
URL: https://github.com/apache/arrow/issues/38275#issuecomment-1766764506
With mmap, Arrow creates a `MemoryMappedFile` [1], which calls `mmap(2)` [2].
The kernel then sets up a memory mapping for the file and gives you an address
backed by the page cache.
The page cache has a limited size. When memory runs short, the kernel can
evict ("swap out") the mmap'd pages, and the next access to that part of the
data may have to reload it from block storage.
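As a rough illustration of the mapping described above, here is a minimal
sketch using Python's standard `mmap` module rather than Arrow's
`MemoryMappedFile`. The helper `read_slice_via_mmap` and the demo file are
made up for this example. Slicing the mapping touches its pages, which the
kernel serves from the page cache or loads from storage on demand.

```python
import mmap
import os
import tempfile

def read_slice_via_mmap(path, offset, length):
    """Map the file read-only and copy out `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Touching these bytes may trigger page faults; if the pages were
            # evicted earlier, the kernel has to reload them from storage.
            return bytes(mm[offset:offset + length])

# Hypothetical demo file, created only for this example.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789" * 1000)
    demo_path = tmp.name

chunk = read_slice_via_mmap(demo_path, 10, 10)
os.remove(demo_path)
```

Nothing in the calling code shows whether a given access was served from
memory or from disk, which is why a slowdown of this kind can be hard to
spot without profiling.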
I'm not sure this is the problem, but you could try simply doing this:
```
in thread pool:
    load batch from file
    handle the batch
```
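The suggestion above could look roughly like the following minimal Python
sketch. It uses plain `read()` calls on fixed-size chunks as a stand-in for
loading Arrow record batches, so `load_batch`, `handle_batch`,
`load_and_handle`, `process_file`, and `BATCH_SIZE` are hypothetical names,
not Arrow APIs. Each task loads its own batch and handles it right away
instead of going through a shared memory mapping.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 4096  # hypothetical batch size in bytes

def load_batch(path, index):
    """Read one fixed-size batch from the file with an ordinary read()."""
    with open(path, "rb") as f:
        f.seek(index * BATCH_SIZE)
        return f.read(BATCH_SIZE)

def handle_batch(batch):
    """Placeholder for the real per-batch work; here it just sums the bytes."""
    return sum(batch)

def load_and_handle(path, index):
    # Load and handle in the same task, so each thread owns its batch.
    return handle_batch(load_batch(path, index))

def process_file(path, max_workers=4):
    # Round up so a trailing partial batch is processed too.
    num_batches = (os.path.getsize(path) + BATCH_SIZE - 1) // BATCH_SIZE
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in batch order.
        return list(pool.map(lambda i: load_and_handle(path, i),
                             range(num_batches)))

# Hypothetical demo file: three full batches plus a partial one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([1]) * (BATCH_SIZE * 3 + 100))
    demo_path = tmp.name

results = process_file(demo_path)
os.remove(demo_path)
```

Whether this is actually faster than the mmap path depends on the workload,
so it is worth measuring both.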
Alternatively, you can profile where the time is spent with a flame graph,
which should make things clearer.
[1] https://arrow.apache.org/docs/cpp/api/io.html#_CPPv4N5arrow2io16MemoryMappedFileE
[2] https://man7.org/linux/man-pages/man2/mmap.2.html
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.