400Ping commented on PR #777: URL: https://github.com/apache/mahout/pull/777#issuecomment-3707921945
I have thought of two more memory-friendly alternatives to the current `read_npy -> Array2 -> flatten Vec` flow:

- Streaming/iterator reading: we parse the `.npy` header (dtype, shape, order) and then iterate over the flat data in the file in small chunks, so the file doesn't need to fit in RAM.
- Memory-mapping (mmap): we map the `.npy` file into memory (the OS loads pages on demand) and parse the header to locate the data region. This avoids the extra memory peak from the "read + flatten" copy and also makes slicing and random access easy.
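Both options can be roughly sketched in std-only Rust. This is a minimal sketch under stated assumptions, not Mahout's or ndarray-npy's API:

- `NpyHeader`, `read_npy_header`, `stream_f64_chunks` and `f64_at` are hypothetical names.
- Only little-endian `f64` (`'<f8'`) arrays in C order are accepted.
- The header dict is checked with naive substring matching instead of a real parser.

Both paths share the header parser. The streaming path keeps one chunk in memory at a time. The random-access path works on any `&[u8]` that begins at the data region, which is what a memory map would provide.

```rust
use std::io::{self, Read};

/// Parsed `.npy` header (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub struct NpyHeader {
    pub shape: Vec<usize>,
    /// Bytes before the data region: magic + version + length field + header text.
    pub data_offset: usize,
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Reads the preamble and header dict, leaving `r` positioned at the data.
/// Assumption: only little-endian f64 (`'<f8'`) in C order is accepted.
pub fn read_npy_header<R: Read>(r: &mut R) -> io::Result<NpyHeader> {
    let mut preamble = [0u8; 8];
    r.read_exact(&mut preamble)?;
    if &preamble[..6] != b"\x93NUMPY" {
        return Err(invalid("not an .npy file"));
    }
    // Format 1.x stores the header length in 2 bytes; 2.x and 3.x use 4 bytes.
    let (header_len, prefix_len) = match preamble[6] {
        1 => {
            let mut b = [0u8; 2];
            r.read_exact(&mut b)?;
            (u16::from_le_bytes(b) as usize, 10)
        }
        2 | 3 => {
            let mut b = [0u8; 4];
            r.read_exact(&mut b)?;
            (u32::from_le_bytes(b) as usize, 12)
        }
        _ => return Err(invalid("unsupported .npy version")),
    };
    let mut raw = vec![0u8; header_len];
    r.read_exact(&mut raw)?;
    let header = String::from_utf8(raw).map_err(|_| invalid("header is not UTF-8"))?;
    // Naive substring checks stand in for a real Python-literal parser.
    if !header.contains("'descr': '<f8'") || !header.contains("'fortran_order': False") {
        return Err(invalid("sketch only handles C-order '<f8' arrays"));
    }
    const KEY: &str = "'shape': (";
    let start = header.find(KEY).ok_or_else(|| invalid("missing shape"))? + KEY.len();
    let end = start + header[start..].find(')').ok_or_else(|| invalid("bad shape"))?;
    let shape = header[start..end]
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.parse::<usize>().map_err(|_| invalid("bad shape entry")))
        .collect::<io::Result<Vec<usize>>>()?;
    Ok(NpyHeader { shape, data_offset: prefix_len + header_len })
}

/// Decodes one little-endian f64 from exactly 8 bytes.
fn le_f64(b: &[u8]) -> f64 {
    let mut le = [0u8; 8];
    le.copy_from_slice(b);
    f64::from_le_bytes(le)
}

/// Streaming path: hands the flat data to `on_chunk` at most `chunk_elems`
/// values at a time, so peak memory is one chunk instead of the whole array.
pub fn stream_f64_chunks<R: Read, F: FnMut(&[f64])>(
    r: &mut R,
    total: usize,
    chunk_elems: usize,
    mut on_chunk: F,
) -> io::Result<()> {
    assert!(chunk_elems > 0, "chunk_elems must be positive");
    let mut bytes = vec![0u8; chunk_elems * 8];
    let mut values: Vec<f64> = Vec::with_capacity(chunk_elems);
    let mut remaining = total;
    while remaining > 0 {
        let n = remaining.min(chunk_elems);
        r.read_exact(&mut bytes[..n * 8])?;
        values.clear();
        values.extend(bytes[..n * 8].chunks_exact(8).map(le_f64));
        on_chunk(values.as_slice());
        remaining -= n;
    }
    Ok(())
}

/// Random-access path: `data` is the byte region after `data_offset`, e.g. a
/// slice of a memory map. Returns `None` when `index` is out of bounds.
pub fn f64_at(data: &[u8], index: usize) -> Option<f64> {
    let start = index.checked_mul(8)?;
    data.get(start..start.checked_add(8)?).map(le_f64)
}
```

To try the streaming path on a real file:

1. Wrap `File::open(path)?` in a `BufReader`.
2. Call `read_npy_header` on it.
3. Pass the product of `shape` as `total`.

The mmap path would need a crate such as `memmap2`, which the sketch leaves out to stay std-only. Its `Mmap` derefs to `[u8]`, so:

- `&mmap[..]` works as the reader for `read_npy_header`.
- `&mmap[h.data_offset..]` is the region to pass to `f64_at`.

Creating the map with `Mmap::map` is `unsafe` because the file can change underneath it. Decoding each value with `f64::from_le_bytes` copies 8 bytes per access. In exchange, it sidesteps the alignment and endianness assumptions that a zero-copy `&[f64]` reinterpretation would need.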
