Re: [PR] [QDP] Numpy IO support [mahout]

via GitHub Sun, 04 Jan 2026 01:49:50 -0800


400Ping commented on PR #777:
URL: https://github.com/apache/mahout/pull/777#issuecomment-3707921945


   I have thought of two more memory-friendly alternatives to the current 
`read_npy -> Array2 -> flatten Vec` flow: 
   
   -  Streaming/iterator reading: parse the `.npy` header 
(dtype/shape/order), then read the flat data from the file in small 
chunks, so the file doesn’t need to fit in RAM.
   -  Memory-mapping (mmap): map the `.npy` file into memory (the OS loads pages 
on demand), parse the header to locate the data region, and read from it 
directly. This avoids the peak of holding both the read array and its 
flattened copy, and also makes slicing/random access easy.
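A minimal, std-only Rust sketch of the streaming option follows. It assumes the standard `.npy` layout: the `\x93NUMPY` magic string, one major and one minor version byte, a little-endian header length (`u16` in format 1.0, `u32` in 2.0/3.0), then a Python-dict header. It handles only little-endian `float64` (`'<f8'`) data. The header is scanned with simple string matching rather than a full Python-literal parser. `NpyHeader`, `read_npy_header` and `stream_f64_chunks` are hypothetical names for illustration, not existing QDP or Mahout APIs.

```rust
use std::io::{self, Read};

/// Subset of the `.npy` header dict (hypothetical type, not a QDP API).
#[derive(Debug, PartialEq)]
pub struct NpyHeader {
    pub descr: String,       // dtype string, e.g. "<f8"
    pub fortran_order: bool, // true => flat data is column-major
    pub shape: Vec<usize>,   // "()" => scalar, i.e. one element
}

fn invalid(msg: impl Into<String>) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.into())
}

/// Returns the text that follows `'key':` in the header dict, left-trimmed.
fn value_after<'a>(dict: &'a str, key: &str) -> Option<&'a str> {
    let pat = format!("'{key}'");
    let start = dict.find(pat.as_str())? + pat.len();
    dict[start..].trim_start().strip_prefix(':').map(str::trim_start)
}

/// Parses the preamble and header dict, leaving `r` at the first data byte.
pub fn read_npy_header<R: Read>(r: &mut R) -> io::Result<NpyHeader> {
    // 6-byte magic string followed by major and minor version bytes.
    let mut preamble = [0u8; 8];
    r.read_exact(&mut preamble)?;
    if !preamble.starts_with(b"\x93NUMPY") {
        return Err(invalid("missing \\x93NUMPY magic string"));
    }
    // Format 1.0 stores HEADER_LEN as a little-endian u16; 2.0/3.0 use a u32.
    let header_len = match preamble[6] {
        1 => {
            let mut b = [0u8; 2];
            r.read_exact(&mut b)?;
            u16::from_le_bytes(b) as usize
        }
        2 | 3 => {
            let mut b = [0u8; 4];
            r.read_exact(&mut b)?;
            u32::from_le_bytes(b) as usize
        }
        v => return Err(invalid(format!("unsupported .npy major version {v}"))),
    };
    let mut raw = vec![0u8; header_len];
    r.read_exact(&mut raw)?;
    // e.g. {'descr': '<f8', 'fortran_order': False, 'shape': (2, 3), }
    let text = String::from_utf8_lossy(&raw);

    let descr = value_after(&text, "descr")
        .and_then(|v| v.strip_prefix('\''))
        .and_then(|v| v.split_once('\''))
        .map(|(d, _)| d.to_string())
        .ok_or_else(|| invalid("header has no 'descr'"))?;
    let fortran_order = value_after(&text, "fortran_order")
        .map(|v| v.starts_with("True"))
        .ok_or_else(|| invalid("header has no 'fortran_order'"))?;
    let shape = value_after(&text, "shape")
        .and_then(|v| v.strip_prefix('('))
        .and_then(|v| v.split_once(')'))
        .ok_or_else(|| invalid("header has no 'shape'"))?
        .0
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty()) // "(5,)" leaves a trailing empty item
        .map(|s| s.parse::<usize>().map_err(|e| invalid(e.to_string())))
        .collect::<io::Result<Vec<usize>>>()?;
    Ok(NpyHeader { descr, fortran_order, shape })
}

/// Streams the flat data region in chunks of at most `chunk_elems` values,
/// reusing two fixed-size buffers instead of materializing the whole array.
pub fn stream_f64_chunks<R: Read>(
    r: &mut R,
    header: &NpyHeader,
    chunk_elems: usize,
    mut on_chunk: impl FnMut(&[f64]),
) -> io::Result<()> {
    if header.descr != "<f8" {
        return Err(invalid(format!("sketch only handles '<f8', got {}", header.descr)));
    }
    let chunk_elems = chunk_elems.max(1);
    let mut remaining: usize = header.shape.iter().product();
    let mut bytes = Vec::new();
    let mut values: Vec<f64> = Vec::with_capacity(chunk_elems);
    while remaining > 0 {
        let n = remaining.min(chunk_elems);
        bytes.resize(n * 8, 0);
        r.read_exact(&mut bytes)?;
        values.clear();
        values.extend(bytes.chunks_exact(8).map(|b| {
            let mut a = [0u8; 8];
            a.copy_from_slice(b);
            f64::from_le_bytes(a)
        }));
        on_chunk(values.as_slice());
        remaining -= n;
    }
    Ok(())
}
```

Peak memory here is about `2 * chunk_elems * 8` bytes (raw bytes plus decoded values), independent of file size. If `fortran_order` is `true`, the flat stream is column-major and the consumer has to account for that. For the mmap option, the same header parsing could run over a mapped byte slice (for example via the `memmap2` crate, whose `Mmap::map` is `unsafe` because the file can change underneath the mapping). The data region then starts right after the header, at offset 10 + HEADER_LEN for format 1.0 or 12 + HEADER_LEN for 2.0/3.0.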


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.