rich7420 commented on issue #787:
URL: https://github.com/apache/mahout/issues/787#issuecomment-3741755481

   Rather than a single hardcoded file-size cap, we could estimate the in-memory 
requirement from the .npy header (shape × dtype.itemsize; for float64 that’s 
rows * cols * 8) and compare it against available host RAM with a safety factor 
(≈2×, since flattening or making the array contiguous may need an extra copy). 
If we still want a simple fallback, stat().len() is fine for .npy (on-disk size 
≈ in-memory size), but the limit should be configurable (env var/argument) 
and/or dynamic (e.g., refuse when estimated_bytes * 2 > MemAvailable * 0.8). 
The error message should report the estimated bytes and how to override the 
limit, and suggest Parquet streaming for very large datasets. WDYT?
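
   A minimal sketch of what I mean, in Python with numpy (function names and the
`MAHOUT_MAX_NPY_BYTES` override variable are hypothetical, not existing Mahout
APIs; the `MemAvailable` check is Linux-only):

```python
# Hypothetical sketch: estimate in-memory size from the .npy header alone
# and refuse to load when it won't plausibly fit in available RAM.
import numpy as np
from numpy.lib import format as npy_format


def estimated_npy_bytes(path):
    """Read only the .npy header (magic + shape/dtype); no data is loaded."""
    with open(path, "rb") as f:
        major, minor = npy_format.read_magic(f)
        if (major, minor) == (1, 0):
            shape, _fortran_order, dtype = npy_format.read_array_header_1_0(f)
        else:
            shape, _fortran_order, dtype = npy_format.read_array_header_2_0(f)
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype.itemsize  # e.g. rows * cols * 8 for float64


def mem_available_bytes():
    """MemAvailable from /proc/meminfo (Linux-only); None if unavailable."""
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) * 1024  # reported in kB
    except OSError:
        pass
    return None


def check_fits_in_memory(path, safety_factor=2.0, headroom=0.8):
    """Raise with an actionable message when the estimate exceeds the budget."""
    est = estimated_npy_bytes(path)
    avail = mem_available_bytes()
    if avail is not None and est * safety_factor > avail * headroom:
        raise MemoryError(
            f"{path}: estimated {est} bytes in memory (x{safety_factor} safety "
            f"factor) exceeds {headroom:.0%} of available RAM ({avail} bytes). "
            f"Set MAHOUT_MAX_NPY_BYTES to override, or stream the data as "
            f"Parquet instead."  # override name is a placeholder
        )
    return est
```

The point of `estimated_npy_bytes` is that it costs one small read regardless of
file size, so it can run unconditionally before any allocation happens.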


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
