rich7420 commented on issue #787:
URL: https://github.com/apache/mahout/issues/787#issuecomment-3741755481

   Rather than a single hardcoded file-size cap, we could estimate the in-memory 
requirement from the .npy header (shape × dtype.itemsize; for float64 that’s 
rows * cols * 8) and compare it against available host RAM with a safety factor 
(≈2×, since flattening or making the array contiguous may need an extra copy). 
If we still want a simple fallback, stat().len() is fine for .npy (on-disk size 
≈ in-memory size), but the limit should be configurable (env var/argument) 
and/or dynamic (e.g., refuse when estimated_bytes * 2 > MemAvailable * 0.8). 
The error message should report the estimated bytes and how to override the 
limit, and suggest Parquet streaming for very large datasets. WDYT?
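
   A minimal sketch of what I mean, in Python with numpy (function names and the
`MAHOUT_MAX_NPY_BYTES` override variable are hypothetical, not existing Mahout
APIs; the `MemAvailable` check is Linux-only):

```python
# Hypothetical sketch: estimate in-memory size from the .npy header alone
# and refuse to load when it won't plausibly fit in available RAM.
import numpy as np
from numpy.lib import format as npy_format


def estimated_npy_bytes(path):
    """Read only the .npy header (magic + shape/dtype); no data is loaded."""
    with open(path, "rb") as f:
        major, minor = npy_format.read_magic(f)
        if (major, minor) == (1, 0):
            shape, _fortran_order, dtype = npy_format.read_array_header_1_0(f)
        else:
            shape, _fortran_order, dtype = npy_format.read_array_header_2_0(f)
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype.itemsize  # e.g. rows * cols * 8 for float64


def mem_available_bytes():
    """MemAvailable from /proc/meminfo (Linux-only); None if unavailable."""
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) * 1024  # reported in kB
    except OSError:
        pass
    return None


def check_fits_in_memory(path, safety_factor=2.0, headroom=0.8):
    """Raise with an actionable message when the estimate exceeds the budget."""
    est = estimated_npy_bytes(path)
    avail = mem_available_bytes()
    if avail is not None and est * safety_factor > avail * headroom:
        raise MemoryError(
            f"{path}: estimated {est} bytes in memory (x{safety_factor} safety "
            f"factor) exceeds {headroom:.0%} of available RAM ({avail} bytes). "
            f"Set MAHOUT_MAX_NPY_BYTES to override, or stream the data as "
            f"Parquet instead."  # override name is a placeholder
        )
    return est
```

The point of `estimated_npy_bytes` is that it costs one small read regardless of
file size, so it can run unconditionally before any allocation happens.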


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
