Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/19266
  
    Yeah, agree, it could be some global constant. I don't think it should be 
configurable. Ideally it would be determined from the JVM, but I don't know a 
way to do that. 
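
    As a minimal sketch (in Scala, purely for illustration), such a global 
constant could look like the following; the object and field names here are 
hypothetical, not an existing Spark API. The -8 margin is the same conservative 
cap the JDK's own collections use for maximum array sizes:

        // Hypothetical shared constant; where it lives and what it's called is
        // exactly what's up for discussion here.
        object ArrayLimits {
          // Many JVMs cannot allocate arrays all the way up to Int.MaxValue;
          // Int.MaxValue - 8 is the conservative cap used by JDK collections.
          val MaxArraySize: Int = Int.MaxValue - 8
        }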
    
    In many cases, assuming the max array size is Int.MaxValue when it's really 
Int.MaxValue-8 doesn't matter much. For example, arguably I should leave the ML 
changes alone here: in the very rare case that a matrix size falls between 
Int.MaxValue-8 and Int.MaxValue, it will fail anyway, and that's not avoidable 
given the user input. It's also arguably more conservative not to assume that 
anything beyond Int.MaxValue-8 will fail, and so not to "proactively" fail at 
this cutoff.
    
    However, I think there is a small number of identifiable cases where Spark 
can genuinely avoid the failure (like BufferHolder), and they're the instances 
where an array size keeps doubling. Maybe we can stick to those clear cases, 
especially any that seem to have triggered the original error? A rough sketch 
of the doubling pattern follows.
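
    This is not the actual BufferHolder code; the helper below is made up for 
illustration, assuming a byte buffer. The idea is to clamp the doubled size to 
the limit and only fail when the requested size itself exceeds it:

        // Hypothetical grow helper: doubles the capacity but caps it at the
        // conservative JVM array-size limit instead of overflowing past it.
        def grow(current: Array[Byte], neededSize: Int): Array[Byte] = {
          val maxArraySize = Int.MaxValue - 8
          require(neededSize <= maxArraySize,
            s"Cannot allocate $neededSize bytes; max array size is $maxArraySize")
          if (neededSize <= current.length) {
            current
          } else {
            // Double, but never past the limit; plain doubling would overflow
            // (or trigger an OutOfMemoryError) once 2 * length exceeds the cap.
            val doubled = math.max(current.length.toLong * 2, neededSize.toLong)
            val newSize = math.min(doubled, maxArraySize.toLong).toInt
            java.util.Arrays.copyOf(current, newSize)
          }
        }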
    
    Those cases are few enough and related enough that I'm sure they're just 
one issue, not several.

