Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/164#issuecomment-38008356
  
    @mengxr I have three reasons: 
    
    * The file length could be larger than `Int.MaxValue`, while the 
buffer size of `FSDataInputStream` is usually smaller than `Int.MaxValue`.
    
    * The `innerBuffer` I use is a byte array, and a Java array cannot be 
initialized with a length larger than `Int.MaxValue`.
    
    * `byte[] innerBuffer = new byte[maxBufferLength];` could cause an OOM error if 
`maxBufferLength` is too large to fit in the currently available memory.
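
    To illustrate the alternative these points argue for, here is a minimal sketch (not the PR's actual code) of reading with a small fixed-size buffer while tracking the total in a `long`, so the count can exceed `Int.MaxValue`. `countBytes` is a hypothetical helper, and a plain `InputStream` stands in for `FSDataInputStream`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedRead {
    // Count the bytes in a stream using a small reusable buffer.
    // The accumulator is a long, so files longer than Int.MaxValue are fine.
    public static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8 * 1024]; // fixed, small buffer
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1024 * 1024]; // 1 MiB of zeros for the demo
        System.out.println(countBytes(new ByteArrayInputStream(data))); // prints 1048576
    }
}
```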
    
    Indeed, a file longer than `Int.MaxValue` bytes is rare, but it can 
happen. Mahout has also encountered this problem, but they simply cast the 
`long` to an `int` with `(int) fileLength`.
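
    A quick demo (not from the PR or Mahout's code) of why that cast is unsafe: Java's narrowing conversion silently wraps values above `Integer.MAX_VALUE`, so the truncated length comes out negative.

```java
public class CastDemo {
    public static void main(String[] args) {
        long fileLength = 3_000_000_000L; // ~3 GB, larger than Integer.MAX_VALUE
        int truncated = (int) fileLength; // narrowing conversion wraps around
        System.out.println(truncated);    // prints -1294967296
    }
}
```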

