Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/3600#issuecomment-66426107
  
    Hmm, might be tricky to explain if you do not have sufficient context, let 
me give it a shot.
    a) Streams in java are not usually multiplexed - unless explicitly stated 
otherwise.
    With this PR, the same underlying stream (fileStream) is being reused 
across deserializeStream (and its users).
    One way it manifests is (b)
    
    b) Most streams in java override finalize to close their underlying stream 
in case they are going out of scope (to prevent resource leak, etc) : ofcourse 
this is an implementation detail, but is the general expectation.
    In this case, deserializeStream gets re-assigned somewhere in the method 
below - causing the previous 'deserializeStream' to go out of scope.
    When gc kicks in, and then when finalizers are run, deserializeStream's 
finalize can call its close, resulting in fileStream to get closed - which 
might now be used by some other deserializeStream : since it was re-used.
    This will cause hard to debug crashes/bugs.
    
    I am sure I am missing other spectacular ways in which this can fail :-) - 
in general, these things happen when the basic api expectation (probably 
implicit here maybe) is broken.
    Now, we can go down this path in case the operation we are saving is very 
expensive -which is not the case here (it is a cheap file open/close which is 
saved).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to