Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/10024#issuecomment-211424963
  
    Hi @lianhuiwang, thanks for submitting this patch.
    
    I just have a really high-level question first.  If I understand how this works correctly, the idea is that (see the toy sketch after this list):
    1) as data is added to `Spillable`s, they work exactly the same as before 
at first, so after all records have been inserted, they've still got a bunch of 
data in memory.
    2) When an iterator is requested, as before it's still an iterator that merges data that has been spilled to disk with data that is still in memory
    3) If the `TaskMemoryManager` requests more memory while that iterator is 
in flight, then the `Spillable`s look at the position of the current iterator 
over the in-memory data, and spill only the _remaining_ data to disk
    4) The `Spillable` then frees the memory back to the `TaskMemoryManager`, and has the in-flight iterator switch to using the new spill file.
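    
    To make sure we're talking about the same thing, here's a toy sketch of the behavior I described above.  This is just my own simplified model, not the actual patch -- the names (`SpillableIteratorSketch`, `spillRemaining`) are made up, and the "spill" is just another in-memory buffer instead of a file:
    
    ```scala
    import scala.collection.mutable.ArrayBuffer

    // Much-simplified stand-in for a Spillable: an in-memory collection whose
    // in-flight iterator can spill the records it has *not yet* returned, free
    // the in-memory buffer, and keep going from the "spill".
    class SpillableIteratorSketch[T](records: ArrayBuffer[T]) extends Iterator[T] {
      private var pos = 0                                // position over the in-memory data
      private var spillIter: Iterator[T] = Iterator.empty
      private var spilled = false

      override def hasNext: Boolean =
        if (spilled) spillIter.hasNext else pos < records.length

      override def next(): T =
        if (spilled) spillIter.next()
        else { val r = records(pos); pos += 1; r }

      /** Spill only the remaining records, free the memory, return #records freed. */
      def spillRemaining(): Long = {
        if (!spilled) {
          spillIter = records.slice(pos, records.length).iterator  // stand-in for a disk-backed reader
          val freed = records.length - pos
          records.clear()                                          // hand the memory back
          spilled = true
          freed.toLong
        } else 0L
      }
    }

    object Demo extends App {
      val it = new SpillableIteratorSketch(ArrayBuffer(1, 2, 3, 4, 5))
      println((it.next(), it.next()))  // (1,2) read from memory
      it.spillRemaining()              // 3, 4, 5 go to the "spill", memory is freed
      println(it.toList)               // List(3, 4, 5) read from the spill
    }
    ```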
    
    Is this a correct understanding?  If so, this seems to hinge on one key 
assumption: that the `Spillable`s are never iterated over multiple times.  If 
they were iterated multiple times, then the second iteration would be incorrect 
-- some of the initial data in the in-memory structure would be lost on the 
second iteration.
    
    I think this assumption is sound -- it is implied by the `destructiveSortedIterator` in the internals, though I think the actual wrapping `Spillable`s might allow multiple iteration now.  I've been trying to think of a case where this assumption would be violated but can't come up with anything.  (If the same shuffle data is read multiple times in different tasks, the work on the shuffle-read side is simply repeated each time; I'm pretty sure there isn't any sharing.)  But nonetheless, if that is the case, I think this deserves a lot of comments explaining how this works, assertions which make sure this assumption is not violated, and a number of tests.
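    
    By an assertion I mean something along these lines -- again just a sketch on top of the toy class above (the real guard would have to live in the actual `Spillable` wrappers):
    
    ```scala
    import scala.collection.mutable.ArrayBuffer

    // Make the single-iteration assumption explicit: a second call to iterator()
    // fails fast instead of silently returning incomplete data after a
    // mid-iteration spill has released part of the in-memory structure.
    class SpillableCollectionSketch[T](records: ArrayBuffer[T]) {
      private var iteratorReturned = false

      def iterator: Iterator[T] = {
        assert(!iteratorReturned,
          "Spillable may only be iterated once: an earlier iteration may have spilled " +
          "and released part of the in-memory data")
        iteratorReturned = true
        new SpillableIteratorSketch(records)  // the toy iterator from the sketch above
      }
    }
    ```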
    
    FWIW, I started down the path of writing something similar with*out* that 
assumption -- when a spill was requested on an in-flight iterator, then the 
*entire* in-memory structure would get spilled to disk, and the in-flight 
iterator would switch to the spilled data, and advance to the same location in 
the spilled data that it was on the in-memory data.  This was pretty 
convoluted, and as I started writing tests I realized there were corner cases 
    that needed work.  So I decided to submit the simpler change instead.  It seems much easier to do it your way.  I do have some tests which I think I can add as well -- lemme dig those up and send them later today.
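    
    For reference, the alternative I started on would look roughly like this (toy code again, reusing the buffer-as-spill simplification from above):
    
    ```scala
    import scala.collection.mutable.ArrayBuffer

    object AlternativeSketch {
      // On a forced spill, write out *all* of the in-memory data, free it, then
      // fast-forward a reader over the spilled data past the records the
      // in-flight iterator has already returned.
      def spillAllAndFastForward[T](records: ArrayBuffer[T], alreadyReturned: Int): Iterator[T] = {
        val spillReader = records.toList.iterator  // stand-in for re-reading the spill file
        records.clear()                            // free the in-memory structure
        spillReader.drop(alreadyReturned)          // resume where the in-memory iterator left off
      }
    }
    ```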

