Josh Rosen created SPARK-7660:
---------------------------------

             Summary: Snappy-java buffer-sharing bug leads to data corruption / test failures
                 Key: SPARK-7660
                 URL: https://issues.apache.org/jira/browse/SPARK-7660
             Project: Spark
          Issue Type: Bug
          Components: Shuffle, Spark Core
    Affects Versions: 1.4.0
            Reporter: Josh Rosen
            Priority: Blocker


snappy-java contains a bug that can cause separate SnappyOutputStream 
instances to share the same input and output buffers, which can corrupt data.  
See https://github.com/xerial/snappy-java/issues/107 for my upstream bug report 
and https://github.com/xerial/snappy-java/pull/108 for my patch to fix this 
issue.

I discovered this issue because the buffer-sharing was causing a test failure 
in JavaAPISuite: one of the repartition-and-sort tests was returning the wrong 
answer. Both tasks wrote their output using the same compression buffers, and 
one task won the race, causing its output to be written to both shuffle output 
files. As a result, the test returned the result of collecting one partition 
twice.

The buffer-sharing can only occur if {{close()}} is called twice on the same 
SnappyOutputStream _and_ the JVM experiences little GC / memory pressure (for a 
more precise description of when this issue may occur, see my upstream 
tickets).  I think that this double-close happens somewhere in some test code 
that was added as part of my Tungsten shuffle patch, exposing this bug (to see 
this, download a recent build of master and run 
https://gist.github.com/JoshRosen/eb3257a75c16597d769f locally in order to 
force the test execution order that triggers the bug).
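
To illustrate the failure mode described above, here is a minimal sketch of 
how a non-idempotent {{close()}} can hand the same buffer to two later streams. 
This is not snappy-java's code: BufferPool, UnsafeStream, SafeStream and 
sharesBufferAfterDoubleClose are hypothetical names, and the pool is a 
simplified model that assumes buffers are recycled when a stream is closed. It 
also ignores the GC / memory pressure condition mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DoubleCloseDemo {

    /** Hypothetical pool that recycles byte buffers (simplified model). */
    static final class BufferPool {
        private final Deque<byte[]> free = new ArrayDeque<>();

        byte[] allocate(int size) {
            byte[] recycled = free.poll();
            return (recycled != null) ? recycled : new byte[size];
        }

        void release(byte[] buffer) {
            free.push(buffer);
        }
    }

    /** Stream whose close() is not idempotent: each call releases the buffer. */
    static class UnsafeStream {
        final byte[] buffer;
        private final BufferPool pool;

        UnsafeStream(BufferPool pool) {
            this.pool = pool;
            this.buffer = pool.allocate(16);
        }

        void close() {
            // Calling this twice puts the same buffer into the pool twice.
            pool.release(buffer);
        }
    }

    /** Same stream with a guard so that only the first close() has an effect. */
    static final class SafeStream extends UnsafeStream {
        private boolean closed = false;

        SafeStream(BufferPool pool) {
            super(pool);
        }

        @Override
        void close() {
            if (closed) {
                return;
            }
            closed = true;
            super.close();
        }
    }

    /** Double-closes one stream, then reports whether the next two share a buffer. */
    public static boolean sharesBufferAfterDoubleClose(boolean safe) {
        BufferPool pool = new BufferPool();
        UnsafeStream first = safe ? new SafeStream(pool) : new UnsafeStream(pool);
        first.close();
        first.close();
        UnsafeStream a = safe ? new SafeStream(pool) : new UnsafeStream(pool);
        UnsafeStream b = safe ? new SafeStream(pool) : new UnsafeStream(pool);
        return a.buffer == b.buffer;
    }

    public static void main(String[] args) {
        System.out.println("unguarded close shares buffer: "
                + sharesBufferAfterDoubleClose(false));
        System.out.println("guarded close shares buffer:   "
                + sharesBufferAfterDoubleClose(true));
    }
}
```

In this model, the unguarded stream leaves two pool entries pointing at one 
array, so the next two streams write into the same memory. The guarded variant 
releases the buffer only once. Whether the upstream patch uses the same kind of 
guard is best checked against the pull request linked above.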

I think that this bug will rarely lead to silent failures like this one. In 
more realistic workloads, where each task writes more than a handful of bytes, 
I would expect it to surface as stream corruption issues like SPARK-4105.



