Jialin LIu created SPARK-26261:
----------------------------------

             Summary: Spark does not check completeness of temporary files
                 Key: SPARK-26261
                 URL: https://issues.apache.org/jira/browse/SPARK-26261
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.2
            Reporter: Jialin LIu


Spark does not check the completeness of its temporary files. When persisting to 
disk is enabled on some RDDs, a number of temporary block files are created in the 
blockmgr folder. The block manager is able to detect missing blocks, but it is not 
able to detect that a block file's content has been modified during execution.

Our initial test shows that if we truncate a block file before it is read back by 
the executors, the program finishes without reporting any error, but the resulting 
content is completely wrong.
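
The fault injection itself can be as simple as the sketch below (the helper name and 
the path argument are only for illustration; the block file to truncate lives under 
a blockmgr-* directory in spark.local.dir):

import java.io.{File, RandomAccessFile}

object TruncateBlockFile {
  // Truncate a persisted block file down to keepBytes, simulating on-disk corruption.
  def truncate(blockFile: File, keepBytes: Long = 0L): Unit = {
    val raf = new RandomAccessFile(blockFile, "rw")
    try raf.setLength(math.min(keepBytes, raf.length()))
    finally raf.close()
  }

  def main(args: Array[String]): Unit = {
    // e.g. <spark.local.dir>/blockmgr-<uuid>/<subdir>/rdd_<id>_<partition>
    truncate(new File(args(0)))
  }
}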

We believe there should be a checksum on every RDD block file, and these files 
should be verified against that checksum so that corruption is detected before the 
block is used.
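
One possible shape of this protection, as a sketch only (the side-car ".crc" file 
and the helper names are assumptions, not existing Spark code): compute a checksum 
when the block file is written and verify it before the block is served again.

import java.io.{File, FileInputStream}
import java.nio.file.{Files, Paths}
import java.util.zip.CRC32

object BlockChecksum {
  // Compute a CRC32 over the whole block file.
  def checksumOf(file: File): Long = {
    val crc = new CRC32()
    val in = new FileInputStream(file)
    try {
      val buf = new Array[Byte](64 * 1024)
      var n = in.read(buf)
      while (n >= 0) {
        crc.update(buf, 0, n)
        n = in.read(buf)
      }
    } finally in.close()
    crc.getValue
  }

  // Record the checksum next to the block when it is first persisted.
  def writeChecksum(blockFile: File): Unit = {
    val crcPath = Paths.get(blockFile.getPath + ".crc")
    Files.write(crcPath, checksumOf(blockFile).toString.getBytes("UTF-8"))
  }

  // Verify before handing the block back to an executor; a truncated or modified
  // file would fail this check instead of producing silently wrong results.
  def verifyChecksum(blockFile: File): Boolean = {
    val crcPath = Paths.get(blockFile.getPath + ".crc")
    if (!Files.exists(crcPath)) return false
    val expected = new String(Files.readAllBytes(crcPath), "UTF-8").trim.toLong
    checksumOf(blockFile) == expected
  }
}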


