[ https://issues.apache.org/jira/browse/SPARK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-2532:
-----------------------------
    Target Version/s:   (was: 1.2.0)

> Fix issues with consolidated shuffle
> ------------------------------------
>
>                 Key: SPARK-2532
>                 URL: https://issues.apache.org/jira/browse/SPARK-2532
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.1.0
>         Environment: All
>            Reporter: Mridul Muralidharan
>            Priority: Critical
>
> Will file a PR with the changes as soon as the merge is done (an earlier merge
> became outdated within two weeks, unfortunately :) ).
> Consolidated shuffle is broken in multiple ways in Spark:
> a) Task failures can cause the state to become inconsistent.
> b) Multiple reverts, or combinations of close/revert/close, can leave the state
> inconsistent (as part of exception/error handling).
> c) Some of the block writer API causes implementation issues. For example, a
> revert is always followed by a close, but the implementation keeps them
> separate, which widens the surface for errors (see the first sketch below).
> d) Fetching data from consolidated shuffle files can go badly wrong if the file
> is still being actively written to: the segment length is computed by
> subtracting the current offset from the next offset (or from the file length if
> this is the last offset), and the latter fails when a fetch happens in parallel
> with a write (see the second sketch below).
> Note that this happens even if there are no task failures of any kind!
> This usually results in stream corruption or decompression errors.
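For (c), a minimal sketch of the API shape being described: revert and close are exposed as separate calls even though every revert is immediately followed by a close. The trait and method names below are hypothetical, not Spark's actual block writer.

{code:scala}
// Hypothetical writer trait; illustrates only the revert/close coupling.
trait ShuffleWriterSketch {
  def write(bytes: Array[Byte]): Unit
  def revertPartialWrites(): Unit // undo bytes written since the last commit
  def close(): Unit               // release the underlying file handle
}

object AbortHelper {
  // In practice every revert is followed by a close; keeping them as two
  // separate calls means each error path must remember to pair them correctly.
  def abort(writer: ShuffleWriterSketch): Unit = {
    try {
      writer.revertPartialWrites()
    } finally {
      writer.close() // skipping this, or closing twice, is what leaves state inconsistent
    }
  }
}
{code}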
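For (d), a minimal sketch of the offset-subtraction length computation and why it breaks when the file is still being appended to. The names offsets, fileLength and segmentLength are assumptions for illustration, not Spark internals.

{code:scala}
object SegmentLengthSketch {
  // offsets(i) = byte position where map output i begins in the consolidated file.
  def segmentLength(offsets: IndexedSeq[Long], fileLength: Long, i: Int): Long =
    if (i < offsets.length - 1) offsets(i + 1) - offsets(i) // bounded by the next segment's start
    else fileLength - offsets(i)                            // last segment: bounded by current file size

  def main(args: Array[String]): Unit = {
    val offsets = IndexedSeq(0L, 100L, 250L)
    // File fully written: the last segment really is 50 bytes.
    println(segmentLength(offsets, fileLength = 300L, i = 2)) // 50
    // Another task is still appending to the same file: fileLength now also covers
    // half-written bytes of a segment that has not been registered yet, so the
    // reader decompresses past the real end of segment 2.
    println(segmentLength(offsets, fileLength = 420L, i = 2)) // 170 (too long)
  }
}
{code}

The non-last segments are bounded by already-recorded offsets, so only the file-length-based case depends on a value that changes under a concurrent write; that matches the report that corruption occurs even without any task failures.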