[jira] [Assigned] (SPARK-17417) Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d)
[ https://issues.apache.org/jira/browse/SPARK-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17417:

    Assignee: Apache Spark

> Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d)
> --
>
> Key: SPARK-17417
> URL: https://issues.apache.org/jira/browse/SPARK-17417
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Dhruve Ashar
> Assignee: Apache Spark
>
> Spark currently assumes the # of partitions to be less than 100000 and uses
> %05d padding.
> If we exceed this number, the sort logic in ReliableCheckpointRDD gets messed
> up and fails. This is because the part files are sorted and compared as strings.
> This leads the filename order to be part-10000, part-100000, ... instead of
> part-10000, part-10001, ..., part-100000, and the job fails while
> reconstructing the checkpointed RDD.
> Possible solutions:
> - Bump the padding to allow more partitions or
> - Sort the part files extracting a sub-portion as string and then verify the
> RDD

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17417) Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d)
[ https://issues.apache.org/jira/browse/SPARK-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17417:

    Assignee: (was: Apache Spark)

> Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d)
> --
>
> Key: SPARK-17417
> URL: https://issues.apache.org/jira/browse/SPARK-17417
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Dhruve Ashar
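
The ordering problem described in SPARK-17417 can be reproduced outside Spark. The following is a minimal Python sketch, not Spark's actual ReliableCheckpointRDD code: `part_file_name` and `sort_by_index` are hypothetical helpers introduced for illustration, and the numeric-key sort is only one possible reading of the second proposed solution. It shows how %05d-padded names sort incorrectly as strings once an index needs six digits, and how either wider padding or sorting on the extracted index restores the expected order.

```python
def part_file_name(partition_index: int, width: int = 5) -> str:
    """Hypothetical helper: build a part-file name with zero padding (default %05d)."""
    return "part-%0*d" % (width, partition_index)


def sort_by_index(names):
    """Hypothetical helper: sort on the numeric suffix instead of the raw string."""
    return sorted(names, key=lambda name: int(name.rsplit("-", 1)[1]))


# Partition indices around the point where five digits stop being enough.
indices = [9999, 10000, 10001, 99999, 100000]

# Names in partition order, using the %05d padding from the issue.
names = [part_file_name(i) for i in indices]

# Plain string sort: "part-100000" lands between "part-10000" and "part-10001".
string_sorted = sorted(names)

# Option 1 from the issue: wider padding keeps string order equal to numeric order
# (width 6 is an arbitrary choice for this example).
wide_names = [part_file_name(i, width=6) for i in indices]
wide_sorted = sorted(wide_names)

# Option 2, as interpreted here: extract the index and sort numerically.
numeric_sorted = sort_by_index(names)

print(string_sorted)
print(wide_sorted)
print(numeric_sorted)
```

Widening the padding only moves the limit to a larger partition count, whereas sorting on the parsed index does not depend on the padding width at all.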