GitHub user dhruve opened a pull request: https://github.com/apache/spark/pull/15370

[SPARK-17417][Core] Fix # of partitions for Reliable RDD checkpointing

## What changes were proposed in this pull request?

Currently the number of partition files is limited by the five-digit `%05d` name format (indices 00000 to 99999). If there are more part files than that, recreating the RDD from the checkpoint breaks, because the part files are sorted by name as strings rather than by partition number. More details can be found in the JIRA description [here](https://issues.apache.org/jira/browse/SPARK-17417).

## How was this patch tested?

I tested this patch by checkpointing an RDD, manually renaming the part files to the old format, and then accessing the RDD. It was successfully recreated from the old format. I also verified loading a sample Parquet file, saving it in multiple formats (CSV, JSON, Text, Parquet, ORC), and reading it back successfully from the saved files. I couldn't run the unit tests on my local machine, so I will wait for the Jenkins output.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhruve/spark bug/SPARK-17417

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15370.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15370

----
commit cbbffda94bb5ba6df90051890a161b3a62a6b1a2
Author: Dhruve Ashar <dhruveas...@gmail.com>
Date:   2016-10-06T01:30:08Z

    [SPARK-17417] Fix # of partitions for Reliable RDD checkpointing

----
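
The sorting problem described in the pull request can be reproduced outside Spark. The following is a minimal Python sketch, not the code from the patch: the `part-` prefix and the helper names `part_file_name` and `partition_index` are assumptions made for illustration. It shows that zero-padded names keep their numeric order under a string sort only while the index fits in five digits, and that sorting by the parsed index restores the correct order.

```python
import re


def part_file_name(index: int) -> str:
    # Zero-pad to five digits, as in the "%05d" format the PR mentions.
    # The "part-" prefix is an assumption made for this illustration.
    return "part-%05d" % index


def partition_index(name: str) -> int:
    # Hypothetical helper: parse the numeric suffix whatever its width.
    match = re.fullmatch(r"part-(\d+)", name)
    if match is None:
        raise ValueError(f"not a part file: {name!r}")
    return int(match.group(1))


names = [part_file_name(i) for i in (99999, 100000, 100001, 2)]

# A plain string sort puts six-digit names before "part-99999".
by_string = sorted(names)

# Sorting on the parsed integer gives the intended partition order.
by_index = sorted(names, key=partition_index)

print(by_string)
print(by_index)
```

Sorting on the parsed index also handles names of mixed width, so part files written with five-digit padding and files with longer suffixes can be ordered together.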