GitHub user dhruve opened a pull request:

    https://github.com/apache/spark/pull/15370

    [SPARK-17417][Core] Fix # of partitions for Reliable RDD checkpointing

    ## What changes were proposed in this pull request?
    Currently the number of partition files is limited to 100000 (%05d 
format). If there are more part files than that, recreating the RDD breaks 
because the files are sorted by name as strings, so they no longer come back 
in partition order. More details can be found in the JIRA description 
[here](https://issues.apache.org/jira/browse/SPARK-17417).
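
    To illustrate the sorting problem described above, here is a minimal 
Python sketch. It is not code from this patch. It assumes file names of the 
form part-%05d, and the helpers part_name and part_index are hypothetical 
names used only for this example. It shows that a plain string sort misplaces 
a file once its index needs more digits than the padding width, while sorting 
on the parsed numeric suffix keeps partition order.

```python
def part_name(index: int) -> str:
    """Build a part file name with a zero-padded, five-digit index (assumed format)."""
    return "part-%05d" % index


def part_index(name: str) -> int:
    """Parse the numeric suffix back out of a part file name."""
    return int(name.rsplit("-", 1)[1])


# A few partition indices, including one that no longer fits in five digits.
indices = [9, 10000, 10001, 99999, 100000]
names = [part_name(i) for i in indices]

# Sorting as strings compares character by character, so the six-digit name
# lands between part-10000 and part-10001 instead of at the end.
by_string = sorted(names)

# Sorting on the parsed integer restores the intended partition order.
by_number = sorted(names, key=part_index)

print(by_string)
print(by_number)
```

    Sorting on the parsed index is one possible remedy and works regardless 
of how many digits the file names carry, so it also reads names written with 
an older, narrower padding.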
    
    ## How was this patch tested?
    I tested this patch by checkpointing an RDD, manually renaming the part 
files to the old format, and then accessing the RDD. It was successfully 
recreated from the old format. I also verified loading a sample Parquet file, 
saving it in multiple formats (CSV, JSON, Text, Parquet, ORC), and reading it 
back successfully from the saved files. I couldn't launch the unit tests from 
my local box, so I will wait for the Jenkins output.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhruve/spark bug/SPARK-17417

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15370.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15370
    
----
commit cbbffda94bb5ba6df90051890a161b3a62a6b1a2
Author: Dhruve Ashar <dhruveas...@gmail.com>
Date:   2016-10-06T01:30:08Z

    [SPARK-17417] Fix # of partitions for Reliable RDD checkpointing

----


