[ https://issues.apache.org/jira/browse/SPARK-24519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Graves resolved SPARK-24519. ----------------------------------- Resolution: Fixed Assignee: Hieu Tri Huynh Fix Version/s: 2.4.0 > MapStatus has 2000 hardcoded > ---------------------------- > > Key: SPARK-24519 > URL: https://issues.apache.org/jira/browse/SPARK-24519 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.3.0 > Reporter: Hieu Tri Huynh > Assignee: Hieu Tri Huynh > Priority: Minor > Fix For: 2.4.0 > > > MapStatus uses hardcoded value of 2000 partitions to determine if it should > use highly compressed map status. We should make it configurable to allow > users to more easily tune their jobs with respect to this without having for > them to modify their code to change the number of partitions. Note we can > leave this as an internal/undocumented config for now until we have more > advise for the users on how to set this config. > Some of my reasoning: > The config gives you a way to easily change something without the user having > to change code, redeploy jar, and then run again. You can simply change the > config and rerun. It also allows for easier experimentation. Changing the # > of partitions has other side affects, whether good or bad is situation > dependent. It can be worse are you could be increasing # of output files when > you don't want to be, affects the # of tasks needs and thus executors to run > in parallel, etc. > There have been various talks about this number at spark summits where people > have told customers to increase it to be 2001 partitions. Note if you just do > a search for spark 2000 partitions you will fine various things all talking > about this number. This shows that people are modifying their code to take > this into account so it seems to me having this configurable would be better. > Once we have more advice for users we could expose this and document > information on it. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org