[ https://issues.apache.org/jira/browse/SPARK-39602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562331#comment-17562331 ]
Tanin Na Nakorn edited comment on SPARK-39602 at 7/5/22 2:55 AM:
-----------------------------------------------------------------

The number of partitions is for illustration. In our actual job, we use something like 40000 or 60000. Currently, what we do is:

{code}
.groupBy(
  ... something ...,
  partitions =
    if (spark.conf.getOption("spark.stripe.testMode.enabled").contains("true")) {
      10
    } else {
      60000
    }
)
{code}

But you can imagine that it litters the code a bit. We need this many partitions in production; otherwise, the job would fail.

was (Author: JIRAUSER291610):
The number of partitions is for illustration. In our actual job, we use something like 40000 or 60000. Currently, what we do is:

{code}
.groupBy(
  ... something ...,
  partitions =
    if (spark.conf.getOption("spark.stripe.testMode.enabled").contains("true")) {
      10
    } else {
      60000
    }
)
{code}

But you can imagine that it litters the code a bit.

> Invoking .repartition(100000) in a unit test causes the unit test to take >20 minutes.
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-39602
>                 URL: https://issues.apache.org/jira/browse/SPARK-39602
>             Project: Spark
>          Issue Type: Improvement
>      Components: Spark Core
>    Affects Versions: 3.3.0
>         Reporter: Tanin Na Nakorn
>         Priority: Major
>
> Here's a proof of concept:
> {code}
> val result = spark
>   .createDataset(List("test"))
>   .rdd
>   .repartition(100000)
>   .map { _ =>
>     "test"
>   }
>   .collect()
>   .toList
>
> println(result)
> {code}
> This code takes a very long time in a unit test.
> We aim to test for correctness in unit tests, not to test the repartitioning itself.
> Is there a way to make it faster? (e.g. disable partitioning in tests)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
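[Editorial note] The workaround in the comment repeats the test-mode check at every `groupBy` call site. One way to reduce the littering is a small helper that centralizes the check. The sketch below is hypothetical: the config key `spark.stripe.testMode.enabled` comes from the comment, but the helper name and the plain `Map` standing in for `spark.conf` (so the snippet runs without a Spark dependency) are illustrative assumptions, not Spark API.

```scala
object PartitionConfig {
  // Stand-in for spark.conf; in a real job this would wrap SparkSession.conf.
  // The key name is taken from the comment above; everything else is a sketch.
  var conf: Map[String, String] = Map.empty

  private def testModeEnabled: Boolean =
    conf.get("spark.stripe.testMode.enabled").contains("true")

  // Return a small partition count in test mode, the production count otherwise,
  // so call sites can write numPartitions(60000) instead of repeating the check.
  def numPartitions(prod: Int, testDefault: Int = 10): Int =
    if (testModeEnabled) testDefault else prod
}
```

A call site would then read `.groupBy(..., partitions = PartitionConfig.numPartitions(60000))`, keeping the production counts in one place.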