[ https://issues.apache.org/jira/browse/SPARK-32758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Tsukanov updated SPARK-32758:
----------------------------------
    Environment:     (was: must)

> Spark ignores limit(1) and starts tasks for all partition
> ---------------------------------------------------------
>
>                 Key: SPARK-32758
>                 URL: https://issues.apache.org/jira/browse/SPARK-32758
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Ivan Tsukanov
>            Priority: Major
>         Attachments: image-2020-09-01-10-51-09-417.png
>
>
> If we run the following code
> {code:scala}
>   import org.apache.spark.SparkConf
>   import org.apache.spark.sql.{Row, SparkSession}
>   import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
>
>   val sparkConf = new SparkConf()
>     .setAppName("test-app")
>     .setMaster("local[1]")
>   val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
>   import sparkSession.implicits._
>
>   // 100000 rows spread over 1000 partitions
>   val df = (1 to 100000)
>     .toDF("c1")
>     .repartition(1000)
>
>   implicit val encoder: ExpressionEncoder[Row] = RowEncoder(df.schema)
>
>   // Scenario 1: limit before map
>   df.limit(1)
>     .map(identity)
>     .collect()
>
>   // Scenario 2: limit after map
>   df.map(identity)
>     .limit(1)
>     .collect()
>
>   // Keep the application alive so the Spark UI can be inspected
>   Thread.sleep(100000)
> {code}
> we see that Spark starts 1002 tasks even though limit(1) is applied:
> !image-2020-09-01-10-51-09-417.png!
> Expected behavior: both scenarios (limit before and after map) should behave
> the same way, running only one or two tasks to fetch a single value from the
> DataFrame.
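
One way to compare the two scenarios without reading the Spark UI is to print each physical plan and count finished tasks with a listener. The following is only a rough sketch, assuming the Spark 2.4 API named in the report; `LimitTaskCount`, `TaskCounter` and `tasksFor` are hypothetical names introduced here, and the one-second sleep is a crude stand-in for waiting on the asynchronous listener bus, so the printed task counts are approximate.

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}

// Hypothetical helper object, not part of the original report.
object LimitTaskCount {

  // Counts every task-end event delivered by the scheduler.
  final class TaskCounter extends SparkListener {
    val finished = new AtomicInteger(0)
    override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
      finished.incrementAndGet()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("limit-task-count")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    val counter = new TaskCounter
    spark.sparkContext.addSparkListener(counter)

    // Same data layout as in the report: 100000 rows, 1000 partitions.
    val df = (1 to 100000).toDF("c1").repartition(1000)
    implicit val encoder: ExpressionEncoder[Row] = RowEncoder(df.schema)

    // Runs an action and prints how many task-end events arrived meanwhile.
    def tasksFor(label: String)(action: => Array[Row]): Unit = {
      val before = counter.finished.get()
      val rows = action
      Thread.sleep(1000) // assumption: enough time for pending listener events
      val tasks = counter.finished.get() - before
      println(s"$label: rows=${rows.length}, tasks=$tasks")
    }

    val limitThenMap = df.limit(1).map(identity)
    val mapThenLimit = df.map(identity).limit(1)

    // Print the physical plans so the two limit strategies can be compared.
    limitThenMap.explain()
    mapThenLimit.explain()

    tasksFor("limit then map")(limitThenMap.collect())
    tasksFor("map then limit")(mapThenLimit.collect())

    spark.stop()
  }
}
```

Comparing the two `explain()` outputs shows where the limit sits relative to the map in each plan, which may help narrow down why one variant schedules tasks for every partition.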