[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146624#comment-16146624 ]

Shivaram Venkataraman commented on SPARK-21349:
-----------------------------------------------

Thanks for checking. In that case I don't think we can do much about this specific case. For RDDs created from the driver, it is inevitable that we need to ship the data to the executors.

> Make TASK_SIZE_TO_WARN_KB configurable
> --------------------------------------
>
>                 Key: SPARK-21349
>                 URL: https://issues.apache.org/jira/browse/SPARK-21349
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.3, 2.2.0
>            Reporter: Dongjoon Hyun
>            Priority: Minor
>
> Since Spark 1.1.0, Spark emits a warning when task size exceeds a threshold (SPARK-2185). Although this is just a warning message, this issue tries to make `TASK_SIZE_TO_WARN_KB` into a normal Spark configuration for advanced users. According to the Jenkins log, we also have 123 warnings even in our unit tests.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146599#comment-16146599 ]

Dongjoon Hyun commented on SPARK-21349:
---------------------------------------

Yes. With fewer values, like 24*365*1, the warning does not pop up.
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146585#comment-16146585 ]

Shivaram Venkataraman commented on SPARK-21349:
-----------------------------------------------

I think this might be because we create a ParallelCollectionRDD for the statement `(1 to (24*365*3))` -- the values are stored in the partition for this RDD [1].

[~dongjoon] If you use fewer values (say, 1 to 100) or more partitions (I'm not sure how many partitions are created in this example), does the warning go away?

[1] https://github.com/apache/spark/blob/e47f48c737052564e92903de16ff16707fae32c3/core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala#L32
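Shivaram's point -- that a ParallelCollectionRDD stores its values inside each partition, so the driver-side collection is shipped as part of the task -- can be illustrated outside Spark. The Python sketch below is an illustration only, not Spark code (the helper name and the even-split logic are invented): it serializes slices of the same 24*365*3 collection and shows that the per-task payload shrinks as the partition count grows.

```python
import pickle

# Hypothetical sketch (not Spark code): a ParallelCollectionRDD-style
# partition carries its slice of the driver-side collection, so the
# serialized task payload grows with values-per-partition.
def partition_payload_bytes(values, num_partitions):
    """Serialized size of the largest slice when `values` is split
    evenly across `num_partitions` partitions."""
    n = len(values)
    sizes = []
    for i in range(num_partitions):
        start = i * n // num_partitions
        end = (i + 1) * n // num_partitions
        sizes.append(len(pickle.dumps(values[start:end])))
    return max(sizes)

values = list(range(1, 24 * 365 * 3 + 1))     # the 26,280 values from the example
small = partition_payload_bytes(values, 200)  # more partitions -> smaller tasks
large = partition_payload_bytes(values, 2)    # few partitions -> larger tasks
print(small < large)
```

This is why using fewer values or more partitions makes the warning disappear: the data per task, not the total data, is what crosses the threshold.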
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146569#comment-16146569 ]

Dongjoon Hyun commented on SPARK-21349:
---------------------------------------

Hi, [~jiangxb] and all. I hit this issue again in another situation today, so I want to share the sample case.

{code}
scala> val data = (1 to (24*365*3)).map(i => (i, s"$i", i % 2 == 0)).toDF("col1", "part1", "part2")
17/08/29 21:07:49 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
data: org.apache.spark.sql.DataFrame = [col1: int, part1: string ... 1 more field]

scala> data.write.format("parquet").partitionBy("part1", "part2").mode("overwrite").saveAsTable("t")
17/08/29 21:08:04 WARN TaskSetManager: Stage 0 contains a task of very large size (190 KB). The maximum recommended task size is 100 KB.
17/08/29 21:09:34 WARN TaskSetManager: Stage 2 contains a task of very large size (233 KB). The maximum recommended task size is 100 KB.

scala> spark.version
res1: String = 2.3.0-SNAPSHOT
{code}
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081518#comment-16081518 ]

Jiang Xingbo commented on SPARK-21349:
--------------------------------------

[~dongjoon] Are you running the test for Spark SQL, or running some user-defined RDD directly? This information should help us narrow down the scope of the problem. Thanks!
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081431#comment-16081431 ]

Wenchen Fan commented on SPARK-21349:
-------------------------------------

[~rxin] that only helps with internal accumulators, but it seems the problem here is that we have too many SQL metrics. Maybe we should prioritize SQL metrics accumulators and apply some special optimization to them to reduce the size.
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081128#comment-16081128 ]

Reynold Xin commented on SPARK-21349:
-------------------------------------

cc [~cloud_fan] Shouldn't task metrics just be a single accumulator, rather than a list of them? That would substantially cut down the serialization size.
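As a rough illustration of why one composite accumulator could serialize smaller than a list of them, the sketch below (plain Python, not Spark's AccumulatorV2; the class and field names are invented) compares the pickled size of many per-metric objects against a single object holding the same values:

```python
import pickle

# Hedged sketch: each standalone accumulator object carries its own
# metadata (name, flags), so a list of them repeats that per-object
# overhead; one composite structure pays it once.
class MetricAccumulator:
    def __init__(self, name):
        self.name = name                  # per-accumulator metadata
        self.count_failed_values = False  # repeated in every instance
        self.value = 0

num_metrics = 50
as_list = [MetricAccumulator(f"metric_{i}") for i in range(num_metrics)]
as_single = {"names": [f"metric_{i}" for i in range(num_metrics)],
             "values": [0] * num_metrics}  # one object for all metrics

print(len(pickle.dumps(as_list)) > len(pickle.dumps(as_single)))
```

The exact savings depend on the serializer, but the direction of the comparison is the point of the suggestion.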
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080890#comment-16080890 ]

Dongjoon Hyun commented on SPARK-21349:
---------------------------------------

Thank you, [~shivaram] and @Kay Ousterhout. Okay, it looks like a consensus. Then, in order to make it final, let me ping the SQL committers here. Hi, [~rxin], [~cloud_fan], [~smilegator]. Could you give us your opinion here, too?
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080875#comment-16080875 ]

Shivaram Venkataraman commented on SPARK-21349:
-----------------------------------------------

Well, 100K is already too large IMHO, and I'm not sure adding another config property really helps things just to silence some log messages. Looking at the code, it seems that the larger task sizes mostly stem from the TaskMetrics objects getting bigger -- especially with a number of new SQL metrics being added. I think the right fix here is to improve the serialization of TaskMetrics (especially if the structure is empty -- why bother sending anything at all to the worker?)
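One way to read the "why bother sending anything at all" suggestion is a serializer that special-cases the empty structure. A minimal sketch follows, assuming nothing about Spark's actual wire format (the function names and the one-byte marker are invented for illustration):

```python
import pickle

# Sketch of the idea: ship a one-byte marker when the metrics structure
# is empty, instead of a fully serialized (but contentless) object.
def serialize_metrics(metrics):
    if not metrics:               # empty: don't ship the structure at all
        return b"\x00"
    return b"\x01" + pickle.dumps(metrics)

def deserialize_metrics(payload):
    if payload[0:1] == b"\x00":   # reconstruct the empty structure locally
        return {}
    return pickle.loads(payload[1:])

empty = serialize_metrics({})
full = serialize_metrics({"shuffle.read.bytes": 0, "shuffle.write.bytes": 0})
print(len(empty))                 # 1 byte instead of a pickled empty dict
```

The receiving side rebuilds the empty object itself, so nothing crosses the wire for the common no-metrics case.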
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080817#comment-16080817 ]

Dongjoon Hyun commented on SPARK-21349:
---------------------------------------

I usually see 200K~300K. The following is from our Apache Spark unit test logs.

{code}
$ curl -LO "https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3170/consoleFull"
$ grep 'contains a task of very large size' consoleFull | awk -F"(" '{print $2}' | awk '{print $1}' | sort -n | uniq -c
   6 104
   4 234
   4 235
   4 251
   4 255
   4 264
   4 272
   4 275
   4 278
   4 568
   4 658
   4 677
   4 684
   4 687
   4 692
   4 736
   4 761
   4 764
   4 778
   4 795
   4 817
   4 874
   4 1009
   1 1370
   1 2065
   1 2760
   1 2763
   1 3007
   1 3012
   1 3015
   1 3016
   1 3021
   1 3022
   2 3917
  12 4051
   3 4792
   1 15050
   1 15056
{code}
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080789#comment-16080789 ]

Kay Ousterhout commented on SPARK-21349:
----------------------------------------

Out of curiosity, what are the task sizes that you're seeing?

+[~shivaram] -- I know you've looked at task size a lot. Are these getting bigger / do you think we should just raise the warning size for everyone?
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079751#comment-16079751 ]

Dongjoon Hyun commented on SPARK-21349:
---------------------------------------

This issue is not about blindly raising the threshold, 100K. The default value will be the same for all users. What I mean is that tasks have become bigger now than three years ago, so currently the warning fires more frequently and misleadingly. Was there any reason or criterion for choosing 100K as the threshold three years ago? If not, can we at least re-evaluate the threshold?
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079746#comment-16079746 ]

Kay Ousterhout commented on SPARK-21349:
----------------------------------------

Does that mean we should just raise this threshold for all users?
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079728#comment-16079728 ]

Dongjoon Hyun commented on SPARK-21349:
---------------------------------------

Thank you for the advice, [~kayousterhout]! On usability: we are building more complex tasks, like Spark SQL, than three years ago, and this warning now always fires for specific user apps. For those apps, it would be great if we could configure this. In addition, we can make this an `internal` configuration instead of a constant.
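What the proposed change amounts to can be sketched in Python rather than Scala. This is a hypothetical illustration: the configuration key name below is invented, not taken from the actual PR, and the warn check mirrors the TaskSetManager message only loosely.

```python
import logging

# Today's behavior uses a hard-coded constant; the proposal reads the
# threshold from configuration, falling back to the current default.
DEFAULT_TASK_SIZE_TO_WARN_KB = 100

def maybe_warn_task_size(serialized_task_bytes, conf):
    # "spark.task.sizeToWarnKB" is an invented key name for illustration.
    threshold_kb = int(conf.get("spark.task.sizeToWarnKB",
                                DEFAULT_TASK_SIZE_TO_WARN_KB))
    size_kb = serialized_task_bytes // 1024
    if size_kb > threshold_kb:
        logging.warning("Stage contains a task of very large size (%d KB). "
                        "The maximum recommended task size is %d KB.",
                        size_kb, threshold_kb)
        return True
    return False

print(maybe_warn_task_size(233 * 1024, {}))                                # True
print(maybe_warn_task_size(233 * 1024, {"spark.task.sizeToWarnKB": 300}))  # False
```

Marking the key `internal` would keep it out of user-facing docs while still letting advanced users silence the warning for known-large tasks.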
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079724#comment-16079724 ]

Kay Ousterhout commented on SPARK-21349:
----------------------------------------

Is this a major usability issue (and what's the use case where task sizes are regularly > 100KB)? I'm hesitant to make this a configuration parameter -- Spark already has a huge number of configuration parameters, making it hard for users to figure out which ones are relevant for them.
[jira] [Commented] (SPARK-21349) Make TASK_SIZE_TO_WARN_KB configurable
[ https://issues.apache.org/jira/browse/SPARK-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079349#comment-16079349 ]

Apache Spark commented on SPARK-21349:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/18573