[jira] [Commented] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781419#comment-17781419 ]

Tengfei Huang commented on SPARK-45687:
---------------------------------------

I will work on this.

> Fix `Passing an explicit array value to a Scala varargs method is deprecated`
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-45687
>                 URL: https://issues.apache.org/jira/browse/SPARK-45687
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Priority: Major
>
> Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call.
>
> {code:java}
> [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21: Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0
> [warn]       df.agg(udaf(allColumns: _*)),
> [warn]       ^
> [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48: Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, version=2.13.0
> [warn]       df.agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn]       ^
> [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76: Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, version=2.13.0
> [warn]       df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn]       ^
> [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50: Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, version=2.13.0
> [warn]       df.agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn] {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
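The fix the compiler suggests can be sketched outside Spark with any varargs method; `sum` below is an illustrative stand-in for calls like `df.agg(exprs: _*)`:

```scala
import scala.collection.immutable.ArraySeq

object VarargsFix {
  // A varargs method, standing in for Spark's `df.agg(exprs: _*)`-style calls.
  def sum(xs: Int*): Int = xs.sum

  def main(args: Array[String]): Unit = {
    val values = Array(1, 2, 3)

    // Deprecated since Scala 2.13: passing an Array directly forces a defensive copy.
    //   sum(values: _*)

    // Fix 1: wrap the array without copying.
    val a = sum(ArraySeq.unsafeWrapArray(values): _*)

    // Fix 2: an explicit conversion (copies, but is not deprecated).
    val b = sum(values.toIndexedSeq: _*)

    assert(a == 6 && b == 6)
  }
}
```

`unsafeWrapArray` avoids the copy, but the caller must not mutate the array afterwards, since the wrapper shares its storage; `toIndexedSeq` copies and is always safe.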
[jira] [Commented] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781374#comment-17781374 ]

Tengfei Huang commented on SPARK-45694:
---------------------------------------

Sure, I will include SPARK-45694 and SPARK-45695 (Fix `method force in trait View is deprecated`) in one PR.

> Fix `method signum in trait ScalaNumberProxy is deprecated`
> -----------------------------------------------------------
>
>                 Key: SPARK-45694
>                 URL: https://issues.apache.org/jira/browse/SPARK-45694
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Priority: Minor
>
> {code:java}
> [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:194:25: method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use `sign` method instead
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc, origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0
> [warn]     val uc = useCount.signum
> [warn] {code}
[jira] [Commented] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781022#comment-17781022 ]

Tengfei Huang commented on SPARK-45694:
---------------------------------------

Hi [~LuciferYang], I will work on this.

> Fix `method signum in trait ScalaNumberProxy is deprecated`
> -----------------------------------------------------------
>
>                 Key: SPARK-45694
>                 URL: https://issues.apache.org/jira/browse/SPARK-45694
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Priority: Minor
>
> {code:java}
> [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:194:25: method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use `sign` method instead
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc, origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0
> [warn]     val uc = useCount.signum
> [warn] {code}
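The migration the warning asks for is a one-word change, sketched here in plain Scala (no Spark needed):

```scala
object SignumFix {
  def signOf(useCount: Int): Int = {
    // Before (deprecated since 2.13.0, via ScalaNumberProxy):
    //   val uc = useCount.signum
    // After, as the compiler message suggests:
    useCount.sign
  }

  def main(args: Array[String]): Unit = {
    assert(signOf(-3) == -1)
    assert(signOf(0) == 0)
    assert(signOf(7) == 1)
  }
}
```

Both methods return -1, 0, or 1 for negative, zero, and positive values; only the deprecated `ScalaNumberProxy` entry point goes away.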
[jira] [Commented] (SPARK-43926) Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731326#comment-17731326 ]

Tengfei Huang commented on SPARK-43926:
---------------------------------------

I am working on this, will send a PR soon.

> Add array_agg, array_size, cardinality, count_min_sketch, mask, named_struct, json_* to Scala and Python
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-43926
>                 URL: https://issues.apache.org/jira/browse/SPARK-43926
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark, SQL
>    Affects Versions: 3.5.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>
> Add the following functions:
> * array_agg
> * array_size
> * cardinality
> * count_min_sketch
> * named_struct
> * json_array_length
> * json_object_keys
> * mask
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
[jira] [Commented] (SPARK-42626) Add Destructive Iterator for SparkResult
[ https://issues.apache.org/jira/browse/SPARK-42626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698220#comment-17698220 ]

Tengfei Huang commented on SPARK-42626:
---------------------------------------

I will take a look! Thanks

> Add Destructive Iterator for SparkResult
> ----------------------------------------
>
>                 Key: SPARK-42626
>                 URL: https://issues.apache.org/jira/browse/SPARK-42626
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Herman van Hövell
>            Priority: Major
>
> Add a destructive iterator to SparkResult. Instead of keeping everything in memory for the lifetime of the SparkResult object, clean each element up as soon as we know we are done with it. We can use this for Dataset.toLocalIterator.
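The idea behind a destructive iterator can be sketched independently of Spark: drop each batch from the backing buffer the moment it is handed out, so memory is released while iteration is still in progress. The names below are illustrative, not SparkResult's actual API.

```scala
import scala.collection.mutable.ArrayBuffer

// A minimal sketch of a destructive iterator over buffered result batches.
// Consumed batches are removed from the buffer immediately, so they become
// garbage-collectable before iteration finishes. Trade-off: the result can
// only be traversed once.
final class DestructiveIterator[A](batches: ArrayBuffer[Seq[A]]) extends Iterator[A] {
  private var current: Iterator[A] = Iterator.empty

  def hasNext: Boolean = current.hasNext || batches.nonEmpty

  def next(): A = {
    if (!current.hasNext) {
      // Remove the batch from the buffer: it cannot be replayed afterwards.
      current = batches.remove(0).iterator
    }
    current.next()
  }
}

object DestructiveIteratorDemo {
  def main(args: Array[String]): Unit = {
    val batches = ArrayBuffer(Seq(1, 2), Seq(3, 4))
    val it = new DestructiveIterator(batches)
    assert(it.toList == List(1, 2, 3, 4))
    assert(batches.isEmpty) // every batch was released as it was consumed
  }
}
```

A non-destructive iterator would keep all batches alive until the owning result object is closed; the destructive variant trades replayability for a bounded memory footprint, which fits one-pass consumers like Dataset.toLocalIterator.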
[jira] [Commented] (SPARK-42577) A large stage could run indefinitely due to executor lost
[ https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693753#comment-17693753 ]

Tengfei Huang commented on SPARK-42577:
---------------------------------------

I am working on this. Thanks. [~Ngone51]

> A large stage could run indefinitely due to executor lost
> ---------------------------------------------------------
>
>                 Key: SPARK-42577
>                 URL: https://issues.apache.org/jira/browse/SPARK-42577
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2
>            Reporter: wuyi
>            Priority: Major
>
> When a stage is extremely large and Spark runs on spot instances or problematic clusters with frequent worker/executor loss, the stage can run indefinitely due to task reruns caused by the executor loss. This happens when the external shuffle service is on and the large stage takes hours to complete: when Spark tries to submit a child stage, it finds that the parent stage (the large one) is missing some partitions, so the large stage has to rerun. When it completes again, it finds new missing partitions for the same reason.
> We should add an attempt limitation for this kind of scenario.
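The proposed guard amounts to counting how many times a stage has been resubmitted and aborting once a cap is reached, instead of rerunning forever. This sketch is a hypothetical illustration; `StageAttemptTracker` and `maxStageAttempts` are made-up names, not Spark's scheduler API.

```scala
import scala.collection.mutable

// A minimal sketch of an attempt limit for stage resubmission. Each call to
// tryResubmit records one more attempt for the stage; once the cap is hit,
// the scheduler would abort the job instead of rerunning the stage again.
final class StageAttemptTracker(maxStageAttempts: Int) {
  private val attempts = mutable.Map.empty[Int, Int].withDefaultValue(0)

  /** Returns true if the stage may run (again); false once the limit is exceeded. */
  def tryResubmit(stageId: Int): Boolean = {
    attempts(stageId) += 1
    attempts(stageId) <= maxStageAttempts
  }
}

object StageAttemptTrackerDemo {
  def main(args: Array[String]): Unit = {
    val tracker = new StageAttemptTracker(maxStageAttempts = 3)
    // Three reruns of stage 0 (e.g. due to lost shuffle outputs) are tolerated...
    assert((1 to 3).forall(_ => tracker.tryResubmit(stageId = 0)))
    // ...but the fourth attempt trips the limit and the job would be aborted.
    assert(!tracker.tryResubmit(stageId = 0))
    // Other stages are tracked independently.
    assert(tracker.tryResubmit(stageId = 1))
  }
}
```

The point of the cap is to turn an unbounded rerun loop (new partitions go missing on every completion) into a deterministic failure with a clear error, which the user can act on.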
[jira] [Comment Edited] (SPARK-42582) Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate
[ https://issues.apache.org/jira/browse/SPARK-42582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693579#comment-17693579 ]

Tengfei Huang edited comment on SPARK-42582 at 2/26/23 3:27 AM:
---------------------------------------------------------------

This is also discussed in PR: https://github.com/apache/spark/pull/39459

cc [~mridulm80] [~Ngone51]

Created this ticket to track the inconsistent persisted RDD blocks issue.

was (Author: ivoson):
This is also discussed in PR: https://github.com/apache/spark/pull/39459

cc [~mridulm80] cc [~Ngone51]

Created this ticket to track the inconsistent persisted RDD blocks issue.

> Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-42582
>                 URL: https://issues.apache.org/jira/browse/SPARK-42582
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.2
>            Reporter: Tengfei Huang
>            Priority: Major
>
> When an RDD includes indeterminate operations, its results can differ each time we recompute it. If we cache such an RDD, we may end up with multiple RDD block replicas holding different data. Here is an example:
> 1. Task A generates the RDD block rdd_1_1 on executor E1;
> 2. Task B on executor E2 tries to fetch the remote rdd_1_1 from E1 but fails, then computes and caches another block on E2.
> If the results on E1 and E2 are different, we'll have two blocks for the same RDD partition with different data. The behavior will be unexpected in such cases.
[jira] [Comment Edited] (SPARK-42582) Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate
[ https://issues.apache.org/jira/browse/SPARK-42582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693579#comment-17693579 ]

Tengfei Huang edited comment on SPARK-42582 at 2/26/23 3:27 AM:
---------------------------------------------------------------

This is also discussed in PR: https://github.com/apache/spark/pull/39459

cc [~mridulm80] cc [~Ngone51]

Created this ticket to track the inconsistent persisted RDD blocks issue.

was (Author: ivoson):
This is also discussed in PR: https://github.com/apache/spark/pull/39459

> Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-42582
>                 URL: https://issues.apache.org/jira/browse/SPARK-42582
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.2
>            Reporter: Tengfei Huang
>            Priority: Major
>
> When an RDD includes indeterminate operations, its results can differ each time we recompute it. If we cache such an RDD, we may end up with multiple RDD block replicas holding different data. Here is an example:
> 1. Task A generates the RDD block rdd_1_1 on executor E1;
> 2. Task B on executor E2 tries to fetch the remote rdd_1_1 from E1 but fails, then computes and caches another block on E2.
> If the results on E1 and E2 are different, we'll have two blocks for the same RDD partition with different data. The behavior will be unexpected in such cases.
[jira] [Commented] (SPARK-42582) Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate
[ https://issues.apache.org/jira/browse/SPARK-42582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693579#comment-17693579 ]

Tengfei Huang commented on SPARK-42582:
---------------------------------------

This is also discussed in PR: https://github.com/apache/spark/pull/39459

> Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-42582
>                 URL: https://issues.apache.org/jira/browse/SPARK-42582
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.2
>            Reporter: Tengfei Huang
>            Priority: Major
>
> When an RDD includes indeterminate operations, its results can differ each time we recompute it. If we cache such an RDD, we may end up with multiple RDD block replicas holding different data. Here is an example:
> 1. Task A generates the RDD block rdd_1_1 on executor E1;
> 2. Task B on executor E2 tries to fetch the remote rdd_1_1 from E1 but fails, then computes and caches another block on E2.
> If the results on E1 and E2 are different, we'll have two blocks for the same RDD partition with different data. The behavior will be unexpected in such cases.
[jira] [Created] (SPARK-42582) Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate
Tengfei Huang created SPARK-42582:
-------------------------------------

             Summary: Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate
                 Key: SPARK-42582
                 URL: https://issues.apache.org/jira/browse/SPARK-42582
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.3.2
            Reporter: Tengfei Huang

When an RDD includes indeterminate operations, its results can differ each time we recompute it. If we cache such an RDD, we may end up with multiple RDD block replicas holding different data. Here is an example:

1. Task A generates the RDD block rdd_1_1 on executor E1;
2. Task B on executor E2 tries to fetch the remote rdd_1_1 from E1 but fails, then computes and caches another block on E2.

If the results on E1 and E2 are different, we'll have two blocks for the same RDD partition with different data. The behavior will be unexpected in such cases.
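The failure mode described above can be illustrated without Spark: if the computation of a "partition" observes mutable state (or any other source of indeterminism), two independent recomputations cache different data under the same logical block id. Plain functions stand in for RDD recomputation here; this is a hypothetical illustration, not Spark's block manager.

```scala
// A minimal sketch of an indeterminate partition computation: each
// recomputation observes different state, so two "executors" that each
// compute and cache the same logical partition end up with different data.
object IndeterminateCacheDemo {
  private var recomputations = 0

  // Indeterminate: the result depends on how many times it has run before.
  def computePartition(): Seq[Int] = {
    recomputations += 1
    Seq.fill(3)(recomputations)
  }

  def main(args: Array[String]): Unit = {
    // Task A computes and caches block rdd_1_1 on executor E1.
    val cachedOnE1 = computePartition()
    // Task B fails to fetch the remote block, so it recomputes and caches on E2.
    val cachedOnE2 = computePartition()
    // Two replicas of the same logical block now disagree.
    assert(cachedOnE1 != cachedOnE2)
  }
}
```

With a deterministic computation the two replicas would be byte-identical and interchangeable; with an indeterminate one, which replica a downstream task reads silently changes the result, which is why the ticket flags cached blocks of indeterminate RDDs as unsafe.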