[jira] [Commented] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`

2023-10-31 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781419#comment-17781419
 ] 

Tengfei Huang commented on SPARK-45687:
---

I will work on this.

> Fix `Passing an explicit array value to a Scala varargs method is deprecated`
> -
>
> Key: SPARK-45687
> URL: https://issues.apache.org/jira/browse/SPARK-45687
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
>  
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0
> [warn]         df.agg(udaf(allColumns: _*)),
> [warn]                     ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]         df.agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn]                                                ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]         df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, 
> aggFunctions.tail: _*),
> [warn]                                                                        
>     ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]           df.agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn]  {code}
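
For reference, a minimal sketch of the change this deprecation asks for, assuming `allColumns` and `aggFunctions` are `Array[Column]` values as in the test code quoted above (an illustration of the suggested APIs, not the actual patch):

{code:scala}
import scala.collection.immutable.ArraySeq

// Deprecated since Scala 2.13: splicing an Array directly into varargs
// forces a defensive copy.
//   df.agg(udaf(allColumns: _*))

// Non-copying alternative suggested by the warning: wrap the array first.
df.agg(udaf(ArraySeq.unsafeWrapArray(allColumns): _*))

// toIndexedSeq also silences the warning, at the cost of a copy.
df.agg(aggFunctions.head, aggFunctions.tail.toIndexedSeq: _*)
{code}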



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`

2023-10-31 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781374#comment-17781374
 ] 

Tengfei Huang commented on SPARK-45694:
---

Sure, I will include SPARK-45695 (Fix `method force in trait View is deprecated`) 
in one PR.

> Fix `method signum in trait ScalaNumberProxy is deprecated`
> ---
>
> Key: SPARK-45694
> URL: https://issues.apache.org/jira/browse/SPARK-45694
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:194:25:
>  method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use 
> `sign` method instead
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc,
>  origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0
> [warn]       val uc = useCount.signum
> [warn]   {code}
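
For reference only (not the actual patch), the deprecation is resolved by replacing the `signum` call from `ScalaNumberProxy` with the `sign` method, assuming `useCount` is an `Int` as in the warning above:

{code:scala}
// Before (deprecated since Scala 2.13): signum via ScalaNumberProxy.
// val uc = useCount.signum

// After: the sign method; for an Int it still yields -1, 0 or 1.
val uc = useCount.sign
{code}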



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`

2023-10-30 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781022#comment-17781022
 ] 

Tengfei Huang commented on SPARK-45694:
---

Hi [~LuciferYang], I will work on this.

> Fix `method signum in trait ScalaNumberProxy is deprecated`
> ---
>
> Key: SPARK-45694
> URL: https://issues.apache.org/jira/browse/SPARK-45694
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:194:25:
>  method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use 
> `sign` method instead
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc,
>  origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0
> [warn]       val uc = useCount.signum
> [warn]   {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43926) Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Python

2023-06-11 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731326#comment-17731326
 ] 

Tengfei Huang commented on SPARK-43926:
---

I am working on this, will send a PR soon.

> Add array_agg, array_size, cardinality, 
> count_min_sketch,mask,named_struct,json_* to Scala and Python
> -
>
> Key: SPARK-43926
> URL: https://issues.apache.org/jira/browse/SPARK-43926
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add the following functions:
> * array_agg
> * array_size
> * cardinality
> * count_min_sketch
> * named_struct
> * json_array_length
> * json_object_keys
> * mask
>   to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
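
For illustration, a sketch of how the new Scala API surface is expected to look once these functions are exposed (the exact signatures come from the PR, so treat this as an assumption based on the matching SQL functions of the same names):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_size, mask}

// Assumed usage once array_size and mask are available in
// org.apache.spark.sql.functions, mirroring the existing SQL functions.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((Array(1, 2, 3), "AbCD123-@$#")).toDF("arr", "str")
df.select(array_size($"arr"), mask($"str")).show()
{code}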



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42626) Add Destructive Iterator for SparkResult

2023-03-08 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698220#comment-17698220
 ] 

Tengfei Huang commented on SPARK-42626:
---

I will take a look! Thanks

> Add Destructive Iterator for SparkResult
> 
>
> Key: SPARK-42626
> URL: https://issues.apache.org/jira/browse/SPARK-42626
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Add a destructive iterator to SparkResult. Instead of keeping everything in 
> memory for the lifetime of the SparkResult object, clean it up as soon as we 
> know we are done with it. We can use this for Dataset.toLocalIterator.
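
A rough sketch of the destructive-iterator idea (class and method names below are hypothetical, not the real SparkResult API): each buffered batch is handed out once and dropped immediately afterwards, so memory is released as iteration progresses rather than when the SparkResult is closed.

{code:scala}
import scala.collection.mutable

// Hypothetical result holder; not the actual SparkResult class.
class ResultBuffer[T](initial: Seq[Seq[T]]) {
  private val batches = mutable.Queue[Seq[T]](initial: _*)

  // Destructive iterator: each batch is dequeued (and becomes eligible for GC)
  // as soon as its rows have been handed to the caller.
  def destructiveIterator: Iterator[T] = new Iterator[T] {
    private var current: Iterator[T] = Iterator.empty
    def hasNext: Boolean = current.hasNext ||
      (batches.nonEmpty && { current = batches.dequeue().iterator; hasNext })
    def next(): T = {
      if (!hasNext) throw new NoSuchElementException("empty result")
      current.next()
    }
  }
}
{code}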



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42577) A large stage could run indefinitely due to executor lost

2023-02-26 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693753#comment-17693753
 ] 

Tengfei Huang commented on SPARK-42577:
---

I am working on this. Thanks, [~Ngone51].

> A large stage could run indefinitely due to executor lost
> -
>
> Key: SPARK-42577
> URL: https://issues.apache.org/jira/browse/SPARK-42577
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2
>Reporter: wuyi
>Priority: Major
>
> When a stage is extremely large and Spark runs on spot instances or 
> problematic clusters with frequent worker/executor loss, the stage could run 
> indefinitely because tasks are rerun after each executor loss. This happens 
> when the external shuffle service is on and the large stage takes hours to 
> complete: when Spark tries to submit a child stage, it finds that the parent 
> stage (the large one) has missing partitions, so the large stage has to 
> rerun. When it completes again, it finds new missing partitions for the 
> same reason.
> We should add an attempt limit for this kind of scenario.
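
As an illustration only (the names below are hypothetical, not Spark's actual scheduler code or a real configuration key), the proposed attempt limit amounts to counting resubmissions of the parent stage and failing the job once a bound is exceeded:

{code:scala}
// Hypothetical sketch of a cap on stage resubmission; not DAGScheduler code.
final case class StageState(stageId: Int, var resubmissions: Int = 0)

def resubmitIfAllowed(stage: StageState, maxAttempts: Int)(submit: Int => Unit): Unit = {
  if (stage.resubmissions >= maxAttempts) {
    // Give up instead of rerunning the huge stage indefinitely.
    throw new IllegalStateException(
      s"Stage ${stage.stageId} resubmitted $maxAttempts times due to missing shuffle partitions")
  } else {
    stage.resubmissions += 1
    submit(stage.stageId)
  }
}
{code}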



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-42582) Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate

2023-02-25 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693579#comment-17693579
 ] 

Tengfei Huang edited comment on SPARK-42582 at 2/26/23 3:27 AM:


This is also discussed in PR: https://github.com/apache/spark/pull/39459

cc [~mridulm80] [~Ngone51]

Created this ticket to track the inconsistent persisted RDD blocks issue.


was (Author: ivoson):
This is also discussed in PR: https://github.com/apache/spark/pull/39459

cc [~mridulm80] cc [~Ngone51]

Created this ticket to track the issue about inconsistent persisted rdd blocks 
issue.

> Persisted RDD blocks can be inconsistent if the RDD computation is 
> indeterminate
> 
>
> Key: SPARK-42582
> URL: https://issues.apache.org/jira/browse/SPARK-42582
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Tengfei Huang
>Priority: Major
>
> When an RDD includes indeterminate operations, its results can be different 
> each time we recompute it.
> And when we cache such an RDD, we may end up with multiple replicas of the 
> same RDD block holding different data. Here is an example:
> 1. Task A generates the RDD block rdd_1_1 on executor E1.
> 2. Task B on executor E2 tries to fetch the remote rdd_1_1 from E1 but fails, 
> so it computes and caches another copy of the block on E2.
> If the results on E1 and E2 are different, we'll have two blocks for the same 
> RDD partition with different data.
> The behavior will be unexpected in such cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-42582) Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate

2023-02-25 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693579#comment-17693579
 ] 

Tengfei Huang edited comment on SPARK-42582 at 2/26/23 3:27 AM:


This is also discussed in PR: https://github.com/apache/spark/pull/39459

cc [~mridulm80] cc [~Ngone51]

Created this ticket to track the inconsistent persisted RDD blocks issue.


was (Author: ivoson):
This is also discussed in PR: https://github.com/apache/spark/pull/39459

> Persisted RDD blocks can be inconsistent if the RDD computation is 
> indeterminate
> 
>
> Key: SPARK-42582
> URL: https://issues.apache.org/jira/browse/SPARK-42582
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Tengfei Huang
>Priority: Major
>
> When an RDD includes indeterminate operations, its results can be different 
> each time we recompute it.
> And when we cache such an RDD, we may end up with multiple replicas of the 
> same RDD block holding different data. Here is an example:
> 1. Task A generates the RDD block rdd_1_1 on executor E1.
> 2. Task B on executor E2 tries to fetch the remote rdd_1_1 from E1 but fails, 
> so it computes and caches another copy of the block on E2.
> If the results on E1 and E2 are different, we'll have two blocks for the same 
> RDD partition with different data.
> The behavior will be unexpected in such cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42582) Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate

2023-02-25 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693579#comment-17693579
 ] 

Tengfei Huang commented on SPARK-42582:
---

This is also discussed in PR: https://github.com/apache/spark/pull/39459

> Persisted RDD blocks can be inconsistent if the RDD computation is 
> indeterminate
> 
>
> Key: SPARK-42582
> URL: https://issues.apache.org/jira/browse/SPARK-42582
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Tengfei Huang
>Priority: Major
>
> When an RDD includes indeterminate operations, its results can be different 
> each time we recompute it.
> And when we cache such an RDD, we may end up with multiple replicas of the 
> same RDD block holding different data. Here is an example:
> 1. Task A generates the RDD block rdd_1_1 on executor E1.
> 2. Task B on executor E2 tries to fetch the remote rdd_1_1 from E1 but fails, 
> so it computes and caches another copy of the block on E2.
> If the results on E1 and E2 are different, we'll have two blocks for the same 
> RDD partition with different data.
> The behavior will be unexpected in such cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42582) Persisted RDD blocks can be inconsistent if the RDD computation is indeterminate

2023-02-25 Thread Tengfei Huang (Jira)
Tengfei Huang created SPARK-42582:
-

 Summary: Persisted RDD blocks can be inconsistent if the RDD 
computation is indeterminate
 Key: SPARK-42582
 URL: https://issues.apache.org/jira/browse/SPARK-42582
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.2
Reporter: Tengfei Huang


When an RDD includes indeterminate operations, its results can be different 
each time we recompute it.
And when we cache such an RDD, we may end up with multiple replicas of the 
same RDD block holding different data. Here is an example:
1. Task A generates the RDD block rdd_1_1 on executor E1.
2. Task B on executor E2 tries to fetch the remote rdd_1_1 from E1 but fails, 
so it computes and caches another copy of the block on E2.
If the results on E1 and E2 are different, we'll have two blocks for the same 
RDD partition with different data.

The behavior will be unexpected in such cases.
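
For illustration, a minimal way to reproduce the kind of indeterminacy described above (an example constructed for this note, not code from the ticket): an RDD whose map function draws random numbers yields different data every time a partition is recomputed, so two replicas of the same cached block can disagree.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel
import scala.util.Random

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Indeterminate computation: each (re)computation of a partition produces
// different values, so a recomputed or replicated block may not match the
// block that was cached first.
val indeterminate = sc
  .parallelize(1 to 100, numSlices = 4)
  .map(_ => Random.nextInt())
  .persist(StorageLevel.MEMORY_ONLY_2) // keep 2 replicas of each cached block

indeterminate.count()
{code}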



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org