[jira] [Updated] (SPARK-45670) SparkSubmit does not support --total-executor-cores when deploying on K8s

2023-10-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-45670:
-
Fix Version/s: 3.4.2
   3.5.1

> SparkSubmit does not support --total-executor-cores when deploying on K8s
> -
>
> Key: SPARK-45670
> URL: https://issues.apache.org/jira/browse/SPARK-45670
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.2, 3.5.1, 3.3.4
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45670) SparkSubmit does not support --total-executor-cores when deploying on K8s

2023-10-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45670.
--
Fix Version/s: 3.3.4
   Resolution: Fixed

Issue resolved by pull request 43548
[https://github.com/apache/spark/pull/43548]

> SparkSubmit does not support --total-executor-cores when deploying on K8s
> -
>
> Key: SPARK-45670
> URL: https://issues.apache.org/jira/browse/SPARK-45670
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.4
>
>







[jira] [Assigned] (SPARK-45670) SparkSubmit does not support --total-executor-cores when deploying on K8s

2023-10-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45670:


Assignee: Cheng Pan

> SparkSubmit does not support --total-executor-cores when deploying on K8s
> -
>
> Key: SPARK-45670
> URL: https://issues.apache.org/jira/browse/SPARK-45670
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45637) Time window aggregation in separate streams followed by stream-stream join not returning results

2023-10-26 Thread Wei Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Liu updated SPARK-45637:

Description: 
According to documentation update (SPARK-42591) resulting from SPARK-42376, 
Spark 3.5.0 should support time-window aggregations in two separate streams 
followed by stream-stream window join:

[https://github.com/apache/spark/blob/261b281e6e57be32eb28bf4e50bea24ed22a9f21/docs/structured-streaming-programming-guide.md?plain=1#L1939-L1995]

However, I failed to reproduce this example and the query I built doesn't 
return any results:
{code:java}
from pyspark.sql.functions import rand
from pyspark.sql.functions import expr, window, window_time

spark.conf.set("spark.sql.shuffle.partitions", "1")

impressions = (
    spark
    .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load()
    .selectExpr("value AS adId", "timestamp AS impressionTime")
)

impressionsWithWatermark = impressions \
    .selectExpr("adId AS impressionAdId", "impressionTime") \
    .withWatermark("impressionTime", "10 seconds")

clicks = (
    spark
    .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load()
    .where((rand() * 100).cast("integer") < 10)  # 10 out of every 100 impressions result in a click
    .selectExpr("(value - 10) AS adId", "timestamp AS clickTime")  # -10 so that a click with same id as impression is generated later (i.e. delayed data)
    .where("adId > 0")
)

clicksWithWatermark = clicks \
    .selectExpr("adId AS clickAdId", "clickTime") \
    .withWatermark("clickTime", "10 seconds")

clicksWindow = clicksWithWatermark.groupBy(
    window(clicksWithWatermark.clickTime, "1 minute")
).count()

impressionsWindow = impressionsWithWatermark.groupBy(
    window(impressionsWithWatermark.impressionTime, "1 minute")
).count()

clicksAndImpressions = clicksWindow.join(impressionsWindow, "window", "inner")

clicksAndImpressions.writeStream \
    .format("memory") \
    .queryName("clicksAndImpressions") \
    .outputMode("append") \
    .start() {code}
 

My intuition is that I'm getting no results because, to emit results from the 
first stateful operator (the time-window aggregation), the watermark needs to 
pass the end timestamp of the window; and once the watermark is past that end 
timestamp, the window is ignored at the second stateful operator (the 
stream-stream join) because it's behind the watermark. Indeed, a small hack 
applied to the event-time column (adding one minute) between the two stateful 
operators makes it possible to get results:
{code:java}
clicksWindow2 = clicksWithWatermark.groupBy(
    window(clicksWithWatermark.clickTime, "1 minute")
).count().withColumn("window_time", window_time("window") + expr('INTERVAL 1 MINUTE')).drop("window")

impressionsWindow2 = impressionsWithWatermark.groupBy(
    window(impressionsWithWatermark.impressionTime, "1 minute")
).count().withColumn("window_time", window_time("window") + expr('INTERVAL 1 MINUTE')).drop("window")

clicksAndImpressions2 = clicksWindow2.join(impressionsWindow2, "window_time", "inner")

clicksAndImpressions2.writeStream \
    .format("memory") \
    .queryName("clicksAndImpressions2") \
    .outputMode("append") \
    .start()  {code}
 

  was:
According to documentation update (SPARK-42591) resulting from SPARK-42376, 
Spark 3.5.0 should support time-window aggregations in two separate streams 
followed by stream-stream window join:

https://github.com/apache/spark/blob/261b281e6e57be32eb28bf4e50bea24ed22a9f21/docs/structured-streaming-programming-guide.md?plain=1#L1939-L1995

However, I failed to reproduce this example and the query I built doesn't 
return any results:
{code:java}
from pyspark.sql.functions import rand
from pyspark.sql.functions import expr, window, window_time

spark.conf.set("spark.sql.shuffle.partitions", "1")

impressions = (
    spark
    .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load()
    .selectExpr("value AS adId", "timestamp AS impressionTime")
)

impressionsWithWatermark = impressions \
    .selectExpr("adId AS impressionAdId", "impressionTime") \
    .withWatermark("impressionTime", "10 seconds")

clicks = (
    spark
    .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load()
    .where((rand() * 100).cast("integer") < 10)  # 10 out of every 100 impressions result in a click
    .selectExpr("(value - 10) AS adId", "timestamp AS clickTime")  # -10 so that a click with same id as impression is generated later (i.e. delayed data)
    .where("adId > 0")
)

clicksWithWatermark = clicks \
    .selectExpr("adId AS clickAdId", "clickTime") \
    .withWatermark("clickTime", "10 seconds")

clicksWindow = clicksWithWatermark.groupBy(
    window(clicksWithWatermark.clickTime, "1 minute")
).count()

impressionsWindow =

[jira] [Updated] (SPARK-45698) Clean up the deprecated API usage related to `Buffer`

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45698:
---
Labels: pull-request-available  (was: )

> Clean up the deprecated API usage related to `Buffer`
> -
>
> Key: SPARK-45698
> URL: https://issues.apache.org/jira/browse/SPARK-45698
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> * method append in trait Buffer is deprecated (since 2.13.0)
>  * method prepend in trait Buffer is deprecated (since 2.13.0)
>  * method trimEnd in trait Buffer is deprecated (since 2.13.4)
>  * method trimStart in trait Buffer is deprecated (since 2.13.4)
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/test/scala/org/apache/spark/deploy/IvyTestUtils.scala:319:18:
>  method append in trait Buffer is deprecated (since 2.13.0): Use appendAll 
> instead
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.deploy.IvyTestUtils.createLocalRepository, 
> origin=scala.collection.mutable.Buffer.append, version=2.13.0
> [warn]         allFiles.append(rFiles: _*)
> [warn]                  ^ 
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala:183:13:
>  method trimEnd in trait Buffer is deprecated (since 2.13.4): use 
> dropRightInPlace instead
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.util.SizeEstimator.SearchState.dequeue, 
> origin=scala.collection.mutable.Buffer.trimEnd, version=2.13.4
> [warn]       stack.trimEnd(1)
> [warn]             ^{code}






[jira] [Updated] (SPARK-45685) Use `LazyList` instead of `Stream`

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45685:
-
Description: 
* class Stream in package immutable is deprecated (since 2.13.0)
 * object Stream in package immutable is deprecated (since 2.13.0)
 * type Stream in package scala is deprecated (since 2.13.0)
 * value Stream in package scala is deprecated (since 2.13.0)
 * method append in class Stream is deprecated (since 2.13.0)
 * method toStream in trait IterableOnceOps is deprecated (since 2.13.0)

 
{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala:49:20:
 class Stream in package immutable is deprecated (since 2.13.0): Use LazyList 
(which is fully lazy) instead of Stream (which has a lazy tail only)
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.BlockingLineStream.BlockingStreamed.stream, 
origin=scala.collection.immutable.Stream, version=2.13.0
[warn]     val stream: () => Stream[T])
[warn]                    ^ {code}
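
For context, a minimal migration sketch (illustrative snippet, not code from the Spark repo): `LazyList` mirrors the `Stream` API, so the change is mostly mechanical, and `to(LazyList)` replaces the deprecated `toStream`.
{code:java}
object LazyListMigration {
  // Before (deprecated since 2.13.0): val nums: Stream[Int] = Stream.from(1)
  // After: LazyList is lazy in both head and tail.
  val nums: LazyList[Int] = LazyList.from(1)

  def main(args: Array[String]): Unit = {
    // Only the forced prefix is evaluated.
    println(nums.take(5).toList) // List(1, 2, 3, 4, 5)
    // IterableOnceOps.toStream is deprecated; use to(LazyList) instead.
    println(Iterator(1, 2, 3).to(LazyList).toList) // List(1, 2, 3)
  }
}
{code}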

  was:
* class Stream in package immutable is deprecated (since 2.13.0)object Stream in
 * package immutable is deprecated (since 2.13.0)
 * type Stream in package scala is deprecated (since 2.13.0)
 * value Stream in package scala is deprecated (since 2.13.0)
 * method append in class Stream is deprecated (since 2.13.0)
 * method toStream in trait IterableOnceOps is deprecated (since 2.13.0)

 
{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala:49:20:
 class Stream in package immutable is deprecated (since 2.13.0): Use LazyList 
(which is fully lazy) instead of Stream (which has a lazy tail only)
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.BlockingLineStream.BlockingStreamed.stream, 
origin=scala.collection.immutable.Stream, version=2.13.0
[warn]     val stream: () => Stream[T])
[warn]                    ^ {code}


> Use `LazyList` instead of `Stream`
> --
>
> Key: SPARK-45685
> URL: https://issues.apache.org/jira/browse/SPARK-45685
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> * class Stream in package immutable is deprecated (since 2.13.0)
>  * object Stream in package immutable is deprecated (since 2.13.0)
>  * type Stream in package scala is deprecated (since 2.13.0)
>  * value Stream in package scala is deprecated (since 2.13.0)
>  * method append in class Stream is deprecated (since 2.13.0)
>  * method toStream in trait IterableOnceOps is deprecated (since 2.13.0)
>  
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala:49:20:
>  class Stream in package immutable is deprecated (since 2.13.0): Use LazyList 
> (which is fully lazy) instead of Stream (which has a lazy tail only)
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.BlockingLineStream.BlockingStreamed.stream, 
> origin=scala.collection.immutable.Stream, version=2.13.0
> [warn]     val stream: () => Stream[T])
> [warn]                    ^ {code}






[jira] [Updated] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45699:
-
Summary: Fix "Widening conversion from `TypeA` to `TypeB` is deprecated 
because it loses precision"  (was: Fix "Widening conversion from `TypeA` to 
`TypeB` is deprecated because it loses precision. Write `.toTypeB` instead")

> Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it 
> loses precision"
> --
>
> Key: SPARK-45699
> URL: https://issues.apache.org/jira/browse/SPARK-45699
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold
> [error]       val threshold = max(speculationMultiplier * medianDuration, 
> minTimeToSpeculation)
> [error]                                                                   ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks
> [error]       foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, 
> customizedThreshold = true)
> [error]                                                            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48:
>  Widening conversion from Int to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getInt(i)
> [error]                                                ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49:
>  Widening conversion from Long to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getLong(i)
> [error]                                                 ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble
> [error]   override def getDouble(i: Int): Double = getLong(i)
> [error]                                                   ^ {code}






[jira] [Updated] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision. Write `.toTypeB` instead"

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45699:
-
Summary: Fix "Widening conversion from `TypeA` to `TypeB` is deprecated 
because it loses precision. Write `.toTypeB` instead"  (was: Fix "Widening 
conversion from `OType` to `NType` is deprecated because it loses precision. 
Write `.toXX` instead")

> Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it 
> loses precision. Write `.toTypeB` instead"
> 
>
> Key: SPARK-45699
> URL: https://issues.apache.org/jira/browse/SPARK-45699
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold
> [error]       val threshold = max(speculationMultiplier * medianDuration, 
> minTimeToSpeculation)
> [error]                                                                   ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks
> [error]       foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, 
> customizedThreshold = true)
> [error]                                                            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48:
>  Widening conversion from Int to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getInt(i)
> [error]                                                ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49:
>  Widening conversion from Long to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getLong(i)
> [error]                                                 ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble
> [error]   override def getDouble(i: Int): Double = getLong(i)
> [error]                                                   ^ {code}






[jira] [Created] (SPARK-45704) Fix `legacy-binding`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45704:


 Summary: Fix `legacy-binding`
 Key: SPARK-45704
 URL: https://issues.apache.org/jira/browse/SPARK-45704
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala:93:11:
 reference to stop is ambiguous;
[error] it is both defined in the enclosing class StandaloneAppClient and 
inherited in the enclosing class ClientEndpoint as method stop (defined in 
trait RpcEndpoint, inherited through parent trait ThreadSafeRpcEndpoint)
[error] In Scala 2, symbols inherited from a superclass shadow symbols defined 
in an outer scope.
[error] Such references are ambiguous in Scala 3. To continue using the 
inherited symbol, write `this.stop`.
[error] Or use `-Wconf:msg=legacy-binding:s` to silence this warning. 
[quickfixable]
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=other, 
site=org.apache.spark.deploy.client.StandaloneAppClient.ClientEndpoint.onStart
[error]           stop()
[error]           ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala:171:9:
 reference to stop is ambiguous;
[error] it is both defined in the enclosing class StandaloneAppClient and 
inherited in the enclosing class ClientEndpoint as method stop (defined in 
trait RpcEndpoint, inherited through parent trait ThreadSafeRpcEndpoint)
[error] In Scala 2, symbols inherited from a superclass shadow symbols defined 
in an outer scope.
[error] Such references are ambiguous in Scala 3. To continue using the 
inherited symbol, write `this.stop`.
[error] Or use `-Wconf:msg=legacy-binding:s` to silence this warning. 
[quickfixable]
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=other, 
site=org.apache.spark.deploy.client.StandaloneAppClient.ClientEndpoint.receive
[error]         stop()
[error]         ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala:206:9:
 reference to stop is ambiguous;
[error] it is both defined in the enclosing class StandaloneAppClient and 
inherited in the enclosing class ClientEndpoint as method stop (defined in 
trait RpcEndpoint, inherited through parent trait ThreadSafeRpcEndpoint)
[error] In Scala 2, symbols inherited from a superclass shadow symbols defined 
in an outer scope.
[error] Such references are ambiguous in Scala 3. To continue using the 
inherited symbol, write `this.stop`.
[error] Or use `-Wconf:msg=legacy-binding:s` to silence this warning. 
[quickfixable]
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=other, 
site=org.apache.spark.deploy.client.StandaloneAppClient.ClientEndpoint.receiveAndReply
[error]         stop()
[error]         ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala:100:21:
 the type test for pattern org.apache.spark.RangePartitioner[K,V] cannot be 
checked at runtime because it has type parameters eliminated by erasure
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.rdd.OrderedRDDFunctions.filterByRange.rddToFilter
[error]       case Some(rp: RangePartitioner[K, V]) =>
[error]                     ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala:322:9:
 reference to stop is ambiguous;
[error] it is both defined in the enclosing class CoarseGrainedSchedulerBackend 
and inherited in the enclosing class DriverEndpoint as method stop (defined in 
trait RpcEndpoint, inherited through parent trait IsolatedThreadSafeRpcEndpoint)
[error] In Scala 2, symbols inherited from a superclass shadow symbols defined 
in an outer scope.
[error] Such references are ambiguous in Scala 3. To continue using the 
inherited symbol, write `this.stop`.
[error] Or use `-Wconf:msg=legacy-binding:s` to silence this warning. 
[quickfixable]
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=other, 
site=org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.DriverEndpoint.receiveAndReply
[error]         stop()
[error]         ^
[info] compiling 29 Scala sources and 267 Java sources to 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/target/scala-2.13/classes
 ...
[warn] -target is deprecated: Use -release instead to compile against the 
correct platform API.
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/sp

[jira] [Created] (SPARK-45703) Fix `abstract type TypeA in type pattern Some[TypeA] is unchecked since it is eliminated by erasure`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45703:


 Summary: Fix `abstract type TypeA in type pattern Some[TypeA] is 
unchecked since it is eliminated by erasure`
 Key: SPARK-45703
 URL: https://issues.apache.org/jira/browse/SPARK-45703
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala:105:19:
 abstract type ScalaInputType in type pattern Some[ScalaInputType] is unchecked 
since it is eliminated by erasure
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.sql.catalyst.CatalystTypeConverters.CatalystTypeConverter.toCatalyst
[error]         case opt: Some[ScalaInputType] => toCatalystImpl(opt.get)
[error]                   ^ {code}
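
A minimal sketch of one way to address it (illustrative class, not the actual CatalystTypeConverters code): the type argument is erased, so either annotate the pattern with `@unchecked` to acknowledge that, or match on the constructor and bind the value.
{code:java}
abstract class Converter[ScalaInputType] {
  protected def convertImpl(value: ScalaInputType): String

  final def convert(maybeValue: Any): String = maybeValue match {
    // Before: case opt: Some[ScalaInputType] => ...  // unchecked, erased type argument
    case opt: Some[ScalaInputType @unchecked] => convertImpl(opt.get)
    case other => other.toString
  }
}

object ConverterDemo {
  def main(args: Array[String]): Unit = {
    val conv = new Converter[Int] { protected def convertImpl(v: Int): String = s"int:$v" }
    println(conv.convert(Some(42))) // int:42
  }
}
{code}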






[jira] [Created] (SPARK-45702) Fix `the type test for pattern TypeA cannot be checked at runtime`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45702:


 Summary: Fix `the type test for pattern TypeA cannot be checked at 
runtime`
 Key: SPARK-45702
 URL: https://issues.apache.org/jira/browse/SPARK-45702
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala:100:21:
 the type test for pattern org.apache.spark.RangePartitioner[K,V] cannot be 
checked at runtime because it has type parameters eliminated by erasure
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.rdd.OrderedRDDFunctions.filterByRange.rddToFilter
[error]       case Some(rp: RangePartitioner[K, V]) =>
[error]  {code}
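
A self-contained sketch of the usual fix (stand-in class, not Spark's `RangePartitioner`): match with wildcard type arguments, which are checkable at run time, and only rely on members that do not depend on the erased parameters.
{code:java}
class RangeLike[K, V](val numPartitions: Int)

object ErasurePattern {
  def describe(p: Option[Any]): String = p match {
    // Before: case Some(rp: RangeLike[K, V]) => ...  // unchecked, K and V are erased
    case Some(rp: RangeLike[_, _]) => s"range-like with ${rp.numPartitions} partitions"
    case _ => "something else"
  }

  def main(args: Array[String]): Unit =
    println(describe(Some(new RangeLike[Int, String](8)))) // range-like with 8 partitions
}
{code}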






[jira] [Created] (SPARK-45701) Clean up the deprecated API usage related to `SetOps`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45701:


 Summary: Clean up the deprecated API usage related to `SetOps`
 Key: SPARK-45701
 URL: https://issues.apache.org/jira/browse/SPARK-45701
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


* method - in trait SetOps is deprecated (since 2.13.0)
 * method -- in trait SetOps is deprecated (since 2.13.0)
 * method + in trait SetOps is deprecated (since 2.13.0)

 
{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala:70:32:
 method + in trait SetOps is deprecated (since 2.13.0): Consider requiring an 
immutable Set or fall back to Set.union
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.storage.BlockReplicationUtils.getSampleIds.indices.$anonfun,
 origin=scala.collection.SetOps.+, version=2.13.0
[warn]       if (set.contains(t)) set + i else set + t
[warn]                                ^ {code}
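
An illustrative sketch of the two remedies the warning suggests (made-up snippet, not the BlockReplicationPolicy code): require an immutable `Set`, or fall back to `union`.
{code:java}
import scala.collection.mutable

object SetOpsMigration {
  def main(args: Array[String]): Unit = {
    val set = mutable.Set(1, 2, 3)
    // Before (deprecated): set + 4 on a mutable set
    val viaImmutable = set.toSet + 4     // convert, then use immutable +
    val viaUnion     = set.union(Set(4)) // or keep the mutable set and union
    println(viaImmutable) // contains 1, 2, 3, 4
    println(viaUnion)
  }
}
{code}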






[jira] [Updated] (SPARK-45684) Clean up the deprecated API usage related to `SeqOps`

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45684:
-
Description: 
* method transform in trait SeqOps is deprecated (since 2.13.0)
 * method reverseMap in trait SeqOps is deprecated (since 2.13.0)
 * method retain in trait SetOps is deprecated (since 2.13.0)
 * method union in trait SeqOps is deprecated (since 2.13.0)

{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala:675:15:
 method transform in trait SeqOps is deprecated (since 2.13.0): Use 
`mapInPlace` on an `IndexedSeq` instead
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.ml.classification.LogisticRegression.train.$anonfun, 
origin=scala.collection.mutable.SeqOps.transform, version=2.13.0
[warn]       centers.transform(_ / numCoefficientSets)
[warn]               ^ {code}
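
An illustrative sketch of the replacement (made-up snippet, not the LogisticRegression code): `mapInPlace` on a mutable indexed sequence is the drop-in successor of `transform`.
{code:java}
import scala.collection.mutable.ArrayBuffer

object SeqOpsMigration {
  def main(args: Array[String]): Unit = {
    val centers = ArrayBuffer(2.0, 4.0, 6.0)
    // Before (deprecated): centers.transform(_ / 2)
    centers.mapInPlace(_ / 2)
    println(centers) // ArrayBuffer(1.0, 2.0, 3.0)
  }
}
{code}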

  was:
* method transform in trait SeqOps is deprecated (since 2.13.0)
 * method reverseMap in trait SeqOps is deprecated (since 2.13.0)
 * method retain in trait SetOps is deprecated (since 2.13.0)
 * method - in trait SetOps is deprecated (since 2.13.0)
 * method -- in trait SetOps is deprecated (since 2.13.0)
 * method + in trait SetOps is deprecated (since 2.13.0)

{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala:675:15:
 method transform in trait SeqOps is deprecated (since 2.13.0): Use 
`mapInPlace` on an `IndexedSeq` instead
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.ml.classification.LogisticRegression.train.$anonfun, 
origin=scala.collection.mutable.SeqOps.transform, version=2.13.0
[warn]       centers.transform(_ / numCoefficientSets)
[warn]               ^ {code}


> Clean up the deprecated API usage related to `SeqOps`
> -
>
> Key: SPARK-45684
> URL: https://issues.apache.org/jira/browse/SPARK-45684
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> * method transform in trait SeqOps is deprecated (since 2.13.0)
>  * method reverseMap in trait SeqOps is deprecated (since 2.13.0)
>  * method retain in trait SetOps is deprecated (since 2.13.0)
>  * method union in trait SeqOps is deprecated (since 2.13.0)
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala:675:15:
>  method transform in trait SeqOps is deprecated (since 2.13.0): Use 
> `mapInPlace` on an `IndexedSeq` instead
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.ml.classification.LogisticRegression.train.$anonfun, 
> origin=scala.collection.mutable.SeqOps.transform, version=2.13.0
> [warn]       centers.transform(_ / numCoefficientSets)
> [warn]               ^ {code}






[jira] [Created] (SPARK-45700) Fix `The outer reference in this type test cannot be checked at run time`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45700:


 Summary: Fix `The outer reference in this type test cannot be 
checked at run time`
 Key: SPARK-45700
 URL: https://issues.apache.org/jira/browse/SPARK-45700
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:324:12:
 The outer reference in this type test cannot be checked at run time.
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.sql.SQLQueryTestSuite.createScalaTestCase
[error]       case udfTestCase: UDFTest
[error]            ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:506:12:
 The outer reference in this type test cannot be checked at run time.
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
[error]       case udfTestCase: UDFTest =>
[error]            ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:508:12:
 The outer reference in this type test cannot be checked at run time.
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
[error]       case udtfTestCase: UDTFSetTest =>
[error]            ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:514:13:
 The outer reference in this type test cannot be checked at run time.
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
[error]       case _: PgSQLTest =>
[error]             ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:522:13:
 The outer reference in this type test cannot be checked at run time.
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
[error]       case _: AnsiTest =>
[error]             ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:524:13:
 The outer reference in this type test cannot be checked at run time.
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
[error]       case _: TimestampNTZTest =>
[error]             ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:584:12:
 The outer reference in this type test cannot be checked at run time.
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue
[error]       case udfTestCase: UDFTest
[error]            ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:596:12:
 The outer reference in this type test cannot be checked at run time.
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue
[error]       case udtfTestCase: UDTFSetTest
[error]            ^ {code}
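
A self-contained sketch of one common remedy (illustrative names, not the SQLQueryTestSuite code): the warning appears when matching on a trait nested inside a class, because the type test would also have to check the outer instance; hosting the marker trait in an object or at the top level removes the outer reference.
{code:java}
object OuterRefSketch {
  // The marker trait lives in a stable object, so no outer instance is involved.
  trait UDFTest
  final class UDFCase extends UDFTest

  def describe(testCase: Any): String = testCase match {
    case _: UDFTest => "udf test" // fully checkable at run time
    case _          => "other"
  }

  def main(args: Array[String]): Unit =
    println(describe(new UDFCase)) // udf test
}
{code}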






[jira] [Created] (SPARK-45699) Fix "Widening conversion from `OType` to `NType` is deprecated because it loses precision. Write `.toXX` instead"

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45699:


 Summary: Fix "Widening conversion from `OType` to `NType` is 
deprecated because it loses precision. Write `.toXX` instead"
 Key: SPARK-45699
 URL: https://issues.apache.org/jira/browse/SPARK-45699
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67:
 Widening conversion from Long to Double is deprecated because it loses 
precision. Write `.toDouble` instead. [quickfixable]
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold
[error]       val threshold = max(speculationMultiplier * medianDuration, 
minTimeToSpeculation)
[error]                                                                   ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60:
 Widening conversion from Long to Double is deprecated because it loses 
precision. Write `.toDouble` instead. [quickfixable]
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks
[error]       foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, 
customizedThreshold = true)
[error]                                                            ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48:
 Widening conversion from Int to Float is deprecated because it loses 
precision. Write `.toFloat` instead. [quickfixable]
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat
[error]   override def getFloat(i: Int): Float = getInt(i)
[error]                                                ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49:
 Widening conversion from Long to Float is deprecated because it loses 
precision. Write `.toFloat` instead. [quickfixable]
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat
[error]   override def getFloat(i: Int): Float = getLong(i)
[error]                                                 ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51:
 Widening conversion from Long to Double is deprecated because it loses 
precision. Write `.toDouble` instead. [quickfixable]
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble
[error]   override def getDouble(i: Int): Double = getLong(i)
[error]                                                   ^ {code}
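
An illustrative sketch of the quick fix (made-up values, not the TaskSetManager code): make the lossy `Long` to `Double` widening explicit with `.toDouble`.
{code:java}
object WideningSketch {
  def main(args: Array[String]): Unit = {
    val speculationMultiplier = 1.5
    val medianDuration = 200L
    val minTimeToSpeculation = 100L
    // Before (deprecated): math.max(speculationMultiplier * medianDuration, minTimeToSpeculation)
    val threshold = math.max(speculationMultiplier * medianDuration, minTimeToSpeculation.toDouble)
    println(threshold) // 300.0
  }
}
{code}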






[jira] [Commented] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`

2023-10-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780150#comment-17780150
 ] 

Yang Jie commented on SPARK-45687:
--

We need to distinguish between the cases: some need to be changed to 
`.toIndexedSeq`, and others to `ArraySeq.unsafeWrapArray`.
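
As an illustrative sketch of the two options (made-up snippet, not code from the Spark tests): wrap the array without copying when it is not mutated afterwards, or copy explicitly with `toIndexedSeq`.
{code:java}
import scala.collection.immutable.ArraySeq

object VarargsMigration {
  def sum(xs: Int*): Int = xs.sum

  def main(args: Array[String]): Unit = {
    val values = Array(1, 2, 3)
    // Before (deprecated since 2.13.0): sum(values: _*) triggers a defensive copy.
    println(sum(ArraySeq.unsafeWrapArray(values): _*)) // no copy; safe only if `values` is not mutated later
    println(sum(values.toIndexedSeq: _*))              // explicit copy
  }
}
{code}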

> Fix `Passing an explicit array value to a Scala varargs method is deprecated`
> -
>
> Key: SPARK-45687
> URL: https://issues.apache.org/jira/browse/SPARK-45687
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
>  
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0
> [warn]         df.agg(udaf(allColumns: _*)),
> [warn]                     ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]         df.agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn]                                                ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]         df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, 
> aggFunctions.tail: _*),
> [warn]                                                                        
>     ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]           df.agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn]  {code}






[jira] [Commented] (SPARK-45686) Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated`

2023-10-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780149#comment-17780149
 ] 

Yang Jie commented on SPARK-45686:
--

We need to distinguish between the cases: some need to be changed to 
`.toIndexedSeq`, and others to `ArraySeq.unsafeWrapArray`.
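
The same choice applies to the implicit `Array` to `immutable.IndexedSeq` conversion; an illustrative sketch (made-up method, not the Vectors code):
{code:java}
import scala.collection.immutable.ArraySeq

object ArrayToSeqMigration {
  def firstOf(xs: collection.IndexedSeq[Int]): Int = xs.head

  def main(args: Array[String]): Unit = {
    val values = Array(3, 2, 1)
    // Before (deprecated): firstOf(values) relied on copyArrayToImmutableIndexedSeq.
    println(firstOf(ArraySeq.unsafeWrapArray(values))) // no copy
    println(firstOf(values.toIndexedSeq))              // explicit copy
  }
}
{code}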
 
 
 
 
 
 
 
 

> Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
> deprecated`
> 
>
> Key: SPARK-45686
> URL: https://issues.apache.org/jira/browse/SPARK-45686
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:31:
>  method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
> deprecated (since 2.13.0): implicit conversions from Array to 
> immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` 
> explicitly if you want to copy, or use the more efficient non-copying 
> ArraySeq.unsafeWrapArray
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.ml.linalg.Vector.equals, 
> origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
> version=2.13.0
> [error]             Vectors.equals(s1.indices, s1.values, s2.indices, 
> s2.values)
> [error]                               ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:54:
>  method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
> deprecated (since 2.13.0): implicit conversions from Array to 
> immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` 
> explicitly if you want to copy, or use the more efficient non-copying 
> ArraySeq.unsafeWrapArray
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.ml.linalg.Vector.equals, 
> origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
> version=2.13.0
> [error]             Vectors.equals(s1.indices, s1.values, s2.indices, 
> s2.values)
> [error]                                                      ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:59:31:
>  method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
> deprecated (since 2.13.0): implicit conversions from Array to 
> immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` 
> explicitly if you want to copy, or use the more efficient non-copying 
> ArraySeq.unsafeWrapArray
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.ml.linalg.Vector.equals, 
> origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
> version=2.13.0
> [error]             Vectors.equals(s1.indices, s1.values, 0 until d1.size, 
> d1.values)
> [error]                               ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:61:59:
>  method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
> deprecated (since 2.13.0): implicit conversions from Array to 
> immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` 
> explicitly if you want to copy, or use the more efficient non-copying 
> ArraySeq.unsafeWrapArray
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.ml.linalg.Vector.equals, 
> origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
> version=2.13.0
> [error]             Vectors.equals(0 until d1.size, d1.values, s1.indices, 
> s1.values)
> [error]  {code}






[jira] [Comment Edited] (SPARK-45686) Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated`

2023-10-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780149#comment-17780149
 ] 

Yang Jie edited comment on SPARK-45686 at 10/27/23 3:21 AM:


We need to distinguish between the cases: some need to be changed to 
`.toIndexedSeq`, and others to `ArraySeq.unsafeWrapArray`.


was (Author: luciferyang):
We need to distinguish the situations, some need to be changed to 
`.toIndexedSeq`, some need to be changed to `ArraySeq.unsafeWrapArray`
 
 
 
 
 
 
 
 

> Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
> deprecated`
> 
>
> Key: SPARK-45686
> URL: https://issues.apache.org/jira/browse/SPARK-45686
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:31:
>  method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
> deprecated (since 2.13.0): implicit conversions from Array to 
> immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` 
> explicitly if you want to copy, or use the more efficient non-copying 
> ArraySeq.unsafeWrapArray
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.ml.linalg.Vector.equals, 
> origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
> version=2.13.0
> [error]             Vectors.equals(s1.indices, s1.values, s2.indices, 
> s2.values)
> [error]                               ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:54:
>  method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
> deprecated (since 2.13.0): implicit conversions from Array to 
> immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` 
> explicitly if you want to copy, or use the more efficient non-copying 
> ArraySeq.unsafeWrapArray
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.ml.linalg.Vector.equals, 
> origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
> version=2.13.0
> [error]             Vectors.equals(s1.indices, s1.values, s2.indices, 
> s2.values)
> [error]                                                      ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:59:31:
>  method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
> deprecated (since 2.13.0): implicit conversions from Array to 
> immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` 
> explicitly if you want to copy, or use the more efficient non-copying 
> ArraySeq.unsafeWrapArray
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.ml.linalg.Vector.equals, 
> origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
> version=2.13.0
> [error]             Vectors.equals(s1.indices, s1.values, 0 until d1.size, 
> d1.values)
> [error]                               ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:61:59:
>  method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
> deprecated (since 2.13.0): implicit conversions from Array to 
> immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` 
> explicitly if you want to copy, or use the more efficient non-copying 
> ArraySeq.unsafeWrapArray
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.ml.linalg.Vector.equals, 
> origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
> version=2.13.0
> [error]             Vectors.equals(0 until d1.size, d1.values, s1.indices, 
> s1.values)
> [error]  {code}






[jira] [Commented] (SPARK-45314) Drop Scala 2.12 and make Scala 2.13 by default

2023-10-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780148#comment-17780148
 ] 

Yang Jie commented on SPARK-45314:
--

Friendly ping [~ivoson] [~panbingkun] [~zhiyuan] [~laglangyue], I have created 
some tickets here; feel free to pick them up if you are interested ~

> Drop Scala 2.12 and make Scala 2.13 by default
> --
>
> Key: SPARK-45314
> URL: https://issues.apache.org/jira/browse/SPARK-45314
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Yang Jie
>Priority: Critical
>







[jira] [Created] (SPARK-45698) Clean up the deprecated API usage related to `Buffer`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45698:


 Summary: Clean up the deprecated API usage related to `Buffer`
 Key: SPARK-45698
 URL: https://issues.apache.org/jira/browse/SPARK-45698
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


* method append in trait Buffer is deprecated (since 2.13.0)
 * method prepend in trait Buffer is deprecated (since 2.13.0)
 * method trimEnd in trait Buffer is deprecated (since 2.13.4)
 * method trimStart in trait Buffer is deprecated (since 2.13.4)

{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/test/scala/org/apache/spark/deploy/IvyTestUtils.scala:319:18:
 method append in trait Buffer is deprecated (since 2.13.0): Use appendAll 
instead
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.deploy.IvyTestUtils.createLocalRepository, 
origin=scala.collection.mutable.Buffer.append, version=2.13.0
[warn]         allFiles.append(rFiles: _*)
[warn]                  ^ 

[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala:183:13:
 method trimEnd in trait Buffer is deprecated (since 2.13.4): use 
dropRightInPlace instead
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.util.SizeEstimator.SearchState.dequeue, 
origin=scala.collection.mutable.Buffer.trimEnd, version=2.13.4
[warn]       stack.trimEnd(1)
[warn]             ^{code}
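
For reference, a minimal before/after sketch of these replacements on a plain 
ArrayBuffer (values are illustrative, not the actual Spark patch):

{code:java}
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer(1, 2, 3)
val more = Seq(4, 5)

// buf.append(more: _*)    // deprecated since 2.13.0
buf.appendAll(more)        // replacement

// buf.prepend(more: _*)   // deprecated since 2.13.0
buf.prependAll(more)       // replacement

// buf.trimEnd(1)          // deprecated since 2.13.4
buf.dropRightInPlace(1)    // replacement

// buf.trimStart(1)        // deprecated since 2.13.4
buf.dropInPlace(1)         // replacement
{code}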



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45697) Fix `Unicode escapes in triple quoted strings are deprecated`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45697:


 Summary: Fix `Unicode escapes in triple quoted strings are 
deprecated`
 Key: SPARK-45697
 URL: https://issues.apache.org/jira/browse/SPARK-45697
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala:1686:44:
 Unicode escapes in triple quoted strings are deprecated; use the literal 
character instead
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, version=2.13.2
[warn]         |  COLLECTION ITEMS TERMINATED BY '\u0002'
[warn]                                            ^ {code}
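
A minimal sketch of the fix (illustrative snippet, not the actual patch): keep 
the escape in a regular string literal, where it is still processed, and 
splice it into the triple-quoted string:

{code:java}
// Deprecated: \u escape inside a triple-quoted string
// val ddl = """  COLLECTION ITEMS TERMINATED BY '\u0002'"""

// Replacement: process the escape in a normal string and interpolate it
val sep = "\u0002"
val ddl = s"""  COLLECTION ITEMS TERMINATED BY '$sep'"""
{code}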



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45667) Clean up the deprecated API usage related to `IterableOnceExtensionMethods`.

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-45667.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43532
[https://github.com/apache/spark/pull/43532]

> Clean up the deprecated API usage related to `IterableOnceExtensionMethods`.
> 
>
> Key: SPARK-45667
> URL: https://issues.apache.org/jira/browse/SPARK-45667
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45667) Clean up the deprecated API usage related to `IterableOnceExtensionMethods`.

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-45667:


Assignee: Yang Jie

> Clean up the deprecated API usage related to `IterableOnceExtensionMethods`.
> 
>
> Key: SPARK-45667
> URL: https://issues.apache.org/jira/browse/SPARK-45667
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45481) Introduce a mapper for parquet compression codecs

2023-10-26 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng resolved SPARK-45481.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43308
[https://github.com/apache/spark/pull/43308]

> Introduce a mapper for parquet compression codecs
> -
>
> Key: SPARK-45481
> URL: https://issues.apache.org/jira/browse/SPARK-45481
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, Spark supports all the Parquet compression codecs, but the codecs 
> supported by Parquet and those supported by Spark do not map one-to-one, 
> because Spark introduces a fake compression codec, none.
> There are a lot of magic strings copied from the Parquet compression codecs, 
> so developers need to maintain their consistency manually. This is 
> error-prone and reduces development efficiency.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45696) Fix `method tryCompleteWith in trait Promise is deprecated`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45696:


 Summary: Fix `method tryCompleteWith in trait Promise is 
deprecated`
 Key: SPARK-45696
 URL: https://issues.apache.org/jira/browse/SPARK-45696
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/FutureAction.scala:190:32:
 method tryCompleteWith in trait Promise is deprecated (since 2.13.0): Since 
this method is semantically equivalent to `completeWith`, use that instead.
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.ComplexFutureAction.p, 
origin=scala.concurrent.Promise.tryCompleteWith, version=2.13.0
[warn]   private val p = Promise[T]().tryCompleteWith(run(jobSubmitter))
[warn]                                ^ {code}
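
Since the two methods are semantically equivalent, the fix is a rename. A 
minimal sketch with a simplified stand-in for the FutureAction code (not the 
actual patch):

{code:java}
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global

def run(): Future[Int] = Future(42)   // stand-in for run(jobSubmitter)

// val p = Promise[Int]().tryCompleteWith(run())   // deprecated since 2.13.0
val p = Promise[Int]().completeWith(run())         // replacement
{code}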



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45694:


 Summary: Fix `method signum in trait ScalaNumberProxy is 
deprecated`
 Key: SPARK-45694
 URL: https://issues.apache.org/jira/browse/SPARK-45694
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:194:25:
 method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use 
`sign` method instead
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc,
 origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0
[warn]       val uc = useCount.signum
[warn]   {code}
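
The fix is a one-for-one rename; a minimal sketch (illustrative value, not the 
actual patch):

{code:java}
val useCount = -3

// val uc = useCount.signum   // deprecated since 2.13.0
val uc = useCount.sign        // replacement; still yields -1, 0, or 1
{code}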



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45691) Clean up the deprecated API usage related to `RightProjection/LeftProjection/Either`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45691:


 Summary:   Clean up the deprecated API usage related to 
`RightProjection/LeftProjection/Either`
 Key: SPARK-45691
 URL: https://issues.apache.org/jira/browse/SPARK-45691
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


* method get in class RightProjection is deprecated (since 2.13.0)
 * method get in class LeftProjection is deprecated (since 2.13.0)
 * method right in class Either is deprecated (since 2.13.0)

 
{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/GroupBasedRowLevelOperationScanPlanning.scala:54:28:
 method get in class LeftProjection is deprecated (since 2.13.0): use 
`Either.swap.getOrElse` instead
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.execution.datasources.v2.GroupBasedRowLevelOperationScanPlanning.apply,
 origin=scala.util.Either.LeftProjection.get, version=2.13.0
[warn]         pushedFilters.left.get.mkString(", ")
[warn]                            ^ {code}
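
A minimal sketch of the suggested replacements (the Either value here is a 
stand-in, not the actual Spark code):

{code:java}
val pushedFilters: Either[Seq[String], Seq[String]] = Left(Seq("a", "b"))

// pushedFilters.left.get.mkString(", ")           // deprecated since 2.13.0
pushedFilters.swap.getOrElse(Nil).mkString(", ")   // replacement from the warning

// Either is right-biased in 2.13, so right.get becomes a direct call:
pushedFilters.getOrElse(Nil).mkString(", ")        // instead of right.get
{code}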



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45690) Clean up type use of `BufferedIterator/CanBuildFrom/Traversable`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45690:


 Summary: Clean up type use of 
`BufferedIterator/CanBuildFrom/Traversable`
 Key: SPARK-45690
 URL: https://issues.apache.org/jira/browse/SPARK-45690
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie


* type BufferedIterator in package scala is deprecated (since 2.13.0)
 * type CanBuildFrom in package generic is deprecated (since 2.13.0)
 * type Traversable in package scala is deprecated (since 2.13.0)

 
{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala:67:12:
 type BufferedIterator in package scala is deprecated (since 2.13.0): Use 
scala.collection.BufferedIterator instead of scala.BufferedIterator
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.execution.GroupedIterator.input, 
origin=scala.BufferedIterator, version=2.13.0
[warn]     input: BufferedIterator[InternalRow],
[warn]            ^ {code}
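
A minimal sketch of the alias cleanup (illustrative function, not the actual 
patch):

{code:java}
import scala.collection.BufferedIterator   // instead of scala.BufferedIterator

def peekThenNext(input: BufferedIterator[Int]): (Int, Int) = {
  val peeked = input.head    // look ahead without consuming
  (peeked, input.next())     // both refer to the same element
}

// Similarly: scala.Traversable      -> scala.collection.Iterable
//            generic.CanBuildFrom   -> scala.collection.BuildFrom (2.13 redesign)
{code}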



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45689) Clean up the deprecated API usage related to `StringContext/StringOps`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45689:


 Summary:   Clean up the deprecated API usage related to 
`StringContext/StringOps`
 Key: SPARK-45689
 URL: https://issues.apache.org/jira/browse/SPARK-45689
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


 
{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala:258:30:
 method treatEscapes in object StringContext is deprecated (since 2.13.0): use 
processEscapes
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.catalyst.expressions.codegen.Block.foldLiteralArgs, 
origin=scala.StringContext.treatEscapes, version=2.13.0
[warn]     buf.append(StringContext.treatEscapes(strings.next()))
[warn]                              ^
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala:270:32:
 method treatEscapes in object StringContext is deprecated (since 2.13.0): use 
processEscapes
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.catalyst.expressions.codegen.Block.foldLiteralArgs, 
origin=scala.StringContext.treatEscapes, version=2.13.0
[warn]       buf.append(StringContext.treatEscapes(strings.next()))
[warn]   {code}
 

 
 * method checkLengths in class StringContext is deprecated (since 2.13.0)
 * method treatEscapes in object StringContext is deprecated (since 2.13.0)
 * method replaceAllLiterally in class StringOps is deprecated (since 2.13.2)
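
A minimal sketch of these renames (illustrative strings, not the actual patch):

{code:java}
// StringContext.treatEscapes -> StringContext.processEscapes
// val escaped = StringContext.treatEscapes("a\\tb")   // deprecated since 2.13.0
val escaped = StringContext.processEscapes("a\\tb")    // replacement

// StringOps.replaceAllLiterally -> java.lang.String.replace
// val snake = "a.b.c".replaceAllLiterally(".", "_")   // deprecated since 2.13.2
val snake = "a.b.c".replace(".", "_")                  // replacement
{code}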



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45688) Clean up the deprecated API usage related to `MapOps`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45688:


 Summary: Clean up the deprecated API usage related to `MapOps`
 Key: SPARK-45688
 URL: https://issues.apache.org/jira/browse/SPARK-45688
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


* method - in trait MapOps is deprecated (since 2.13.0)
 * method -- in trait MapOps is deprecated (since 2.13.0)
 * method + in trait MapOps is deprecated (since 2.13.0)

 
{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala:84:27:
 method + in trait MapOps is deprecated (since 2.13.0): Consider requiring an 
immutable Map or fall back to Map.concat.
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.deploy.worker.CommandUtils.buildLocalCommand.newEnvironment,
 origin=scala.collection.MapOps.+, version=2.13.0
[warn]       command.environment + ((libraryPathName, 
libraryPaths.mkString(File.pathSeparator)))
[warn]                           ^
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala:91:22:
 method + in trait MapOps is deprecated (since 2.13.0): Consider requiring an 
immutable Map or fall back to Map.concat.
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.deploy.worker.CommandUtils.buildLocalCommand, 
origin=scala.collection.MapOps.+, version=2.13.0
[warn]       newEnvironment += (SecurityManager.ENV_AUTH_SECRET -> 
securityMgr.getSecretKey())
[warn]                      ^ {code}
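
A minimal sketch of the replacement the warning suggests (illustrative map, 
not the actual patch):

{code:java}
import scala.collection.Map   // the generic trait carrying the deprecation

val environment: Map[String, String] = Map("A" -> "1")

// environment + ("B" -> "2")          // deprecated since 2.13.0
environment.concat(Map("B" -> "2"))    // replacement: Map.concat
environment ++ Map("B" -> "2")         // equivalent alias for concat
{code}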



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45687:


 Summary: Fix `Passing an explicit array value to a Scala varargs 
method is deprecated`
 Key: SPARK-45687
 URL: https://issues.apache.org/jira/browse/SPARK-45687
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


Passing an explicit array value to a Scala varargs method is deprecated (since 
2.13.0) and will result in a defensive copy; Use the more efficient non-copying 
ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call

 
{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21:
 Passing an explicit array value to a Scala varargs method is deprecated (since 
2.13.0) and will result in a defensive copy; Use the more efficient non-copying 
ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0
[warn]         df.agg(udaf(allColumns: _*)),
[warn]                     ^
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48:
 Passing an explicit array value to a Scala varargs method is deprecated (since 
2.13.0) and will result in a defensive copy; Use the more efficient non-copying 
ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
version=2.13.0
[warn]         df.agg(aggFunctions.head, aggFunctions.tail: _*),
[warn]                                                ^
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76:
 Passing an explicit array value to a Scala varargs method is deprecated (since 
2.13.0) and will result in a defensive copy; Use the more efficient non-copying 
ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
version=2.13.0
[warn]         df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, 
aggFunctions.tail: _*),
[warn]                                                                          
  ^
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50:
 Passing an explicit array value to a Scala varargs method is deprecated (since 
2.13.0) and will result in a defensive copy; Use the more efficient non-copying 
ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
version=2.13.0
[warn]           df.agg(aggFunctions.head, aggFunctions.tail: _*),
[warn]  {code}
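
A minimal sketch of both suggested fixes (the vararg method is a stand-in, not 
the actual test code):

{code:java}
import scala.collection.immutable.ArraySeq

def agg(cols: String*): String = cols.mkString(", ")   // stand-in vararg method
val allColumns = Array("a", "b", "c")

// agg(allColumns: _*)                            // deprecated: defensive copy
agg(ArraySeq.unsafeWrapArray(allColumns): _*)     // non-copying replacement
agg(allColumns.toIndexedSeq: _*)                  // explicit-copy alternative
{code}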



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45686) Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45686:


 Summary: Fix `method copyArrayToImmutableIndexedSeq in class 
LowPriorityImplicits2 is deprecated`
 Key: SPARK-45686
 URL: https://issues.apache.org/jira/browse/SPARK-45686
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:31:
 method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
deprecated (since 2.13.0): implicit conversions from Array to 
immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` explicitly 
if you want to copy, or use the more efficient non-copying 
ArraySeq.unsafeWrapArray
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.ml.linalg.Vector.equals, 
origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
version=2.13.0
[error]             Vectors.equals(s1.indices, s1.values, s2.indices, s2.values)
[error]                               ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:54:
 method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
deprecated (since 2.13.0): implicit conversions from Array to 
immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` explicitly 
if you want to copy, or use the more efficient non-copying 
ArraySeq.unsafeWrapArray
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.ml.linalg.Vector.equals, 
origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
version=2.13.0
[error]             Vectors.equals(s1.indices, s1.values, s2.indices, s2.values)
[error]                                                      ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:59:31:
 method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
deprecated (since 2.13.0): implicit conversions from Array to 
immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` explicitly 
if you want to copy, or use the more efficient non-copying 
ArraySeq.unsafeWrapArray
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.ml.linalg.Vector.equals, 
origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
version=2.13.0
[error]             Vectors.equals(s1.indices, s1.values, 0 until d1.size, 
d1.values)
[error]                               ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:61:59:
 method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is 
deprecated (since 2.13.0): implicit conversions from Array to 
immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` explicitly 
if you want to copy, or use the more efficient non-copying 
ArraySeq.unsafeWrapArray
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.ml.linalg.Vector.equals, 
origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, 
version=2.13.0
[error]             Vectors.equals(0 until d1.size, d1.values, s1.indices, 
s1.values)
[error]  {code}
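
A minimal sketch of both suggested fixes (the comparison method is a stand-in, 
not the actual Vectors.equals):

{code:java}
import scala.collection.immutable.ArraySeq

def sameElements(xs: Seq[Int], ys: Seq[Int]): Boolean = xs == ys   // stand-in
val indices = Array(0, 1, 2)

// sameElements(indices, indices)   // deprecated: the implicit conversion copies
sameElements(ArraySeq.unsafeWrapArray(indices),    // non-copying wrap
  ArraySeq.unsafeWrapArray(indices))
// or, when a copy is really intended:
sameElements(indices.toIndexedSeq, indices.toIndexedSeq)
{code}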



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45685) Use `LazyList` instead of `Stream`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45685:


 Summary: Use `LazyList` instead of `Stream`
 Key: SPARK-45685
 URL: https://issues.apache.org/jira/browse/SPARK-45685
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


* class Stream in package immutable is deprecated (since 2.13.0)
 * object Stream in package immutable is deprecated (since 2.13.0)
 * type Stream in package scala is deprecated (since 2.13.0)
 * value Stream in package scala is deprecated (since 2.13.0)
 * method append in class Stream is deprecated (since 2.13.0)
 * method toStream in trait IterableOnceOps is deprecated (since 2.13.0)

 
{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala:49:20:
 class Stream in package immutable is deprecated (since 2.13.0): Use LazyList 
(which is fully lazy) instead of Stream (which has a lazy tail only)
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.BlockingLineStream.BlockingStreamed.stream, 
origin=scala.collection.immutable.Stream, version=2.13.0
[warn]     val stream: () => Stream[T])
[warn]                    ^ {code}
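
A minimal sketch of the migration (illustrative values, not the actual patch):

{code:java}
// val nums: Stream[Int] = Stream.from(1)     // deprecated since 2.13.0
val nums: LazyList[Int] = LazyList.from(1)    // fully lazy replacement

// e.g. the field above would become:
// val stream: () => LazyList[T]
val firstThree = nums.take(3).toList          // List(1, 2, 3)
{code}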



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45575) support time travel options for df read API

2023-10-26 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-45575.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43403
[https://github.com/apache/spark/pull/43403]

> support time travel options for df read API
> ---
>
> Key: SPARK-45575
> URL: https://issues.apache.org/jira/browse/SPARK-45575
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45575) support time travel options for df read API

2023-10-26 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-45575:


Assignee: Wenchen Fan

> support time travel options for df read API
> ---
>
> Key: SPARK-45575
> URL: https://issues.apache.org/jira/browse/SPARK-45575
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45684) Clean up the deprecated API usage related to `SeqOps`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45684:


 Summary: Clean up the deprecated API usage related to `SeqOps`
 Key: SPARK-45684
 URL: https://issues.apache.org/jira/browse/SPARK-45684
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


* method transform in trait SeqOps is deprecated (since 2.13.0)
 * method reverseMap in trait SeqOps is deprecated (since 2.13.0)
 * method retain in trait SetOps is deprecated (since 2.13.0)
 * method - in trait SetOps is deprecated (since 2.13.0)
 * method -- in trait SetOps is deprecated (since 2.13.0)
 * method + in trait SetOps is deprecated (since 2.13.0)

{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala:675:15:
 method transform in trait SeqOps is deprecated (since 2.13.0): Use 
`mapInPlace` on an `IndexedSeq` instead
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.ml.classification.LogisticRegression.train.$anonfun, 
origin=scala.collection.mutable.SeqOps.transform, version=2.13.0
[warn]       centers.transform(_ / numCoefficientSets)
[warn]               ^ {code}
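
A minimal sketch of the suggested replacement (illustrative values, not the 
actual patch):

{code:java}
import scala.collection.mutable.ArrayBuffer

val centers = ArrayBuffer(2.0, 4.0, 6.0)   // stand-in values
val numCoefficientSets = 2

// centers.transform(_ / numCoefficientSets)   // deprecated since 2.13.0
centers.mapInPlace(_ / numCoefficientSets)     // replacement on an IndexedSeq

// Likewise for mutable sets: retain -> filterInPlace
{code}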



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45681) Clone a js version of UIUtils.errorMessageCell for consistent error parsing on UI

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45681:
---
Labels: pull-request-available  (was: )

> Clone a js version of UIUtils.errorMessageCell for consistent error parsing 
> on UI
> 
>
> Key: SPARK-45681
> URL: https://issues.apache.org/jira/browse/SPARK-45681
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45682) Fix "method + in class Byte/Short/Char/Long/Double/Int is deprecated"

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45682:
-
Description: 
{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:127:42:
 method + in class Int is deprecated (since 2.13.0): Adding a number and a 
String is deprecated. Use the string interpolation `s"$num$str"`
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.rdd.PipedRDDSuite, 
origin=scala.Int.+, version=2.13.0
[warn]       (i: Int, f: String => Unit) => f(i + "_")) {code}

> Fix   "method + in class Byte/Short/Char/Long/Double/Int is deprecated"
> ---
>
> Key: SPARK-45682
> URL: https://issues.apache.org/jira/browse/SPARK-45682
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:127:42:
>  method + in class Int is deprecated (since 2.13.0): Adding a number and a 
> String is deprecated. Use the string interpolation `s"$num$str"`
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.rdd.PipedRDDSuite, 
> origin=scala.Int.+, version=2.13.0
> [warn]       (i: Int, f: String => Unit) => f(i + "_")) {code}
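
A minimal sketch of the fix (illustrative value, not the actual patch):

{code:java}
val i = 7

// val tagged = i + "_"   // deprecated since 2.13.0
val tagged = s"${i}_"     // replacement via string interpolation
{code}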



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45683) Fix `method any2stringadd in object Predef is deprecated`

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45683:


 Summary: Fix `method any2stringadd in object Predef is deprecated`
 Key: SPARK-45683
 URL: https://issues.apache.org/jira/browse/SPARK-45683
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[warn] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:720:17:
 method any2stringadd in object Predef is deprecated (since 2.13.0): Implicit 
injection of + is deprecated. Convert to String to call +
[warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
site=org.apache.spark.sql.catalyst.expressions.BinaryExpression.nullSafeCodeGen.nullSafeEval,
 origin=scala.Predef.any2stringadd, version=2.13.0
[warn]         leftGen.code + ctx.nullSafeExec(left.nullable, leftGen.isNull) {
[warn]                 ^ {code}
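
A minimal sketch of the fix (the Code class is a stand-in for the codegen 
types, not the actual patch):

{code:java}
case class Code(body: String) { override def toString: String = body }
val leftGen = Code("x = 1;")

// val combined = leftGen + " y = 2;"        // implicit any2stringadd, deprecated
val combined = leftGen.toString + " y = 2;"  // explicit conversion to String
val viaInterp = s"$leftGen y = 2;"           // interpolation alternative
{code}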



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45682) Fix "method + in class Byte/Short/Char/Long/Double/Int is deprecated"

2023-10-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45682:


 Summary: Fix   "method + in class Byte/Short/Char/Long/Double/Int 
is deprecated"
 Key: SPARK-45682
 URL: https://issues.apache.org/jira/browse/SPARK-45682
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45681) Clone a js version of UIUtils.errorMessageCell for consistent error parsing on UI

2023-10-26 Thread Kent Yao (Jira)
Kent Yao created SPARK-45681:


 Summary: Clone a js version of UIUtils.errorMessageCell for 
consistent error parsing on UI
 Key: SPARK-45681
 URL: https://issues.apache.org/jira/browse/SPARK-45681
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.5.0, 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45679) Add clusterBy in DataFrame API

2023-10-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45679.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43544
[https://github.com/apache/spark/pull/43544]

> Add clusterBy in DataFrame API
> --
>
> Key: SPARK-45679
> URL: https://issues.apache.org/jira/browse/SPARK-45679
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.1
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add clusterBy to the DataFrame API, e.g. in Python
> DataFrameWriterV1
> ```
> df.write
>   .format("delta")
>   .clusterBy("clusteringColumn1", "clusteringColumn2")
>   .save(...) or saveAsTable(...)
> ```
> DataFrameWriterV2
> ```
> df.writeTo(...).using("delta")
>   .clusterBy("clusteringColumn1", "clusteringColumn2")
>   .create() or replace() or createOrReplace()
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45679) Add clusterBy in DataFrame API

2023-10-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45679:


Assignee: Zhen Li

> Add clusterBy in DataFrame API
> --
>
> Key: SPARK-45679
> URL: https://issues.apache.org/jira/browse/SPARK-45679
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.1
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
>  Labels: pull-request-available
>
> Add clusterBy to the DataFrame API, e.g. in Python
> DataFrameWriterV1
> ```
> df.write
>   .format("delta")
>   .clusterBy("clusteringColumn1", "clusteringColumn2")
>   .save(...) or saveAsTable(...)
> ```
> DataFrameWriterV2
> ```
> df.writeTo(...).using("delta")
>   .clusterBy("clusteringColumn1", "clusteringColumn2")
>   .create() or replace() or createOrReplace()
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-45651) Snapshots of some packages are not published any more

2023-10-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-45651:
--

Reverted in 
https://github.com/apache/spark/commit/df0262f29969fe40f53dee070a150f2bfe98484c

> Snapshots of some packages are not published any more
> -
>
> Key: SPARK-45651
> URL: https://issues.apache.org/jira/browse/SPARK-45651
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Enrico Minack
>Assignee: Enrico Minack
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Snapshots of some packages are not being published anymore, e.g. 
> spark-sql_2.13-4.0.0 has not been published since Sep 13th: 
> https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/
> There have been some attempts to fix CI: SPARK-45535 SPARK-45536
> The assumption is that memory consumption during the build exceeds the 
> available memory of the GitHub host.
> The following could be attempted:
> - enable manual triggering of the {{publish_snapshots.yml}} workflow
> - enable some memory-use logging to prove that exceeded memory is the root 
> cause
> - attempt to reduce the memory footprint and check its impact in the above 
> logging
> - revert the memory-use logging



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45651) Snapshots of some packages are not published any more

2023-10-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45651.
--
Fix Version/s: (was: 4.0.0)
 Assignee: (was: Enrico Minack)
   Resolution: Invalid

> Snapshots of some packages are not published any more
> -
>
> Key: SPARK-45651
> URL: https://issues.apache.org/jira/browse/SPARK-45651
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Enrico Minack
>Priority: Major
>  Labels: pull-request-available
>
> Snapshots of some packages are not being published anymore, e.g. 
> spark-sql_2.13-4.0.0 has not been published since Sep 13th: 
> https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/
> There have been some attempts to fix CI: SPARK-45535 SPARK-45536
> The assumption is that memory consumption during the build exceeds the 
> available memory of the GitHub host.
> The following could be attempted:
> - enable manual triggering of the {{publish_snapshots.yml}} workflow
> - enable some memory-use logging to prove that exceeded memory is the root 
> cause
> - attempt to reduce the memory footprint and check its impact in the above 
> logging
> - revert the memory-use logging



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-45651) Snapshots of some packages are not published any more

2023-10-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780127#comment-17780127
 ] 

Hyukjin Kwon edited comment on SPARK-45651 at 10/27/23 12:48 AM:
-

Reverted in 
https://github.com/apache/spark/commit/df0262f29969fe40f53dee070a150f2bfe98484c 
and 
https://github.com/apache/spark/commit/0d665fe8c87b037516f21162d2f5545580776af3


was (Author: gurwls223):
Reverted in 
https://github.com/apache/spark/commit/df0262f29969fe40f53dee070a150f2bfe98484c

> Snapshots of some packages are not published any more
> -
>
> Key: SPARK-45651
> URL: https://issues.apache.org/jira/browse/SPARK-45651
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Enrico Minack
>Priority: Major
>  Labels: pull-request-available
>
> Snapshots of some packages are not being published anymore, e.g. 
> spark-sql_2.13-4.0.0 has not been published since Sep 13th: 
> https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/
> There have been some attempts to fix CI: SPARK-45535 SPARK-45536
> The assumption is that memory consumption during the build exceeds the 
> available memory of the GitHub host.
> The following could be attempted:
> - enable manual triggering of the {{publish_snapshots.yml}} workflow
> - enable some memory-use logging to prove that exceeded memory is the root 
> cause
> - attempt to reduce the memory footprint and check its impact in the above 
> logging
> - revert the memory-use logging



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-45651) Snapshots of some packages are not published any more

2023-10-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-45651:
--

> Snapshots of some packages are not published any more
> -
>
> Key: SPARK-45651
> URL: https://issues.apache.org/jira/browse/SPARK-45651
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Enrico Minack
>Priority: Major
>  Labels: pull-request-available
>
> Snapshots of some packages are not being published anymore, e.g. 
> spark-sql_2.13-4.0.0 has not been published since Sep 13th: 
> https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/
> There have been some attempts to fix CI: SPARK-45535 SPARK-45536
> The assumption is that memory consumption during the build exceeds the 
> available memory of the GitHub host.
> The following could be attempted:
> - enable manual triggering of the {{publish_snapshots.yml}} workflow
> - enable some memory-use logging to prove that exceeded memory is the root 
> cause
> - attempt to reduce the memory footprint and check its impact in the above 
> logging
> - revert the memory-use logging



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45677) Observe API error logging

2023-10-26 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-45677.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43542
[https://github.com/apache/spark/pull/43542]

> Observe API error logging
> -
>
> Key: SPARK-45677
> URL: https://issues.apache.org/jira/browse/SPARK-45677
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should tell the user why it's not supported and what to do
> [https://github.com/apache/spark/blob/536439244593d40bdab88e9d3657f2691d3d33f2/sql/core/src/main/scala/org/apache/spark/sql/Observation.scala#L76]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45677) Observe API error logging

2023-10-26 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-45677:


Assignee: Wei Liu

> Observe API error logging
> -
>
> Key: SPARK-45677
> URL: https://issues.apache.org/jira/browse/SPARK-45677
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>
> We should tell the user why it's not supported and what to do
> [https://github.com/apache/spark/blob/536439244593d40bdab88e9d3657f2691d3d33f2/sql/core/src/main/scala/org/apache/spark/sql/Observation.scala#L76]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44386) Use PartitionEvaluator API in HashAggregateExec, ObjectHashAggregateExec, SortAggregateExec

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44386:
---
Labels: pull-request-available  (was: )

> Use PartitionEvaluator API in HashAggregateExec, ObjectHashAggregateExec, 
> SortAggregateExec
> ---
>
> Key: SPARK-44386
> URL: https://issues.apache.org/jira/browse/SPARK-44386
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jia Fan
>Priority: Major
>  Labels: pull-request-available
>
> Use PartitionEvaluator API in HashAggregateExec, ObjectHashAggregateExec, 
> SortAggregateExec



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32268) Bloom Filter Join

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-32268:
---
Labels: pull-request-available  (was: )

> Bloom Filter Join
> -
>
> Key: SPARK-32268
> URL: https://issues.apache.org/jira/browse/SPARK-32268
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yingyi Bu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.0
>
> Attachments: q16-bloom-filter.jpg, q16-default.jpg
>
>
> We can improve the performance of some joins by pre-filtering one side of a 
> join using a Bloom filter and an IN predicate generated from the values on 
> the other side of the join.
>  For 
> example:[tpcds/q16.sql|https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql].
>  [Before this 
> optimization|https://issues.apache.org/jira/secure/attachment/13007418/q16-default.jpg].
>  [After this 
> optimization|https://issues.apache.org/jira/secure/attachment/13007416/q16-bloom-filter.jpg].
> *Query Performance Benchmarks: TPC-DS Performance Evaluation*
>  Our setup for running TPC-DS benchmark was as follows: TPC-DS 5T and 
> Partitioned Parquet table
>  
> |Query|Default(Seconds)|Enable Bloom Filter Join(Seconds)|
> |tpcds q16|84|46|
> |tpcds q36|29|21|
> |tpcds q57|39|28|
> |tpcds q94|42|34|
> |tpcds q95|306|288|
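
For intuition, the effect can be approximated by hand with the public sketch 
API; an illustrative example (table names and sizes are made up, and this is 
not the optimizer's internal rewrite):

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val small = Seq(1L, 2L, 3L).toDF("k")     // build side of the join
val large = (1L to 1000000L).toDF("k")    // probe side of the join

// Build a Bloom filter over the small side's join keys, broadcast it, and
// pre-filter the large side before joining.
val bf = small.stat.bloomFilter("k", 3L, 0.01)
val bfBc = spark.sparkContext.broadcast(bf)
val prefiltered = large.filter(row => bfBc.value.mightContainLong(row.getLong(0)))

val joined = prefiltered.join(small, "k")
{code}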



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44447) Use PartitionEvaluator API in FlatMapGroupsInPandasExec, FlatMapCoGroupsInPandasExec

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44447:
---
Labels: pull-request-available  (was: )

> Use PartitionEvaluator API in FlatMapGroupsInPandasExec, 
> FlatMapCoGroupsInPandasExec
> 
>
> Key: SPARK-44447
> URL: https://issues.apache.org/jira/browse/SPARK-44447
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>  Labels: pull-request-available
>
> Use PartitionEvaluator API in
> `FlatMapGroupsInPandasExec`
> `FlatMapCoGroupsInPandasExec`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44385) Use PartitionEvaluator API in MergingSessionsExec & UpdatingSessionsExec

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44385:
---
Labels: pull-request-available  (was: )

> Use PartitionEvaluator API in MergingSessionsExec & UpdatingSessionsExec
> 
>
> Key: SPARK-44385
> URL: https://issues.apache.org/jira/browse/SPARK-44385
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jia Fan
>Priority: Major
>  Labels: pull-request-available
>
> Use PartitionEvaluator API in MergingSessionsExec & UpdatingSessionsExec



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44414) Fixed matching check for CharType/VarcharType

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44414:
---
Labels: pull-request-available  (was: )

> Fixed matching check for CharType/VarcharType
> -
>
> Key: SPARK-44414
> URL: https://issues.apache.org/jira/browse/SPARK-44414
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0, 3.4.0
>Reporter: caican
>Priority: Major
>  Labels: pull-request-available
>
> Running the following code throws an exception
> {code:java}
> val analyzer = getAnalyzer
> // check varchar type
> val json1 = "{\"__CHAR_VARCHAR_TYPE_STRING\":\"varchar(80)\"}"
> val metadata1 = new 
> MetadataBuilder().withMetadata(Metadata.fromJson(json1)).build()
> val query1 = TestRelation(StructType(Seq(
> StructField("x", StringType, metadata = metadata1),
> StructField("y", StringType, metadata = metadata1))).toAttributes)
> val table1 = TestRelation(StructType(Seq(
> StructField("x", StringType, metadata = metadata1),
> StructField("y", StringType, metadata = metadata1))).toAttributes)
> val parsedPlanByName1 = byName(table1, query1)
> analyzer.executeAndCheck(parsedPlanByName1, new QueryPlanningTracker()) {code}
>  
> Exception details are as follows
> {code:java}
> org.apache.spark.sql.AnalysisException: unresolved operator 'AppendData 
> TestRelation [x#8, y#9], true;
> 'AppendData TestRelation [x#8, y#9], true
> +- TestRelation [x#6, y#7]    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:52)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:51)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:156)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$47(CheckAnalysis.scala:704)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$47$adapted(CheckAnalysis.scala:702)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:186)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:702)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:92)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:156)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:177)
>     at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:174)
>     at 
> org.apache.spark.sql.catalyst.analysis.DataSourceV2AnalysisBaseSuite.$anonfun$new$36(DataSourceV2AnalysisSuite.scala:691)
>  {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44362) Use PartitionEvaluator API in AggregateInPandasExec, AttachDistributedSequenceExec

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44362:
---
Labels: pull-request-available  (was: )

> Use  PartitionEvaluator API in AggregateInPandasExec, 
> AttachDistributedSequenceExec
> ---
>
> Key: SPARK-44362
> URL: https://issues.apache.org/jira/browse/SPARK-44362
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>  Labels: pull-request-available
>
> Use  PartitionEvaluator API in
> `AggregateInPandasExec`
> `AttachDistributedSequenceExec`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45680) ReleaseSession to close Spark Connect session

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45680:
---
Labels: pull-request-available  (was: )

> ReleaseSession to close Spark Connect session
> -
>
> Key: SPARK-45680
> URL: https://issues.apache.org/jira/browse/SPARK-45680
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43754) Spark Connect Session & Query lifecycle

2023-10-26 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-43754:
--
Affects Version/s: 4.0.0

> Spark Connect Session & Query lifecycle
> ---
>
> Key: SPARK-43754
> URL: https://issues.apache.org/jira/browse/SPARK-43754
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Currently, queries in Spark Connect are executed within the RPC handler.
> We want to detach the RPC interface from actual sessions and execution, so 
> that we can make the interface more flexible:
>  * maintain long-running sessions, independent of an unbroken gRPC channel
>  * be able to cancel queries
>  * offer other interfaces to query results than a push from the server



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45680) ReleaseSession to close Spark Connect session

2023-10-26 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-45680:
-

 Summary: ReleaseSession to close Spark Connect session
 Key: SPARK-45680
 URL: https://issues.apache.org/jira/browse/SPARK-45680
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45679) Add clusterBy in DataFrame API

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45679:
---
Labels: pull-request-available  (was: )

> Add clusterBy in DataFrame API
> --
>
> Key: SPARK-45679
> URL: https://issues.apache.org/jira/browse/SPARK-45679
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.1
>Reporter: Zhen Li
>Priority: Major
>  Labels: pull-request-available
>
> Add clusterBy to the DataFrame API, e.g. in Python
> DataFrameWriterV1
> ```
> df.write
>   .format("delta")
>   .clusterBy("clusteringColumn1", "clusteringColumn2")
>   .save(...) or saveAsTable(...)
> ```
> DataFrameWriterV2
> ```
> df.writeTo(...).using("delta")
>   .clusterBy("clusteringColumn1", "clusteringColumn2")
>   .create() or replace() or createOrReplace()
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45679) Add clusterBy in DataFrame API

2023-10-26 Thread Zhen Li (Jira)
Zhen Li created SPARK-45679:
---

 Summary: Add clusterBy in DataFrame API
 Key: SPARK-45679
 URL: https://issues.apache.org/jira/browse/SPARK-45679
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.1
Reporter: Zhen Li


Add clusterBy to the DataFrame API, e.g. in Python

DataFrameWriterV1
```
df.write
  .format("delta")
  .clusterBy("clusteringColumn1", "clusteringColumn2")
  .save(...) or saveAsTable(...)
```

DataFrameWriterV2
```
df.writeTo(...).using("delta")
  .clusterBy("clusteringColumn1", "clusteringColumn2")
  .create() or replace() or createOrReplace()
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45678) Cover BufferReleasingInputStream.available under tryOrFetchFailedException

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45678:
---
Labels: pull-request-available  (was: )

> Cover BufferReleasingInputStream.available under tryOrFetchFailedException
> --
>
> Key: SPARK-45678
> URL: https://issues.apache.org/jira/browse/SPARK-45678
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: L. C. Hsieh
>Priority: Minor
>  Labels: pull-request-available
>
> We have encountered a shuffle data corruption issue:
> ```
> Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
>   at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:112)
>   at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
>   at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:504)
>   at org.xerial.snappy.Snappy.uncompress(Snappy.java:543)
>   at 
> org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:450)
>   at 
> org.xerial.snappy.SnappyInputStream.available(SnappyInputStream.java:497)
>   at 
> org.apache.spark.storage.BufferReleasingInputStream.available(ShuffleBlockFetcherIterator.scala:1356)
>  ```
> Spark shuffle can detect corruption for a few stream operations like `read` 
> and `skip`: an `IOException` such as the one in the stack trace is rethrown 
> as a `FetchFailedException`, which retries the failed shuffle task. But the 
> stack trace above goes through `available`, which is not covered by that 
> mechanism, so no retry happened and the Spark application simply failed.
> Since the `available` operation also involves data decompression, we should 
> check it the same way `read` and `skip` are checked.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45678) Cover BufferReleasingInputStream.available under tryOrFetchFailedException

2023-10-26 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-45678:
---

 Summary: Cover BufferReleasingInputStream.available under 
tryOrFetchFailedException
 Key: SPARK-45678
 URL: https://issues.apache.org/jira/browse/SPARK-45678
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: L. C. Hsieh


We have encountered a shuffle data corruption issue:

```
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:112)
at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:504)
at org.xerial.snappy.Snappy.uncompress(Snappy.java:543)
at 
org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:450)
at 
org.xerial.snappy.SnappyInputStream.available(SnappyInputStream.java:497)
at 
org.apache.spark.storage.BufferReleasingInputStream.available(ShuffleBlockFetcherIterator.scala:1356)
 ```

Spark shuffle can detect corruption for a few stream operations like `read` 
and `skip`: an `IOException` such as the one in the stack trace is rethrown as 
a `FetchFailedException`, which retries the failed shuffle task. But the stack 
trace above goes through `available`, which is not covered by that mechanism, 
so no retry happened and the Spark application simply failed.

Since the `available` operation also involves data decompression, we should 
check it the same way `read` and `skip` are checked.
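
To make the proposed direction concrete, here is a minimal self-contained sketch of the wrapping pattern, with made-up class names standing in for Spark's BufferReleasingInputStream internals:

{code:scala}
import java.io.{IOException, InputStream}

// Stand-in for FetchFailedException: signals that the task should be retried.
class RetryableFetchException(cause: IOException)
  extends RuntimeException("corrupted shuffle block, retry the fetch", cause)

// Every operation that may trigger decompression, including available(),
// goes through the same corruption-detecting wrapper.
class CorruptionDetectingStream(delegate: InputStream) extends InputStream {
  private def tryOrFetchFailed[T](block: => T): T =
    try block
    catch { case e: IOException => throw new RetryableFetchException(e) }

  override def read(): Int = tryOrFetchFailed(delegate.read())
  override def skip(n: Long): Long = tryOrFetchFailed(delegate.skip(n))
  // The proposed fix: cover available() too, since it can also decompress.
  override def available(): Int = tryOrFetchFailed(delegate.available())
}
{code}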



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45652) SPJ: Handle empty input partitions after dynamic filtering

2023-10-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45652:
--
Fix Version/s: 3.4.2
   3.5.1

> SPJ: Handle empty input partitions after dynamic filtering
> --
>
> Key: SPARK-45652
> URL: https://issues.apache.org/jira/browse/SPARK-45652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
>
> When the number of input partitions becomes 0 after dynamic filtering in 
> {{BatchScanExec}}, SPJ currently fails with this error:
> {code}
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:529)
>   at scala.None$.get(Option.scala:527)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions$lzycompute(BatchScanExec.scala:108)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions(BatchScanExec.scala:65)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD$lzycompute(BatchScanExec.scala:136)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD(BatchScanExec.scala:135)
>   at 
> org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD$lzycompute(BosonBatchScanExec.scala:28)
>   at 
> org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD(BosonBatchScanExec.scala:28)
>   at 
> org.apache.spark.sql.boson.BosonBatchScanExec.doExecuteColumnar(BosonBatchScanExec.scala:33)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:218)
>   at 
> org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:521)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> {code}
> This is because {{groupPartitions}} will return {{None}} for this case.
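
An illustrative sketch of the kind of guard that avoids the {{None.get}} (names simplified; the real fix lives in BatchScanExec):

{code:scala}
object EmptyPartitionsDemo extends App {
  // Stands in for what groupPartitions returns when dynamic filtering
  // leaves zero input partitions.
  val grouped: Option[Seq[String]] = None

  // Fall back to an empty partition list instead of calling .get on None,
  // which is what currently throws NoSuchElementException.
  val filteredPartitions: Seq[String] = grouped.getOrElse(Seq.empty)
  println(filteredPartitions)  // prints List(); the scan simply produces no rows
}
{code}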



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45677) Observe API error logging

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45677:
---
Labels: pull-request-available  (was: )

> Observe API error logging
> -
>
> Key: SPARK-45677
> URL: https://issues.apache.org/jira/browse/SPARK-45677
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>
> We should tell the user why it's not supported and what to do:
> [https://github.com/apache/spark/blob/536439244593d40bdab88e9d3657f2691d3d33f2/sql/core/src/main/scala/org/apache/spark/sql/Observation.scala#L76]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45677) Observe API error logging

2023-10-26 Thread Wei Liu (Jira)
Wei Liu created SPARK-45677:
---

 Summary: Observe API error logging
 Key: SPARK-45677
 URL: https://issues.apache.org/jira/browse/SPARK-45677
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Wei Liu


We should tell the user why it's not supported and what to do:

[https://github.com/apache/spark/blob/536439244593d40bdab88e9d3657f2691d3d33f2/sql/core/src/main/scala/org/apache/spark/sql/Observation.scala#L76]
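
A sketch of what a more helpful check could look like; the actual condition in Observation.scala may differ, and the suggested alternative is an assumption:

{code:scala}
object ObservationErrors {
  // Illustrative only: explain why it is unsupported and what to use instead.
  def assertBatchDataset(isStreaming: Boolean): Unit =
    if (isStreaming) {
      throw new UnsupportedOperationException(
        "Observation does not support streaming Datasets: the observed " +
          "result is collected once when the query finishes, which never " +
          "happens for a stream. Use Dataset.observe together with a " +
          "StreamingQueryListener to read metrics per micro-batch instead.")
    }
}
{code}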



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45544) [CORE] Integrate SSL support into TransportContext

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45544:
---
Labels: pull-request-available  (was: )

> [CORE] Integrate SSL support into TransportContext
> --
>
> Key: SPARK-45544
> URL: https://issues.apache.org/jira/browse/SPARK-45544
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hasnain Lakhani
>Priority: Major
>  Labels: pull-request-available
>
> Integrate the SSL support into TransportContext so that Spark RPC can use it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45652) SPJ: Handle empty input partitions after dynamic filtering

2023-10-26 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-45652.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43531
[https://github.com/apache/spark/pull/43531]

> SPJ: Handle empty input partitions after dynamic filtering
> --
>
> Key: SPARK-45652
> URL: https://issues.apache.org/jira/browse/SPARK-45652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When the number of input partitions becomes 0 after dynamic filtering in 
> {{BatchScanExec}}, SPJ currently fails with this error:
> {code}
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:529)
>   at scala.None$.get(Option.scala:527)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions$lzycompute(BatchScanExec.scala:108)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions(BatchScanExec.scala:65)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD$lzycompute(BatchScanExec.scala:136)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD(BatchScanExec.scala:135)
>   at 
> org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD$lzycompute(BosonBatchScanExec.scala:28)
>   at 
> org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD(BosonBatchScanExec.scala:28)
>   at 
> org.apache.spark.sql.boson.BosonBatchScanExec.doExecuteColumnar(BosonBatchScanExec.scala:33)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:218)
>   at 
> org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:521)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> {code}
> This is because {{groupPartitions}} will return {{None}} for this case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45652) SPJ: Handle empty input partitions after dynamic filtering

2023-10-26 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-45652:


Assignee: Chao Sun

> SPJ: Handle empty input partitions after dynamic filtering
> --
>
> Key: SPARK-45652
> URL: https://issues.apache.org/jira/browse/SPARK-45652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>
> When the number of input partitions becomes 0 after dynamic filtering in 
> {{BatchScanExec}}, SPJ currently fails with this error:
> {code}
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:529)
>   at scala.None$.get(Option.scala:527)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions$lzycompute(BatchScanExec.scala:108)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions(BatchScanExec.scala:65)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD$lzycompute(BatchScanExec.scala:136)
>   at 
> org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD(BatchScanExec.scala:135)
>   at 
> org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD$lzycompute(BosonBatchScanExec.scala:28)
>   at 
> org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD(BosonBatchScanExec.scala:28)
>   at 
> org.apache.spark.sql.boson.BosonBatchScanExec.doExecuteColumnar(BosonBatchScanExec.scala:33)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:218)
>   at 
> org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:521)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> {code}
> This is because {{groupPartitions}} will return {{None}} for this case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45596) Use java.lang.ref.Cleaner instead of org.apache.spark.sql.connect.client.util.Cleaner

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-45596:


Assignee: Min Zhao

> Use java.lang.ref.Cleaner instead of 
> org.apache.spark.sql.connect.client.util.Cleaner
> -
>
> Key: SPARK-45596
> URL: https://issues.apache.org/jira/browse/SPARK-45596
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Min Zhao
>Assignee: Min Zhao
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-10-19-02-25-57-966.png
>
>
> Now that we have updated the JDK to 17, we should replace this class with 
> [[java.lang.ref.Cleaner]].
>  
> !image-2023-10-19-02-25-57-966.png!
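
For reference, a minimal usage sketch of the JDK facility being proposed (java.lang.ref.Cleaner, available since JDK 9):

{code:scala}
import java.lang.ref.Cleaner

object CleanerDemo {
  private val cleaner = Cleaner.create()

  final class Resource {
    // The cleanup action must not capture the Resource itself, or the object
    // would never become unreachable; it runs at GC time or on explicit clean().
    private val cleanable =
      cleaner.register(this, () => println("resource released"))

    def close(): Unit = cleanable.clean()
  }
}
{code}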



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45596) Use java.lang.ref.Cleaner instead of org.apache.spark.sql.connect.client.util.Cleaner

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-45596.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43439
[https://github.com/apache/spark/pull/43439]

> Use java.lang.ref.Cleaner instead of 
> org.apache.spark.sql.connect.client.util.Cleaner
> -
>
> Key: SPARK-45596
> URL: https://issues.apache.org/jira/browse/SPARK-45596
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Min Zhao
>Assignee: Min Zhao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2023-10-19-02-25-57-966.png
>
>
> Now that we have updated the JDK to 17, we should replace this class with 
> [[java.lang.ref.Cleaner]].
>  
> !image-2023-10-19-02-25-57-966.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45637) Time window aggregation in separate streams followed by stream-stream join not returning results

2023-10-26 Thread Andrzej Zera (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Zera updated SPARK-45637:
-
Description: 
According to documentation update (SPARK-42591) resulting from SPARK-42376, 
Spark 3.5.0 should support time-window aggregations in two separate streams 
followed by stream-stream window join:

https://github.com/apache/spark/blob/261b281e6e57be32eb28bf4e50bea24ed22a9f21/docs/structured-streaming-programming-guide.md?plain=1#L1939-L1995

However, I failed to reproduce this example and the query I built doesn't 
return any results:
{code:python}
from pyspark.sql.functions import rand
from pyspark.sql.functions import expr, window, window_time

spark.conf.set("spark.sql.shuffle.partitions", "1")

impressions = (
    spark
    .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load()
    .selectExpr("value AS adId", "timestamp AS impressionTime")
)

impressionsWithWatermark = impressions \
    .selectExpr("adId AS impressionAdId", "impressionTime") \
    .withWatermark("impressionTime", "10 seconds")

clicks = (
    spark
    .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load()
    .where((rand() * 100).cast("integer") < 10)  # 10 out of every 100 impressions result in a click
    .selectExpr("(value - 10) AS adId", "timestamp AS clickTime")  # -10 so that a click with the same id as an impression is generated later (i.e. delayed data)
    .where("adId > 0")
)

clicksWithWatermark = clicks \
    .selectExpr("adId AS clickAdId", "clickTime") \
    .withWatermark("clickTime", "10 seconds")

clicksWindow = clicksWithWatermark.groupBy(
    window(clicksWithWatermark.clickTime, "1 minute")
).count()

impressionsWindow = impressionsWithWatermark.groupBy(
    window(impressionsWithWatermark.impressionTime, "1 minute")
).count()

clicksAndImpressions = clicksWindow.join(impressionsWindow, "window", "inner")

clicksAndImpressions.writeStream \
    .format("memory") \
    .queryName("clicksAndImpressions") \
    .outputMode("append") \
    .start()
{code}
 

My intuition is that I'm getting no results because, to output results of the 
first stateful operator (the time-window aggregation), the watermark needs to 
pass the end timestamp of the window. And once the watermark is past the end 
timestamp of the window, the window is ignored at the second stateful operator 
(the stream-stream join) because it's behind the watermark. Indeed, a small 
hack applied to the event time column (adding one minute) between the two 
stateful operators makes it possible to get results:
{code:python}
clicksWindow2 = clicksWithWatermark.groupBy(
    window(clicksWithWatermark.clickTime, "1 minute")
).count().withColumn("window_time", window_time("window") + expr('INTERVAL 1 MINUTE')).drop("window")

impressionsWindow2 = impressionsWithWatermark.groupBy(
    window(impressionsWithWatermark.impressionTime, "1 minute")
).count().withColumn("window_time", window_time("window") + expr('INTERVAL 1 MINUTE')).drop("window")

clicksAndImpressions2 = clicksWindow2.join(impressionsWindow2, "window_time", "inner")

clicksAndImpressions2.writeStream \
    .format("memory") \
    .queryName("clicksAndImpressions2") \
    .outputMode("append") \
    .start()
{code}
 

  was:
According to documentation update (SPARK-42591) resulting from SPARK-42376, 
Spark 3.5.0 should support time-window aggregations in two separate streams 
followed by stream-stream window join:

[https://github.com/HeartSaVioR/spark/blob/eb0b09f0f2b518915421365a61d1f3d7d58b4404/docs/structured-streaming-programming-guide.md?plain=1#L1939-L1995]

However, I failed to reproduce this example and the query I built doesn't 
return any results:
(The rest of the old description repeats the code example shown above; it is truncated at this point in the original message.)

[jira] [Resolved] (SPARK-45659) Add `since` field to Java API marked as `@Deprecated`.

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-45659.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43522
[https://github.com/apache/spark/pull/43522]

> Add `since` field to Java API marked as `@Deprecated`.
> --
>
> Key: SPARK-45659
> URL: https://issues.apache.org/jira/browse/SPARK-45659
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL, SS
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Spark 3.0.0:
> - SPARK-26861
>   - org.apache.spark.sql.expressions.javalang.typed
> - SPARK-27606
>   - org.apache.spark.sql.catalyst.expressions.ExpressionDescription#extended: 
>   - 
> org.apache.spark.sql.catalyst.expressions.ExpressionInfo#ExpressionInfo(String,
>  String, String, String, String)
> Spark 3.2.0
> - SPARK-33717
>   - 
> org.apache.spark.launcher.SparkLauncher#DEPRECATED_CHILD_CONNECTION_TIMEOUT
> - SPARK-33779
>   - org.apache.spark.sql.connector.write.WriteBuilder#buildForBatch
>   - org.apache.spark.sql.connector.write.WriteBuilder#buildForStreaming
> Spark 3.4.0
> - SPARK-39805
>   - org.apache.spark.sql.streaming.Trigger
> - SPARK-42398
>   - 
> org.apache.spark.sql.connector.catalog.TableCatalog#createTable(Identifier, 
> StructType, Transform[], Map) 
>   - 
> org.apache.spark.sql.connector.catalog.StagingTableCatalog#stageCreate(Identifier,
>  StructType, Transform[], Map)
>   - org.apache.spark.sql.connector.catalog.Table#schema
>  
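
What the ticket asks for, sketched on a toy API; java.lang.Deprecated has carried an optional `since` element since JDK 9, so this compiles on the JDK 17 toolchain:

{code:scala}
object DeprecatedSinceDemo {
  class LegacyApi {
    // Before: @Deprecated with no version information.
    // After: record the release in which the API was deprecated.
    @Deprecated(since = "3.0.0")
    def extended: String = "moved elsewhere"
  }
}
{code}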



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45659) Add `since` field to Java API marked as `@Deprecated`.

2023-10-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-45659:


Assignee: Yang Jie

> Add `since` field to Java API marked as `@Deprecated`.
> --
>
> Key: SPARK-45659
> URL: https://issues.apache.org/jira/browse/SPARK-45659
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL, SS
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>
> Spark 3.0.0:
> - SPARK-26861
>   - org.apache.spark.sql.expressions.javalang.typed
> - SPARK-27606
>   - org.apache.spark.sql.catalyst.expressions.ExpressionDescription#extended: 
>   - 
> org.apache.spark.sql.catalyst.expressions.ExpressionInfo#ExpressionInfo(String,
>  String, String, String, String)
> Spark 3.2.0
> - SPARK-33717
>   - 
> org.apache.spark.launcher.SparkLauncher#DEPRECATED_CHILD_CONNECTION_TIMEOUT
> - SPARK-33779
>   - org.apache.spark.sql.connector.write.WriteBuilder#buildForBatch
>   - org.apache.spark.sql.connector.write.WriteBuilder#buildForStreaming
> Spark 3.4.0
> - SPARK-39805
>   - org.apache.spark.sql.streaming.Trigger
> - SPARK-42398
>   - 
> org.apache.spark.sql.connector.catalog.TableCatalog#createTable(Identifier, 
> StructType, Transform[], Map) 
>   - 
> org.apache.spark.sql.connector.catalog.StagingTableCatalog#stageCreate(Identifier,
>  StructType, Transform[], Map)
>   - org.apache.spark.sql.connector.catalog.Table#schema
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38723) Test the error class: CONCURRENT_QUERY

2023-10-26 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779871#comment-17779871
 ] 

Jungtaek Lim commented on SPARK-38723:
--

The merge script will assign the PR author. We keep the Jira ticket 
unassigned until the PR gets merged.

> Test the error class: CONCURRENT_QUERY
> --
>
> Key: SPARK-38723
> URL: https://issues.apache.org/jira/browse/SPARK-38723
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Philip Dakin
>Priority: Minor
>  Labels: pull-request-available, starter
> Fix For: 4.0.0
>
>
> Add at least one test for the error class *CONCURRENT_QUERY* to 
> QueryExecutionErrorsSuite. The test should cover the exception thrown in 
> QueryExecutionErrors:
> {code:scala}
>   def concurrentQueryInstanceError(): Throwable = {
> new SparkConcurrentModificationException("CONCURRENT_QUERY", Array.empty)
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must have a check of:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38723) Test the error class: CONCURRENT_QUERY

2023-10-26 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-38723:


Assignee: Philip Dakin

> Test the error class: CONCURRENT_QUERY
> --
>
> Key: SPARK-38723
> URL: https://issues.apache.org/jira/browse/SPARK-38723
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Philip Dakin
>Priority: Minor
>  Labels: pull-request-available, starter
> Fix For: 4.0.0
>
>
> Add at least one test for the error class *CONCURRENT_QUERY* to 
> QueryExecutionErrorsSuite. The test should cover the exception thrown in 
> QueryExecutionErrors:
> {code:scala}
>   def concurrentQueryInstanceError(): Throwable = {
> new SparkConcurrentModificationException("CONCURRENT_QUERY", Array.empty)
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must have a check of:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38723) Test the error class: CONCURRENT_QUERY

2023-10-26 Thread Philip Dakin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779869#comment-17779869
 ] 

Philip Dakin commented on SPARK-38723:
--

[~kabhwan] it's me.

 

BTW - do you know the steps to get the ability to assign things? I would have 
assigned it to myself, but I don't see the option.

> Test the error class: CONCURRENT_QUERY
> --
>
> Key: SPARK-38723
> URL: https://issues.apache.org/jira/browse/SPARK-38723
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: pull-request-available, starter
> Fix For: 4.0.0
>
>
> Add at least one test for the error class *CONCURRENT_QUERY* to 
> QueryExecutionErrorsSuite. The test should cover the exception thrown in 
> QueryExecutionErrors:
> {code:scala}
>   def concurrentQueryInstanceError(): Throwable = {
> new SparkConcurrentModificationException("CONCURRENT_QUERY", Array.empty)
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must have a check of:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45598) Delta table 3.0.0 not working with Spark Connect 3.5.0

2023-10-26 Thread Faiz Halde (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779843#comment-17779843
 ] 

Faiz Halde commented on SPARK-45598:


Hi, do we have any updates here? Happy to help

> Delta table 3.0.0 not working with Spark Connect 3.5.0
> --
>
> Key: SPARK-45598
> URL: https://issues.apache.org/jira/browse/SPARK-45598
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Faiz Halde
>Priority: Major
>
> Spark version 3.5.0
> Spark Connect version 3.5.0
> Delta table 3.0-rc2
> Spark connect server was started using
> *{{./sbin/start-connect-server.sh --master spark://localhost:7077 --packages 
> org.apache.spark:spark-connect_2.12:3.5.0,io.delta:delta-spark_2.12:3.0.0rc2 
> --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf 
> "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
>  --conf 
> 'spark.jars.repositories=[https://oss.sonatype.org/content/repositories/iodelta-1120']}}*
> {{Connect client depends on}}
> *libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0rc2"*
> *and the connect libraries*
>  
> When trying to run a simple job that writes to a delta table
> {{val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()}}
> {{val data = spark.read.json("profiles.json")}}
> {{data.write.format("delta").save("/tmp/delta")}}
>  
> {{Error log in connect client}}
> {{Exception in thread "main" org.apache.spark.SparkException: 
> io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: 
> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 1.0 (TID 4) (172.23.128.15 executor 0): java.lang.ClassCastException: 
> cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 
> in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF}}
> {{    at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}}
> {{    at 
> java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}}
> {{    at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)}}
> {{    at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}}
> {{    at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}}
> {{    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}}
> {{    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}}
> {{    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}}
> {{    at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}}
> {{    at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}}
> {{    at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}}
> {{    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}}
> {{    at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}}
> {{    at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}}
> {{    at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}}
> {{    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}}
> {{    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}}
> {{    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}}
> {{    at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}}
> {{    at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}}
> {{    at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}}
> {{    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}}
> {{...}}
> {{    at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:110)}}
> {{    at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:41)}}
> {{    at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:49)}}
> {{    at scala.collection.Iterator.foreach(Iterator.scala:943)}}
> {{    at scala.collection.Iterator.foreach$(Iterator.scala:943)}}
> {{    at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.foreach(GrpcExceptionConverter.scala:46)}}
> {{    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)}}
> {{    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)}}
> {{    at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)}}
> {{    at 
> scala.collection.mutable.Ar

[jira] [Assigned] (SPARK-45642) Fix `FileSystem.isFile & FileSystem.isDirectory is deprecated`

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45642:
--

Assignee: (was: Apache Spark)

> Fix `FileSystem.isFile & FileSystem.isDirectory is deprecated`
> --
>
> Key: SPARK-45642
> URL: https://issues.apache.org/jira/browse/SPARK-45642
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45642) Fix `FileSystem.isFile & FileSystem.isDirectory is deprecated`

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45642:
--

Assignee: Apache Spark

> Fix `FileSystem.isFile & FileSystem.isDirectory is deprecated`
> --
>
> Key: SPARK-45642
> URL: https://issues.apache.org/jira/browse/SPARK-45642
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45368) Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45368:
--

Assignee: Apache Spark

> Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
> ---
>
> Key: SPARK-45368
> URL: https://issues.apache.org/jira/browse/SPARK-45368
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45481) Introduce a mapper for parquet compression codecs

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45481:
--

Assignee: Jiaan Geng  (was: Apache Spark)

> Introduce a mapper for parquet compression codecs
> -
>
> Key: SPARK-45481
> URL: https://issues.apache.org/jira/browse/SPARK-45481
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Spark supports all the Parquet compression codecs, but the codecs 
> Parquet supports and the ones Spark supports are not completely one-to-one, 
> because Spark introduces a fake compression codec, none.
> There are a lot of magic strings copied from the Parquet compression codecs, 
> so developers need to maintain their consistency manually. It is easy to make 
> mistakes, and it reduces development efficiency.
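
An illustrative sketch of such a mapper (the names are assumptions, not the merged implementation): one place owns the codec names, so call sites stop copying magic strings around:

{code:scala}
object ParquetCodecMapper extends Enumeration {
  // Spark's fake codec NONE plus the codecs Parquet itself knows about.
  val NONE, UNCOMPRESSED, SNAPPY, GZIP, ZSTD, LZ4, BROTLI, LZO = Value

  // NONE is Spark-only and maps onto Parquet's UNCOMPRESSED; the rest are 1:1.
  def toParquetName(codec: Value): String =
    if (codec == NONE) UNCOMPRESSED.toString else codec.toString

  // Parses e.g. "snappy" as configured via spark.sql.parquet.compression.codec.
  def fromConf(value: String): Value = withName(value.trim.toUpperCase)
}
{code}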



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45368) Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45368:
--

Assignee: (was: Apache Spark)

> Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
> ---
>
> Key: SPARK-45368
> URL: https://issues.apache.org/jira/browse/SPARK-45368
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45481) Introduce a mapper for parquet compression codecs

2023-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45481:
--

Assignee: Apache Spark  (was: Jiaan Geng)

> Introduce a mapper for parquet compression codecs
> -
>
> Key: SPARK-45481
> URL: https://issues.apache.org/jira/browse/SPARK-45481
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Spark supports all the Parquet compression codecs, but the codecs 
> Parquet supports and the ones Spark supports are not completely one-to-one, 
> because Spark introduces a fake compression codec, none.
> There are a lot of magic strings copied from the Parquet compression codecs, 
> so developers need to maintain their consistency manually. It is easy to make 
> mistakes, and it reduces development efficiency.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45676) Upgrade to PySpark 3.5.0 gives Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

2023-10-26 Thread Miles Granger (Jira)
Miles Granger created SPARK-45676:
-

 Summary: Upgrade to PySpark 3.5.0 gives Class 
org.apache.hadoop.fs.s3a.S3AFileSystem not found
 Key: SPARK-45676
 URL: https://issues.apache.org/jira/browse/SPARK-45676
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Miles Granger


Using PySpark 3.4.1 with the following dependencies works fine for reading S3 
files:

hadoop-client:3.3.4
hadoop-common:3.3.4
hadoop-aws:3.3.4
aws-java-sdk-bundle:1.12.262

Doing a simple upgrade to PySpark 3.5.0 (which still uses Hadoop 3.3.4, AFAIK) 
results in failing to read the same S3 files:

```
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
org.apache.hadoop.fs.s3a.S3AFileSystem not found
at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
at 
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at 
org.apache.parquet.hadoop.util.HadoopInputFile.fromStatus(HadoopInputFile.java:44)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:76)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:450)
... 14 more
```
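
One way to rule out a classpath problem (a workaround sketch, not a confirmed root cause) is to declare the S3A dependencies explicitly when building the session; spark.jars.packages is a standard Spark config, and the same two config lines apply from PySpark's builder:

{code:scala}
import org.apache.spark.sql.SparkSession

// Pull hadoop-aws (which provides org.apache.hadoop.fs.s3a.S3AFileSystem)
// and the matching AWS SDK bundle onto the driver/executor classpath.
val spark = SparkSession.builder()
  .appName("s3a-classpath-check")
  .config("spark.jars.packages",
    "org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262")
  .getOrCreate()
{code}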



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45675) Specify number of partitions when creating spark dataframe from pandas dataframe

2023-10-26 Thread Jelmer Kuperus (Jira)
Jelmer Kuperus created SPARK-45675:
--

 Summary: Specify number of partitions when creating spark 
dataframe from pandas dataframe
 Key: SPARK-45675
 URL: https://issues.apache.org/jira/browse/SPARK-45675
 Project: Spark
  Issue Type: Improvement
  Components: Pandas API on Spark
Affects Versions: 3.5.0
Reporter: Jelmer Kuperus


When converting a large pandas DataFrame to a Spark DataFrame like so:

{code:python}
import pandas as pd

pdf = pd.DataFrame([{"board_id": "3074457346698037360_0", "file_name": "board-content", "value": "A" * 119251} for i in range(0, 2)])
spark.createDataFrame(pdf).write.mode("overwrite").format("delta").saveAsTable("catalog.schema.table")
{code}

you can encounter the following error:

{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 11:1 was 366405365 bytes, which exceeds max allowed: spark.rpc.message.maxSize (268435456 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values.
{code}

As far as I can tell, Spark first converts the pandas DataFrame into a Python 
list and then constructs an RDD out of that list. This means the parallelism 
is determined by the value of spark.sparkContext.defaultParallelism, and if 
the pandas DataFrame is very large while the number of available cores is low, 
you end up with very large tasks that exceed the limits imposed on task size.

Methods like spark.sparkContext.parallelize allow you to pass in the number of 
partitions of the resulting dataset. Having a similar capability when creating 
a DataFrame from a pandas DataFrame makes a lot of sense. Right now the only 
workaround I can think of is changing the value of spark.default.parallelism, 
but that is a system-wide setting.
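
For comparison, the precedent mentioned above, shown in Scala: parallelize already lets the caller pick the partition count up front via the existing numSlices parameter.

{code:scala}
import org.apache.spark.sql.SparkSession

object ParallelizeDemo extends App {
  val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()

  // The caller chooses 64 partitions regardless of defaultParallelism; the
  // ticket asks for the same control when building a DataFrame from pandas.
  val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 64)
  println(rdd.getNumPartitions)  // 64

  spark.stop()
}
{code}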



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org