[jira] [Created] (SPARK-47080) Fix `HistoryServerSuite` to get `getNumJobs` in `eventually`

2024-02-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47080:
-

 Summary: Fix `HistoryServerSuite` to get `getNumJobs` in 
`eventually`
 Key: SPARK-47080
 URL: https://issues.apache.org/jira/browse/SPARK-47080
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-47057) Reenable MyPy data test

2024-02-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47057.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45135
[https://github.com/apache/spark/pull/45135]

> Reenable MyPy data test
> ---
>
> Key: SPARK-47057
> URL: https://issues.apache.org/jira/browse/SPARK-47057
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47057) Reenable MyPy data test

2024-02-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47057:


Assignee: Hyukjin Kwon

> Reenable MyPy data test
> ---
>
> Key: SPARK-47057
> URL: https://issues.apache.org/jira/browse/SPARK-47057
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns

2024-02-16 Thread Desmond Cheong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Desmond Cheong updated SPARK-47079:
---
Description: 
Trying to create a dataframe containing a variant type results in:

AssertionError: Undefined error message parameter for error class: 
CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message parameter 
for error class: CANNOT_PARSE_DATATYPE. Parameters:

{'error': 'variant'}

"}

  was:Trying to create a dataframe containing a variant type results in 
`AssertionError: Undefined error message parameter for error class: 
CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message 
parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 
'variant'}"}`.


> Unable to create PySpark dataframe containing Variant columns
> -
>
> Key: SPARK-47079
> URL: https://issues.apache.org/jira/browse/SPARK-47079
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Desmond Cheong
>Priority: Major
>
> Trying to create a dataframe containing a variant type results in:
> AssertionError: Undefined error message parameter for error class: 
> CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message 
> parameter for error class: CANNOT_PARSE_DATATYPE. Parameters:
> {'error': 'variant'}
> "}






[jira] [Updated] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns

2024-02-16 Thread Desmond Cheong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Desmond Cheong updated SPARK-47079:
---
Description: 
Trying to create a dataframe containing a variant type results in
{{{}AssertionError: Undefined error message parameter for error class: 
CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message 
parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 
'variant'}"}{}}}.

> Unable to create PySpark dataframe containing Variant columns
> -
>
> Key: SPARK-47079
> URL: https://issues.apache.org/jira/browse/SPARK-47079
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Desmond Cheong
>Priority: Major
>
> Trying to create a dataframe containing a variant type results in
> {{{}AssertionError: Undefined error message parameter for error class: 
> CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message 
> parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 
> 'variant'}"}{}}}.






[jira] [Updated] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns

2024-02-16 Thread Desmond Cheong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Desmond Cheong updated SPARK-47079:
---
Description: Trying to create a dataframe containing a variant type results 
in `AssertionError: Undefined error message parameter for error class: 
CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message 
parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 
'variant'}"}`.  (was: Trying to create a dataframe containing a variant type 
results in
{{{}AssertionError: Undefined error message parameter for error class: 
CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message 
parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 
'variant'}"}{}}}.)

> Unable to create PySpark dataframe containing Variant columns
> -
>
> Key: SPARK-47079
> URL: https://issues.apache.org/jira/browse/SPARK-47079
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Desmond Cheong
>Priority: Major
>
> Trying to create a dataframe containing a variant type results in 
> `AssertionError: Undefined error message parameter for error class: 
> CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message 
> parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 
> 'variant'}"}`.






[jira] [Created] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns

2024-02-16 Thread Desmond Cheong (Jira)
Desmond Cheong created SPARK-47079:
--

 Summary: Unable to create PySpark dataframe containing Variant 
columns
 Key: SPARK-47079
 URL: https://issues.apache.org/jira/browse/SPARK-47079
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark, SQL
Affects Versions: 3.5.0
Reporter: Desmond Cheong









[jira] [Resolved] (SPARK-47077) sbt build is broken due to selenium change

2024-02-16 Thread Holden Karau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau resolved SPARK-47077.
--
Resolution: Cannot Reproduce

After blowing away my Maven + Ivy cache it works fine – should have done that 
earlier.

> sbt build is broken due to selenium change
> --
>
> Key: SPARK-47077
> URL: https://issues.apache.org/jira/browse/SPARK-47077
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>  Labels: pull-request-available
>
> Building with sbt & JDK11 or 17 (executed after reload & clean 
> ;compile;catalyst/testOnly 
> org.apache.spark.sql.catalyst.optimizer.FilterPushdownSuite) results in
>  
> {code:java}
>  
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:8:
>  object WebDriver is not a member of package org.openqa.selenium
> [error] import org.openqa.selenium.WebDriver
> [error]        ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:33:27:
>  not found: type WebDriver
> [error]   override var webDriver: WebDriver = _
> [error]                           ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:37:29:
>  Class org.openqa.selenium.remote.AbstractDriverOptions not found - 
> continuing with a stub.
> [error]     val chromeOptions = new ChromeOptions
> [error]                             ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:24:8:
>  object WebDriver is not a member of package org.openqa.selenium
> [error] import org.openqa.selenium.WebDriver
> [error]        ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:43:27:
>  not found: type WebDriver
> [error]   implicit var webDriver: WebDriver
> [error]                           ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:39:21:
>  Class org.openqa.selenium.remote.RemoteWebDriver not found - continuing with 
> a stub.
> [error]     webDriver = new ChromeDriver(chromeOptions)
> [error]                     ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:28:
>  Unused import
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-imports, site=org.apache.spark.deploy.history
> [error] import org.openqa.selenium.WebDriver
> [error]                            ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:36:8:
>  object WebDriver is not a member of package org.openqa.selenium
> [error] import org.openqa.selenium.WebDriver
> [error]        ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:29:
>  not found: type WebDriver
> [error]     implicit val webDriver: WebDriver = new HtmlUnitDriver
> [error]                             ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:37:8:
>  Class org.openqa.selenium.WebDriver not found - continuing with a stub.
> [error] import org.openqa.selenium.htmlunit.HtmlUnitDriver
> [error]        ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:45:
>  Class org.openqa.selenium.Capabilities not found - continuing with a stub.
> [error]     implicit val webDriver: WebDriver = new HtmlUnitDriver
> [error]                                             ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:9:
>  Symbol 'type org.openqa.selenium.WebDriver' is missing from the classpath.
> [error] This symbol is required by 'value 
> org.scalatestplus.selenium.WebBrowser.go.driver'.
> [error] Make sure that type WebDriver is in your classpath and check for 
> conflicting dependencies with `-Ylog-classpath`.
> [error] A full rebuild may help if 'WebBrowser.class' was compiled against an 
> incompatible version of org.openqa.selenium.
> [error]         go to target.toExternalForm
> [error]         ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:12:
>  could not find implicit value for parameter driver: 
> org.openqa.selenium.WebDriver
> [error]        

[jira] [Updated] (SPARK-42285) Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet

2024-02-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42285:
---
Labels: pull-request-available  (was: )

> Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ 
> inference on Parquet
> 
>
> Key: SPARK-42285
> URL: https://issues.apache.org/jira/browse/SPARK-42285
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> Introduce the conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ 
> inference on Parquet, instead of reusing spark.sql.parquet.timestampNTZ.enabled, 
> which makes writing TimestampNTZ impossible when that flag is disabled.
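
A short sketch of what the split enables (conf name from the issue; the session setup, literal, and output path are illustrative assumptions):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inference on Parquet *reads* is now governed by its own flag, so turning
# it off no longer prevents *writing* TimestampNTZ values.
spark.conf.set("spark.sql.parquet.inferTimestampNTZ.enabled", "false")

df = spark.sql("SELECT TIMESTAMP_NTZ '2024-02-16 12:00:00' AS ts")
df.write.mode("overwrite").parquet("/tmp/ntz_out")  # write still succeeds
spark.read.parquet("/tmp/ntz_out").printSchema()    # ts read back as TIMESTAMP, not TIMESTAMP_NTZ
{code}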






[jira] [Created] (SPARK-47078) Documentation for SparkSession-based Profilers

2024-02-16 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47078:


 Summary: Documentation for SparkSession-based Profilers
 Key: SPARK-47078
 URL: https://issues.apache.org/jira/browse/SPARK-47078
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Updated] (SPARK-47077) sbt build is broken due to selenium change

2024-02-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47077:
---
Labels: pull-request-available  (was: )

> sbt build is broken due to selenium change
> --
>
> Key: SPARK-47077
> URL: https://issues.apache.org/jira/browse/SPARK-47077
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>  Labels: pull-request-available
>
> Building with sbt & JDK11 or 17 (executed after reload & clean 
> ;compile;catalyst/testOnly 
> org.apache.spark.sql.catalyst.optimizer.FilterPushdownSuite) results in
>  
> {code:java}
>  
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:8:
>  object WebDriver is not a member of package org.openqa.selenium
> [error] import org.openqa.selenium.WebDriver
> [error]        ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:33:27:
>  not found: type WebDriver
> [error]   override var webDriver: WebDriver = _
> [error]                           ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:37:29:
>  Class org.openqa.selenium.remote.AbstractDriverOptions not found - 
> continuing with a stub.
> [error]     val chromeOptions = new ChromeOptions
> [error]                             ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:24:8:
>  object WebDriver is not a member of package org.openqa.selenium
> [error] import org.openqa.selenium.WebDriver
> [error]        ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:43:27:
>  not found: type WebDriver
> [error]   implicit var webDriver: WebDriver
> [error]                           ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:39:21:
>  Class org.openqa.selenium.remote.RemoteWebDriver not found - continuing with 
> a stub.
> [error]     webDriver = new ChromeDriver(chromeOptions)
> [error]                     ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:28:
>  Unused import
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-imports, site=org.apache.spark.deploy.history
> [error] import org.openqa.selenium.WebDriver
> [error]                            ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:36:8:
>  object WebDriver is not a member of package org.openqa.selenium
> [error] import org.openqa.selenium.WebDriver
> [error]        ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:29:
>  not found: type WebDriver
> [error]     implicit val webDriver: WebDriver = new HtmlUnitDriver
> [error]                             ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:37:8:
>  Class org.openqa.selenium.WebDriver not found - continuing with a stub.
> [error] import org.openqa.selenium.htmlunit.HtmlUnitDriver
> [error]        ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:45:
>  Class org.openqa.selenium.Capabilities not found - continuing with a stub.
> [error]     implicit val webDriver: WebDriver = new HtmlUnitDriver
> [error]                                             ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:9:
>  Symbol 'type org.openqa.selenium.WebDriver' is missing from the classpath.
> [error] This symbol is required by 'value 
> org.scalatestplus.selenium.WebBrowser.go.driver'.
> [error] Make sure that type WebDriver is in your classpath and check for 
> conflicting dependencies with `-Ylog-classpath`.
> [error] A full rebuild may help if 'WebBrowser.class' was compiled against an 
> incompatible version of org.openqa.selenium.
> [error]         go to target.toExternalForm
> [error]         ^
> [error] 
> /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:12:
>  could not find implicit value for parameter driver: 
> org.openqa.selenium.WebDriver
> [error]         go to target.toExternalForm
> [error]            ^
> [error] 
> /home/holde

[jira] [Created] (SPARK-47077) sbt build is broken due to selenium change

2024-02-16 Thread Holden Karau (Jira)
Holden Karau created SPARK-47077:


 Summary: sbt build is broken due to selenium change
 Key: SPARK-47077
 URL: https://issues.apache.org/jira/browse/SPARK-47077
 Project: Spark
  Issue Type: Improvement
  Components: Build, Tests
Affects Versions: 4.0.0, 3.5.2
Reporter: Holden Karau
Assignee: Holden Karau


Building with sbt & JDK11 or 17 (executed after reload & clean 
;compile;catalyst/testOnly 
org.apache.spark.sql.catalyst.optimizer.FilterPushdownSuite) results in

 
{code:java}
 
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:8:
 object WebDriver is not a member of package org.openqa.selenium
[error] import org.openqa.selenium.WebDriver
[error]        ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:33:27:
 not found: type WebDriver
[error]   override var webDriver: WebDriver = _
[error]                           ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:37:29:
 Class org.openqa.selenium.remote.AbstractDriverOptions not found - continuing 
with a stub.
[error]     val chromeOptions = new ChromeOptions
[error]                             ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:24:8:
 object WebDriver is not a member of package org.openqa.selenium
[error] import org.openqa.selenium.WebDriver
[error]        ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:43:27:
 not found: type WebDriver
[error]   implicit var webDriver: WebDriver
[error]                           ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:39:21:
 Class org.openqa.selenium.remote.RemoteWebDriver not found - continuing with a 
stub.
[error]     webDriver = new ChromeDriver(chromeOptions)
[error]                     ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:28:
 Unused import
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-imports, site=org.apache.spark.deploy.history
[error] import org.openqa.selenium.WebDriver
[error]                            ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:36:8:
 object WebDriver is not a member of package org.openqa.selenium
[error] import org.openqa.selenium.WebDriver
[error]        ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:29:
 not found: type WebDriver
[error]     implicit val webDriver: WebDriver = new HtmlUnitDriver
[error]                             ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:37:8:
 Class org.openqa.selenium.WebDriver not found - continuing with a stub.
[error] import org.openqa.selenium.htmlunit.HtmlUnitDriver
[error]        ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:45:
 Class org.openqa.selenium.Capabilities not found - continuing with a stub.
[error]     implicit val webDriver: WebDriver = new HtmlUnitDriver
[error]                                             ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:9:
 Symbol 'type org.openqa.selenium.WebDriver' is missing from the classpath.
[error] This symbol is required by 'value 
org.scalatestplus.selenium.WebBrowser.go.driver'.
[error] Make sure that type WebDriver is in your classpath and check for 
conflicting dependencies with `-Ylog-classpath`.
[error] A full rebuild may help if 'WebBrowser.class' was compiled against an 
incompatible version of org.openqa.selenium.
[error]         go to target.toExternalForm
[error]         ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:12:
 could not find implicit value for parameter driver: 
org.openqa.selenium.WebDriver
[error]         go to target.toExternalForm
[error]            ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:36:28:
 Unused import
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-imports, site=org.apache.spark.deploy.history
[error] import org.openqa.selenium.WebDriver
[error]                            ^
[error] 
/home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:24:28:
 Unused i

[jira] [Updated] (SPARK-47001) Pushdown Verification in Optimizer.scala should support changed data types

2024-02-16 Thread Holden Karau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau updated SPARK-47001:
-
Description: When pushing a filter down in a union, the data type may not 
match exactly if the filter was constructed using the child DataFrame 
reference. This is because the union's output is updated with a StructType 
merge, which can turn non-nullable into nullable. These are still the same 
column despite the different nullability, so the filter should be safe to push 
down. As it currently stands we get an exception.  (was: Right now it asserts 
exact equality but uses semanticEquality for candidacy; this can result in an 
unexpected exception in Optimizer.scala when pushing down semantically equal 
but different values.)
Summary: Pushdown Verification in Optimizer.scala should support 
changed data types  (was: Pushdown Verification in Optimizer.scala should use 
semantic equals)

> Pushdown Verification in Optimizer.scala should support changed data types
> --
>
> Key: SPARK-47001
> URL: https://issues.apache.org/jira/browse/SPARK-47001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> When pushing a filter down in a union, the data type may not match exactly if 
> the filter was constructed using the child DataFrame reference. This is 
> because the union's output is updated with a StructType merge, which can turn 
> non-nullable into nullable. These are still the same column despite the 
> different nullability, so the filter should be safe to push down. As it 
> currently stands we get an exception.
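
A minimal sketch of that shape (the ticket gives no code; column names, the session setup, and whether the exception actually fires in a given version are assumptions):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

a = spark.range(5).withColumn("v", F.lit(1))                       # v: non-nullable
b = spark.range(5).withColumn("v", F.expr("IF(id > 2, 1, NULL)"))  # v: nullable
u = a.union(b)  # the merged schema relaxes v to nullable

# The filter is built from the child reference a["v"], not u["v"]; pushing it
# below the union hits the strict data-type check described above, where the
# attributes differ only in nullability.
u.filter(a["v"] > 0).explain()
{code}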






[jira] [Resolved] (SPARK-47076) Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with empty storeDir

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47076.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45143
[https://github.com/apache/spark/pull/45143]

> Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with 
> empty storeDir
> 
>
> Key: SPARK-47076
> URL: https://issues.apache.org/jira/browse/SPARK-47076
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This has been observed multiple times.
> {code:java}
> [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 
> milliseconds)
> [info]   The code passed to eventually never returned normally. Attempted 43 
> times over 10.22918722 seconds. Last failure message: 0 did not equal 4. 
> (HistoryServerSuite.scala:564) {code}






[jira] [Updated] (SPARK-47076) Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with empty storeDir

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47076:
--
Parent: SPARK-47046
Issue Type: Sub-task  (was: Bug)

> Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with 
> empty storeDir
> 
>
> Key: SPARK-47076
> URL: https://issues.apache.org/jira/browse/SPARK-47076
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> This has been observed multiple times.
> {code:java}
> [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 
> milliseconds)
> [info]   The code passed to eventually never returned normally. Attempted 43 
> times over 10.22918722 seconds. Last failure message: 0 did not equal 4. 
> (HistoryServerSuite.scala:564) {code}






[jira] [Updated] (SPARK-47076) Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with empty storeDir

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47076:
--
Summary: Fix HistoryServerSuite.`incomplete apps get refreshed` test to 
start with empty storeDir  (was: Flaky test: HistoryServerSuite - incomplete 
apps get refreshed)

> Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with 
> empty storeDir
> 
>
> Key: SPARK-47076
> URL: https://issues.apache.org/jira/browse/SPARK-47076
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> This has been observed multiple times.
> {code:java}
> [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 
> milliseconds)
> [info]   The code passed to eventually never returned normally. Attempted 43 
> times over 10.22918722 seconds. Last failure message: 0 did not equal 4. 
> (HistoryServerSuite.scala:564) {code}






[jira] [Assigned] (SPARK-47076) Flaky test: HistoryServerSuite - incomplete apps get refreshed

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47076:
-

Assignee: Dongjoon Hyun

> Flaky test: HistoryServerSuite - incomplete apps get refreshed
> --
>
> Key: SPARK-47076
> URL: https://issues.apache.org/jira/browse/SPARK-47076
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> This has been observed multiple times.
> {code:java}
> [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 
> milliseconds)
> [info]   The code passed to eventually never returned normally. Attempted 43 
> times over 10.22918722 seconds. Last failure message: 0 did not equal 4. 
> (HistoryServerSuite.scala:564) {code}






[jira] [Updated] (SPARK-47076) Flaky test: HistoryServerSuite - incomplete apps get refreshed

2024-02-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47076:
---
Labels: pull-request-available  (was: )

> Flaky test: HistoryServerSuite - incomplete apps get refreshed
> --
>
> Key: SPARK-47076
> URL: https://issues.apache.org/jira/browse/SPARK-47076
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> This has been observed multiple times.
> {code:java}
> [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 
> milliseconds)
> [info]   The code passed to eventually never returned normally. Attempted 43 
> times over 10.22918722 seconds. Last failure message: 0 did not equal 4. 
> (HistoryServerSuite.scala:564) {code}






[jira] [Resolved] (SPARK-47075) Add `derby-provided` profile

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47075.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45138
[https://github.com/apache/spark/pull/45138]

> Add `derby-provided` profile
> 
>
> Key: SPARK-47075
> URL: https://issues.apache.org/jira/browse/SPARK-47075
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47074) Fix outdated comments in GitHub Action scripts

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47074:
-

Assignee: Dongjoon Hyun

> Fix outdated comments in GitHub Action scripts
> --
>
> Key: SPARK-47074
> URL: https://issues.apache.org/jira/browse/SPARK-47074
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47074) Fix outdated comments in GitHub Action scripts

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47074.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45137
[https://github.com/apache/spark/pull/45137]

> Fix outdated comments in GitHub Action scripts
> --
>
> Key: SPARK-47074
> URL: https://issues.apache.org/jira/browse/SPARK-47074
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47032) Create API for 'analyze' method to send input column(s) to output table unchanged

2024-02-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47032:
---
Labels: pull-request-available  (was: )

> Create API for 'analyze' method to send input column(s) to output table 
> unchanged
> -
>
> Key: SPARK-47032
> URL: https://issues.apache.org/jira/browse/SPARK-47032
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47076) Flaky test: HistoryServerSuite - incomplete apps get refreshed

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47076:
--
Description: 
This has been observed multiple times.
{code:java}
[info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 
milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 43 
times over 10.22918722 seconds. Last failure message: 0 did not equal 4. 
(HistoryServerSuite.scala:564) {code}

> Flaky test: HistoryServerSuite - incomplete apps get refreshed
> --
>
> Key: SPARK-47076
> URL: https://issues.apache.org/jira/browse/SPARK-47076
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> This has been observed multiple times.
> {code:java}
> [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 
> milliseconds)
> [info]   The code passed to eventually never returned normally. Attempted 43 
> times over 10.22918722 seconds. Last failure message: 0 did not equal 4. 
> (HistoryServerSuite.scala:564) {code}






[jira] [Created] (SPARK-47076) Flaky test: HistoryServerSuite - incomplete apps get refreshed

2024-02-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47076:
-

 Summary: Flaky test: HistoryServerSuite - incomplete apps get 
refreshed
 Key: SPARK-47076
 URL: https://issues.apache.org/jira/browse/SPARK-47076
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47070) Subquery rewrite inside an aggregation makes an aggregation invalid

2024-02-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47070:
---
Labels: pull-request-available  (was: )

> Subquery rewrite inside an aggregation makes an aggregation invalid
> ---
>
> Key: SPARK-47070
> URL: https://issues.apache.org/jira/browse/SPARK-47070
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Anton Lykov
>Priority: Major
>  Labels: pull-request-available
>
> When an in/exists-subquery appears inside an aggregate expression within a 
> top-level GROUP BY, it gets rewritten and a new `exists` variable is 
> introduced. However, this variable is incorrectly handled in aggregation. For 
> example, consider the following query:
> ```
> SELECT
> CASE
> WHEN t1.id IN (SELECT id FROM t2) THEN 10
> ELSE -10
> END AS v1
> FROM t1
> GROUP BY t1.id;
> ```
>  
> Executing it leads to the following error:
> ```
> java.lang.IllegalArgumentException: Cannot find column index for attribute 
> 'exists#844' in: Map()
> ```
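
A runnable version of the report (view names from the query above; the table contents are assumptions):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(10).createOrReplaceTempView("t1")
spark.range(5).createOrReplaceTempView("t2")

# In affected versions this fails during aggregation with
# "Cannot find column index for attribute 'exists#...' in: Map()".
spark.sql("""
    SELECT CASE WHEN t1.id IN (SELECT id FROM t2) THEN 10 ELSE -10 END AS v1
    FROM t1
    GROUP BY t1.id
""").collect()
{code}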






[jira] [Commented] (SPARK-45357) Maven test `SparkConnectProtoSuite` failed

2024-02-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818057#comment-17818057
 ] 

Dongjoon Hyun commented on SPARK-45357:
---

This is backported to branch-3.5 via 
[https://github.com/apache/spark/pull/45141]

> Maven test `SparkConnectProtoSuite` failed
> --
>
> Key: SPARK-45357
> URL: https://issues.apache.org/jira/browse/SPARK-45357
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
>  
> build/mvn clean install -pl connector/connect/server -am -DskipTests
> mvn test -pl connector/connect/server 
>  
> {code:java}
> - Test observe *** FAILED ***
>   == FAIL: Plans do not match ===
>   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, 
> sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric, [min(id#0) AS 
> min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 53
>    +- LocalRelation , [id#0, name#0]                                   
>                               +- LocalRelation , [id#0, name#0] 
> (PlanTest.scala:179) {code}
>  
>  






[jira] [Updated] (SPARK-45357) Maven test `SparkConnectProtoSuite` failed

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45357:
--
Fix Version/s: 3.5.2

> Maven test `SparkConnectProtoSuite` failed
> --
>
> Key: SPARK-45357
> URL: https://issues.apache.org/jira/browse/SPARK-45357
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
>  
> build/mvn clean install -pl connector/connect/server -am -DskipTests
> mvn test -pl connector/connect/server 
>  
> {code:java}
> - Test observe *** FAILED ***
>   == FAIL: Plans do not match ===
>   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, 
> sum(id#0) AS sum(id)#0L], 0   CollectMetrics my_metric, [min(id#0) AS 
> min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 53
>    +- LocalRelation , [id#0, name#0]                                   
>                               +- LocalRelation , [id#0, name#0] 
> (PlanTest.scala:179) {code}
>  
>  






[jira] [Closed] (SPARK-47019) AQE dynamic cache partitioning causes SortMergeJoin to result in data loss

2024-02-16 Thread Ridvan Appa Bugis (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ridvan Appa Bugis closed SPARK-47019.
-

closing

> AQE dynamic cache partitioning causes SortMergeJoin to result in data loss
> --
>
> Key: SPARK-47019
> URL: https://issues.apache.org/jira/browse/SPARK-47019
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, Spark Core
>Affects Versions: 3.5.0
> Environment: Tested in 3.5.0
> Reproduced so far on:
>  * kubernetes deployment
>  * docker cluster deployment
> Local Cluster:
>  * master
>  * worker1 (2/2G)
>  * worker2 (1/1G)
>Reporter: Ridvan Appa Bugis
>Priority: Blocker
>  Labels: DAG, caching, correctness, data-loss, 
> dynamic_allocation, inconsistency, partitioning
> Fix For: 3.5.1
>
> Attachments: Screenshot 2024-02-07 at 20.09.44.png, Screenshot 
> 2024-02-07 at 20.10.07.png, eventLogs-app-20240207175940-0023.zip, 
> testdata.zip
>
>
> It seems like we have encountered an issue with Spark AQE's dynamic cache 
> partitioning which causes incorrect *count* output values and data loss.
> A similar issue could not be found, so I am creating this ticket to raise 
> awareness.
>  
> Preconditions:
>  - Setup a cluster as per environment specification
>  - Prepare test data (or a data large enough to trigger read by both 
> executors)
> Steps to reproduce:
>  - Read parent
>  - Self join parent
>  - cache + materialize parent
>  - Join parent with child
>  
> Performing a self-join over a parentDF, then caching + materialising the DF, 
> and then joining it with a childDF results in an *incorrect* count value and 
> {*}missing data{*}.
>  
> Performing a *repartition* seems to fix the issue, most probably due to 
> rearrangement of the underlying partitions and a statistics update.
>  
> This behaviour is observed over a multi-worker cluster with a job running 2 
> executors (1 per worker), when reading a large enough data file by both 
> executors.
> Not reproducible in local mode.
>  
> Circumvention:
> So far, by disabling 
> _spark.sql.optimizer.canChangeCachedPlanOutputPartitioning_ or performing 
> repartition this can be alleviated, but it is not the fix of the root cause.
>  
> This issue is dangerous considering that data loss occurs silently and, in 
> the absence of proper checks, can lead to wrong behaviour/results down the 
> line. So we have labeled it as a blocker.
>  
> There seems to be a file-size threshold after which data loss is observed 
> (possibly implying that it happens when both executors start reading the data 
> file).
>  
> Minimal example:
> {code:java}
> // Read parent
> val parentData = session.read.format("avro").load("/data/shared/test/parent")
> // Self join parent and cache + materialize
> val parent = parentData.join(parentData, Seq("PID")).cache()
> parent.count()
> // Read child
> val child = session.read.format("avro").load("/data/shared/test/child")
> // Basic join
> val resultBasic = child.join(
>   parent,
>   parent("PID") === child("PARENT_ID")
> )
> // Count: 16479 (Wrong)
> println(s"Count no repartition: ${resultBasic.count()}")
> // Repartition parent join
> val resultRepartition = child.join(
>   parent.repartition(),
>   parent("PID") === child("PARENT_ID")
> )
> // Count: 50094 (Correct)
> println(s"Count with repartition: ${resultRepartition.count()}") {code}
>  
> Invalid count-only DAG:
>   !Screenshot 2024-02-07 at 20.10.07.png|width=519,height=853!
> Valid repartition DAG:
> !Screenshot 2024-02-07 at 20.09.44.png|width=368,height=1219!  
>  
> Spark submit for this job:
> {code:java}
> spark-submit 
>   --class ExampleApp 
>   --packages org.apache.spark:spark-avro_2.12:3.5.0 
>   --deploy-mode cluster 
>   --master spark://spark-master:6066 
>   --conf spark.sql.autoBroadcastJoinThreshold=-1  
>   --conf spark.cores.max=3 
>   --driver-cores 1 
>   --driver-memory 1g 
>   --executor-cores 1 
>   --executor-memory 1g 
>   /path/to/test.jar
>  {code}
> The cluster should be set up as follows (worker1(m+e), worker2(e)) so as to 
> split the executors onto two workers.
> I have prepared a simple github repository which contains the compilable 
> above example.
> [https://github.com/ridvanappabugis/spark-3.5-issue]
>  






[jira] [Resolved] (SPARK-47073) Upgrade several Maven plugins to the latest versions

2024-02-16 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-47073.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45136
[https://github.com/apache/spark/pull/45136]

> Upgrade several Maven plugins to the latest versions
> 
>
> Key: SPARK-47073
> URL: https://issues.apache.org/jira/browse/SPARK-47073
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> * {{versions-maven-plugin}} from 2.16.0 to 2.16.2.
>  * {{maven-enforcer-plugin}} from 3.3.0 to 3.4.1.
>  * {{maven-compiler-plugin}} from 3.11.0 to 3.12.1.
>  * {{maven-surefire-plugin}} from 3.1.2 to 3.2.5.
>  * {{maven-clean-plugin}} from 3.3.1 to 3.3.2.
>  * {{maven-javadoc-plugin}} from 3.5.0 to 3.6.3.
>  * {{maven-shade-plugin}} from 3.5.0 to 3.5.1.
>  * {{maven-dependency-plugin}} from 3.6.0 to 3.6.1.
>  * {{maven-checkstyle-plugin}} from 3.3.0 to 3.3.1.






[jira] [Commented] (SPARK-44027) create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API

2024-02-16 Thread Ahmed Sobeh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817946#comment-17817946
 ] 

Ahmed Sobeh commented on SPARK-44027:
-

Is it OK if I pick this up? Is it actually newbie level?

> create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API
> --
>
> Key: SPARK-44027
> URL: https://issues.apache.org/jira/browse/SPARK-44027
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Martin Bode
>Priority: Major
>  Labels: features, newbie
>
> currently only *_temporary_ Spark Views* can be created from a DataFrame:
>  * 
> [DataFrame.createTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView]
>  * 
> [DataFrame.createOrReplaceTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceTempView.html#pyspark.sql.DataFrame.createOrReplaceTempView]
>  * 
> [DataFrame.createGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createGlobalTempView.html#pyspark.sql.DataFrame.createGlobalTempView]
>  * 
> [DataFrame.createOrReplaceGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView]
> When a user needs a _*permanent*_ *Spark View*, they have to fall back to 
> Spark SQL ({{{}CREATE VIEW AS SELECT...{}}}).
> Sometimes it is easier and more readable to specify the desired logic of the 
> view through the {_}Scala/PySpark DataFrame API{_}.
> Therefore, I'd like to suggest implementing a new PySpark method that allows 
> creating a _*permanent*_ *Spark View* from a DataFrame (e.g. 
> {{{}DataFrame.createOrReplaceView{}}}).
> see also:
>  * 
> [https://community.databricks.com/s/question/0D53f1PANVgCAP/is-there-a-way-to-create-a-nontemporary-spark-view-with-pyspark]
>  * [https://lists.apache.org/thread/jzkznvt7cfjhmo77w1tlksxkwyvmvvfb]
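
For context, a sketch of today's SQL fallback next to the proposed API (the view, table, and column names are illustrative; createOrReplaceView does not exist yet):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Today: a permanent view requires expressing the logic in SQL (a permanent
# view may not reference a temporary view, so registering the DataFrame as a
# temp view first does not help). Assumes a `users` table exists.
spark.sql("""
    CREATE OR REPLACE VIEW active_users_v AS
    SELECT id, name FROM users WHERE active = true
""")

# The proposed (not yet existing) DataFrame API would instead allow:
# spark.table("users").where("active = true").select("id", "name") \
#     .createOrReplaceView("active_users_v")
{code}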






[jira] [Resolved] (SPARK-47072) Wrong error message for incorrect ANSI intervals

2024-02-16 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-47072.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45127
[https://github.com/apache/spark/pull/45127]

> Wrong error message for incorrect ANSI intervals
> 
>
> Key: SPARK-47072
> URL: https://issues.apache.org/jira/browse/SPARK-47072
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 3.5.0, 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When Spark SQL cannot recognise an ANSI interval, it outputs the wrong pattern 
> for that particular ANSI interval. For example, it cannot recognise a 
> year-month interval, but reports the day-time format:
> {code:sql}
> spark-sql (default)> select interval '-\t2-2\t' year to month;
> Interval string does not match year-month format of `[+|-]d h`, `INTERVAL 
> [+|-]'[+|-]d h' DAY TO HOUR` when cast to interval year to month: - 2-2 . 
> (line 1, pos 16)
> == SQL ==
> select interval '-\t2-2\t' year to month
> ^^^
> {code}
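
A quick way to hit this from PySpark (session setup assumed; the literal is the one from the report, with real tab characters):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# In affected versions the error text wrongly cites the day-time
# (`[+|-]d h`) pattern even though the cast target is YEAR TO MONTH.
spark.sql("select interval '-\t2-2\t' year to month").show()

# A correctly formatted year-month interval parses fine:
spark.sql("select interval '-2-2' year to month").show()
{code}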






[jira] [Assigned] (SPARK-47075) Add `derby-provided` profile

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47075:
-

Assignee: Dongjoon Hyun

> Add `derby-provided` profile
> 
>
> Key: SPARK-47075
> URL: https://issues.apache.org/jira/browse/SPARK-47075
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47075) Add `derby-provided` profile

2024-02-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47075:
---
Labels: pull-request-available  (was: )

> Add `derby-provided` profile
> 
>
> Key: SPARK-47075
> URL: https://issues.apache.org/jira/browse/SPARK-47075
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47075) Add `derby-provided` profile

2024-02-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47075:
-

 Summary: Add `derby-provided` profile
 Key: SPARK-47075
 URL: https://issues.apache.org/jira/browse/SPARK-47075
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47074) Fix outdated comments in GitHub Action scripts

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47074:
--
Summary: Fix outdated comments in GitHub Action scripts  (was: Update 
comments in GitHub Action scripts)

> Fix outdated comments in GitHub Action scripts
> --
>
> Key: SPARK-47074
> URL: https://issues.apache.org/jira/browse/SPARK-47074
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Updated] (SPARK-47074) Fix outdated comments in GitHub Action scripts

2024-02-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47074:
---
Labels: pull-request-available  (was: )

> Fix outdated comments in GitHub Action scripts
> --
>
> Key: SPARK-47074
> URL: https://issues.apache.org/jira/browse/SPARK-47074
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47074) Update comments in GitHub Action scripts

2024-02-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47074:
-

 Summary: Update comments in GitHub Action scripts
 Key: SPARK-47074
 URL: https://issues.apache.org/jira/browse/SPARK-47074
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47072) Wrong error message for incorrect ANSI intervals

2024-02-16 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-47072:
-
Affects Version/s: 3.5.0

> Wrong error message for incorrect ANSI intervals
> 
>
> Key: SPARK-47072
> URL: https://issues.apache.org/jira/browse/SPARK-47072
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 3.5.0, 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Minor
>  Labels: pull-request-available
>
> When Spark SQL cannot recognise an ANSI interval, it outputs the wrong pattern 
> for that particular ANSI interval. For example, it cannot recognise a 
> year-month interval, but reports the day-time format:
> {code:sql}
> spark-sql (default)> select interval '-\t2-2\t' year to month;
> Interval string does not match year-month format of `[+|-]d h`, `INTERVAL 
> [+|-]'[+|-]d h' DAY TO HOUR` when cast to interval year to month: - 2-2 . 
> (line 1, pos 16)
> == SQL ==
> select interval '-\t2-2\t' year to month
> ^^^
> {code}






[jira] [Updated] (SPARK-47072) Wrong error message for incorrect ANSI intervals

2024-02-16 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-47072:
-
Affects Version/s: 3.4.2

> Wrong error message for incorrect ANSI intervals
> 
>
> Key: SPARK-47072
> URL: https://issues.apache.org/jira/browse/SPARK-47072
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Minor
>  Labels: pull-request-available
>
> When Spark SQL cannot recognise an ANSI interval, it outputs the wrong pattern 
> for that particular ANSI interval. For example, it cannot recognise a 
> year-month interval, but reports the day-time format:
> {code:sql}
> spark-sql (default)> select interval '-\t2-2\t' year to month;
> Interval string does not match year-month format of `[+|-]d h`, `INTERVAL 
> [+|-]'[+|-]d h' DAY TO HOUR` when cast to interval year to month: - 2-2 . 
> (line 1, pos 16)
> == SQL ==
> select interval '-\t2-2\t' year to month
> ^^^
> {code}






[jira] [Resolved] (SPARK-47060) Check SparkIllegalArgumentException instead of IllegalArgumentException in catalyst

2024-02-16 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-47060.
--
Resolution: Fixed

Issue resolved by pull request 45118
[https://github.com/apache/spark/pull/45118]

> Check SparkIllegalArgumentException instead of IllegalArgumentException in 
> catalyst
> ---
>
> Key: SPARK-47060
> URL: https://issues.apache.org/jira/browse/SPARK-47060
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Use checkError() to test the SparkIllegalArgumentException exception instead 
> of IllegalArgumentException in the Catalyst project.






[jira] [Updated] (SPARK-47073) Upgrade `versions-maven-plugin` to 2.16.2

2024-02-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47073:
---
Labels: pull-request-available  (was: )

> Upgrade `versions-maven-plugin` to 2.16.2
> -
>
> Key: SPARK-47073
> URL: https://issues.apache.org/jira/browse/SPARK-47073
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47071) inline With expression if it contains special expression

2024-02-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47071.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45134
[https://github.com/apache/spark/pull/45134]

> inline With expression if it contains special expression
> 
>
> Key: SPARK-47071
> URL: https://issues.apache.org/jira/browse/SPARK-47071
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47057) Reenable MyPy data test

2024-02-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47057:
---
Labels: pull-request-available  (was: )

> Reenable MyPy data test
> ---
>
> Key: SPARK-47057
> URL: https://issues.apache.org/jira/browse/SPARK-47057
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>



