[jira] [Created] (SPARK-47080) Fix `HistoryServerSuite` to get `getNumJobs` in `eventually`
Dongjoon Hyun created SPARK-47080: - Summary: Fix `HistoryServerSuite` to get `getNumJobs` in `eventually` Key: SPARK-47080 URL: https://issues.apache.org/jira/browse/SPARK-47080 Project: Spark Issue Type: Sub-task Components: Spark Core, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
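The SPARK-47080 title above describes moving the `getNumJobs` read inside `eventually` so the value is re-evaluated on each retry instead of being captured once. A minimal Python sketch of that retry pattern (the real suite uses ScalaTest's `eventually`; the helper and `FakeHistoryServer` below are illustrative, not Spark code):

```python
import time

def eventually(condition, timeout=5.0, interval=0.05):
    """Retry `condition` until it returns truthy or the timeout elapses.

    Simplified sketch of ScalaTest's `eventually`; the real helper
    re-raises the last failure with richer diagnostics.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition never became true")
        time.sleep(interval)

# Simulated server whose job count becomes visible asynchronously.
class FakeHistoryServer:
    def __init__(self):
        self._calls = 0

    def get_num_jobs(self):
        self._calls += 1
        return 4 if self._calls >= 3 else 0

server = FakeHistoryServer()

# Wrong: reading the value once, outside the retry loop, captures a
# stale 0 and the assertion can never recover.
# stale = server.get_num_jobs()

# Right: the read happens inside the retried condition, so each
# attempt observes the latest value.
eventually(lambda: server.get_num_jobs() == 4)
```

The point of the fix is only where the read happens: any value asserted inside `eventually` must be recomputed on every attempt.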
[jira] [Resolved] (SPARK-47057) Reenable MyPy data test
[ https://issues.apache.org/jira/browse/SPARK-47057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47057. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45135 [https://github.com/apache/spark/pull/45135] > Reenable MyPy data test > --- > > Key: SPARK-47057 > URL: https://issues.apache.org/jira/browse/SPARK-47057 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47057) Reenable MyPy data test
[ https://issues.apache.org/jira/browse/SPARK-47057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47057: Assignee: Hyukjin Kwon > Reenable MyPy data test > --- > > Key: SPARK-47057 > URL: https://issues.apache.org/jira/browse/SPARK-47057 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns
[ https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Desmond Cheong updated SPARK-47079: --- Description: Trying to create a dataframe containing a variant type results in: AssertionError: Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 'variant'} "} was:Trying to create a dataframe containing a variant type results in `AssertionError: Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 'variant'}"}`. > Unable to create PySpark dataframe containing Variant columns > - > > Key: SPARK-47079 > URL: https://issues.apache.org/jira/browse/SPARK-47079 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Desmond Cheong >Priority: Major > > Trying to create a dataframe containing a variant type results in: > AssertionError: Undefined error message parameter for error class: > CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message > parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: > {'error': 'variant'} > "}
[jira] [Updated] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns
[ https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Desmond Cheong updated SPARK-47079: --- Description: Trying to create a dataframe containing a variant type results in {{{}AssertionError: Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 'variant'}"}{}}}. > Unable to create PySpark dataframe containing Variant columns > - > > Key: SPARK-47079 > URL: https://issues.apache.org/jira/browse/SPARK-47079 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Desmond Cheong >Priority: Major > > Trying to create a dataframe containing a variant type results in > {{{}AssertionError: Undefined error message parameter for error class: > CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message > parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': > 'variant'}"}{}}}.
[jira] [Updated] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns
[ https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Desmond Cheong updated SPARK-47079: --- Description: Trying to create a dataframe containing a variant type results in `AssertionError: Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 'variant'}"}`. (was: Trying to create a dataframe containing a variant type results in {{{}AssertionError: Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': 'variant'}"}{}}}.) > Unable to create PySpark dataframe containing Variant columns > - > > Key: SPARK-47079 > URL: https://issues.apache.org/jira/browse/SPARK-47079 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Desmond Cheong >Priority: Major > > Trying to create a dataframe containing a variant type results in > `AssertionError: Undefined error message parameter for error class: > CANNOT_PARSE_DATATYPE. Parameters: \{'error': "Undefined error message > parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': > 'variant'}"}`.
[jira] [Created] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns
Desmond Cheong created SPARK-47079: -- Summary: Unable to create PySpark dataframe containing Variant columns Key: SPARK-47079 URL: https://issues.apache.org/jira/browse/SPARK-47079 Project: Spark Issue Type: Bug Components: Connect, PySpark, SQL Affects Versions: 3.5.0 Reporter: Desmond Cheong
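The doubled CANNOT_PARSE_DATATYPE text in the SPARK-47079 reports above suggests the error-message formatter itself fails while rendering the original error, nesting one "Undefined error message parameter" message inside another. A pure-Python sketch of how such nesting can arise (hypothetical; this is not PySpark's actual error-class machinery, and the template and parameter names are invented for illustration):

```python
# Hypothetical error-class registry: the template expects a parameter
# named 'datatype', but the caller supplies only 'error'.
ERROR_CLASSES = {
    "CANNOT_PARSE_DATATYPE": "Cannot parse the data type: <datatype>",
}

def format_error(error_class: str, parameters: dict) -> str:
    template = ERROR_CLASSES[error_class]
    for name, value in parameters.items():
        template = template.replace(f"<{name}>", str(value))
    if "<" in template:  # an unfilled placeholder remains
        # Report the mismatch -- but the report itself embeds the same
        # failing description, producing the doubled message seen above.
        inner = (f"Undefined error message parameter for error class: "
                 f"{error_class}. Parameters: {parameters}")
        return (f"Undefined error message parameter for error class: "
                f"{error_class}. Parameters: {{'error': {inner!r}}}")
    return template

msg = format_error("CANNOT_PARSE_DATATYPE", {"error": "variant"})
```

The takeaway is that the reported AssertionError is about message formatting, not about the variant data itself: the parameter dictionary passed for the new type does not match what the message template expects.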
[jira] [Resolved] (SPARK-47077) sbt build is broken due to selenium change
[ https://issues.apache.org/jira/browse/SPARK-47077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-47077. -- Resolution: Cannot Reproduce After blowing away my maven + ivy cache it works fine – should have done that earlier. > sbt build is broken due to selenium change > -- > > Key: SPARK-47077 > URL: https://issues.apache.org/jira/browse/SPARK-47077 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > Labels: pull-request-available > > Building with sbt & JDK11 or 17 (executed after reload & clean > ;compile;catalyst/testOnly > org.apache.spark.sql.catalyst.optimizer.FilterPushdownSuite) results in > > {code:java} > > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:8: > object WebDriver is not a member of package org.openqa.selenium > [error] import org.openqa.selenium.WebDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:33:27: > not found: type WebDriver > [error] override var webDriver: WebDriver = _ > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:37:29: > Class org.openqa.selenium.remote.AbstractDriverOptions not found - > continuing with a stub. 
> [error] val chromeOptions = new ChromeOptions > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:24:8: > object WebDriver is not a member of package org.openqa.selenium > [error] import org.openqa.selenium.WebDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:43:27: > not found: type WebDriver > [error] implicit var webDriver: WebDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:39:21: > Class org.openqa.selenium.remote.RemoteWebDriver not found - continuing with > a stub. > [error] webDriver = new ChromeDriver(chromeOptions) > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:28: > Unused import > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unused-imports, site=org.apache.spark.deploy.history > [error] import org.openqa.selenium.WebDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:36:8: > object WebDriver is not a member of package org.openqa.selenium > [error] import org.openqa.selenium.WebDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:29: > not found: type WebDriver > [error] implicit val webDriver: WebDriver = new HtmlUnitDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:37:8: > Class org.openqa.selenium.WebDriver not found - continuing with a stub. 
> [error] import org.openqa.selenium.htmlunit.HtmlUnitDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:45: > Class org.openqa.selenium.Capabilities not found - continuing with a stub. > [error] implicit val webDriver: WebDriver = new HtmlUnitDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:9: > Symbol 'type org.openqa.selenium.WebDriver' is missing from the classpath. > [error] This symbol is required by 'value > org.scalatestplus.selenium.WebBrowser.go.driver'. > [error] Make sure that type WebDriver is in your classpath and check for > conflicting dependencies with `-Ylog-classpath`. > [error] A full rebuild may help if 'WebBrowser.class' was compiled against an > incompatible version of org.openqa.selenium. > [error] go to target.toExternalForm > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:12: > could not find implicit value for parameter driver: > org.openqa.selenium.WebDriver > [error]
[jira] [Updated] (SPARK-42285) Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet
[ https://issues.apache.org/jira/browse/SPARK-42285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42285: --- Labels: pull-request-available (was: ) > Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ > inference on Parquet > > > Key: SPARK-42285 > URL: https://issues.apache.org/jira/browse/SPARK-42285 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ > inference on Parquet, instead of using spark.sql.parquet.timestampNTZ.enabled > which makes it impossible for TimestampNTZ writing when the flag is disabled.
[jira] [Created] (SPARK-47078) Documentation for SparkSession-based Profilers
Xinrong Meng created SPARK-47078: Summary: Documentation for SparkSession-based Profilers Key: SPARK-47078 URL: https://issues.apache.org/jira/browse/SPARK-47078 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng
[jira] [Updated] (SPARK-47077) sbt build is broken due to selenium change
[ https://issues.apache.org/jira/browse/SPARK-47077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47077: --- Labels: pull-request-available (was: ) > sbt build is broken due to selenium change > -- > > Key: SPARK-47077 > URL: https://issues.apache.org/jira/browse/SPARK-47077 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > Labels: pull-request-available > > Building with sbt & JDK11 or 17 (executed after reload & clean > ;compile;catalyst/testOnly > org.apache.spark.sql.catalyst.optimizer.FilterPushdownSuite) results in > > {code:java} > > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:8: > object WebDriver is not a member of package org.openqa.selenium > [error] import org.openqa.selenium.WebDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:33:27: > not found: type WebDriver > [error] override var webDriver: WebDriver = _ > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:37:29: > Class org.openqa.selenium.remote.AbstractDriverOptions not found - > continuing with a stub. 
> [error] val chromeOptions = new ChromeOptions > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:24:8: > object WebDriver is not a member of package org.openqa.selenium > [error] import org.openqa.selenium.WebDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:43:27: > not found: type WebDriver > [error] implicit var webDriver: WebDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:39:21: > Class org.openqa.selenium.remote.RemoteWebDriver not found - continuing with > a stub. > [error] webDriver = new ChromeDriver(chromeOptions) > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:28: > Unused import > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unused-imports, site=org.apache.spark.deploy.history > [error] import org.openqa.selenium.WebDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:36:8: > object WebDriver is not a member of package org.openqa.selenium > [error] import org.openqa.selenium.WebDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:29: > not found: type WebDriver > [error] implicit val webDriver: WebDriver = new HtmlUnitDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:37:8: > Class org.openqa.selenium.WebDriver not found - continuing with a stub. 
> [error] import org.openqa.selenium.htmlunit.HtmlUnitDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:45: > Class org.openqa.selenium.Capabilities not found - continuing with a stub. > [error] implicit val webDriver: WebDriver = new HtmlUnitDriver > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:9: > Symbol 'type org.openqa.selenium.WebDriver' is missing from the classpath. > [error] This symbol is required by 'value > org.scalatestplus.selenium.WebBrowser.go.driver'. > [error] Make sure that type WebDriver is in your classpath and check for > conflicting dependencies with `-Ylog-classpath`. > [error] A full rebuild may help if 'WebBrowser.class' was compiled against an > incompatible version of org.openqa.selenium. > [error] go to target.toExternalForm > [error] ^ > [error] > /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:12: > could not find implicit value for parameter driver: > org.openqa.selenium.WebDriver > [error] go to target.toExternalForm > [error] ^ > [error] > /home/holde
[jira] [Created] (SPARK-47077) sbt build is broken due to selenium change
Holden Karau created SPARK-47077: Summary: sbt build is broken due to selenium change Key: SPARK-47077 URL: https://issues.apache.org/jira/browse/SPARK-47077 Project: Spark Issue Type: Improvement Components: Build, Tests Affects Versions: 4.0.0, 3.5.2 Reporter: Holden Karau Assignee: Holden Karau Building with sbt & JDK11 or 17 (executed after reload & clean ;compile;catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.FilterPushdownSuite) results in {code:java} [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:8: object WebDriver is not a member of package org.openqa.selenium [error] import org.openqa.selenium.WebDriver [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:33:27: not found: type WebDriver [error] override var webDriver: WebDriver = _ [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:37:29: Class org.openqa.selenium.remote.AbstractDriverOptions not found - continuing with a stub. [error] val chromeOptions = new ChromeOptions [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:24:8: object WebDriver is not a member of package org.openqa.selenium [error] import org.openqa.selenium.WebDriver [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:43:27: not found: type WebDriver [error] implicit var webDriver: WebDriver [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:39:21: Class org.openqa.selenium.remote.RemoteWebDriver not found - continuing with a stub.
[error] webDriver = new ChromeDriver(chromeOptions) [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/ChromeUIHistoryServerSuite.scala:20:28: Unused import [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-imports, site=org.apache.spark.deploy.history [error] import org.openqa.selenium.WebDriver [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:36:8: object WebDriver is not a member of package org.openqa.selenium [error] import org.openqa.selenium.WebDriver [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:29: not found: type WebDriver [error] implicit val webDriver: WebDriver = new HtmlUnitDriver [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:37:8: Class org.openqa.selenium.WebDriver not found - continuing with a stub. [error] import org.openqa.selenium.htmlunit.HtmlUnitDriver [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:383:45: Class org.openqa.selenium.Capabilities not found - continuing with a stub. [error] implicit val webDriver: WebDriver = new HtmlUnitDriver [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:9: Symbol 'type org.openqa.selenium.WebDriver' is missing from the classpath. [error] This symbol is required by 'value org.scalatestplus.selenium.WebBrowser.go.driver'. [error] Make sure that type WebDriver is in your classpath and check for conflicting dependencies with `-Ylog-classpath`. [error] A full rebuild may help if 'WebBrowser.class' was compiled against an incompatible version of org.openqa.selenium.
[error] go to target.toExternalForm [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:470:12: could not find implicit value for parameter driver: org.openqa.selenium.WebDriver [error] go to target.toExternalForm [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:36:28: Unused import [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-imports, site=org.apache.spark.deploy.history [error] import org.openqa.selenium.WebDriver [error] ^ [error] /home/holden/repos/spark/core/src/test/scala/org/apache/spark/deploy/history/RealBrowserUIHistoryServerSuite.scala:24:28: Unused i
[jira] [Updated] (SPARK-47001) Pushdown Verification in Optimizer.scala should support changed data types
[ https://issues.apache.org/jira/browse/SPARK-47001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-47001: - Description: When pushing a filter down into a union, the data type may not match exactly if the filter was constructed using the child dataframe reference. This is because the union's output is updated with a StructType merge, which can turn non-nullable into nullable. These are still the same column despite the different nullability, so the filter should be safe to push down. As it currently stands, we get an exception. (was: Right now it asserts exact equality but uses semanticEquality for candidacy, this can result in an unexpected exception in Optimizer.scala when pushing down semantically equal but different values.) Summary: Pushdown Verification in Optimizer.scala should support changed data types (was: Pushdown Verification in Optimizer.scala should use semantic equals) > Pushdown Verification in Optimizer.scala should support changed data types > -- > > Key: SPARK-47001 > URL: https://issues.apache.org/jira/browse/SPARK-47001 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > When pushing a filter down into a union, the data type may not match exactly > if the filter was constructed using the child dataframe reference. This is > because the union's output is updated with a StructType merge, which can turn > non-nullable into nullable. These are still the same column despite the > different nullability, so the filter should be safe to push down. As it > currently stands, we get an exception.
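The SPARK-47001 description above can be illustrated with a toy model: a union widens nullability when merging its children's schemas, so exact field equality fails even though name-and-type ("semantic") equality still holds. A hypothetical sketch in plain Python (the `Field`, `merge`, and `same_column_ignoring_nullability` names are invented for illustration; this is not Spark's actual StructType API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    dtype: str
    nullable: bool

def merge(a: Field, b: Field) -> Field:
    # Union output widens nullability: nullable if either side is nullable.
    assert a.name == b.name and a.dtype == b.dtype
    return Field(a.name, a.dtype, a.nullable or b.nullable)

def same_column_ignoring_nullability(a: Field, b: Field) -> bool:
    # The looser check the pushdown verification should use:
    # same name and type, nullability ignored.
    return a.name == b.name and a.dtype == b.dtype

child = Field("x", "int", nullable=False)   # filter built from the child
other = Field("x", "int", nullable=True)    # the other union branch
union_out = merge(child, other)             # becomes nullable=True

# Exact equality fails even though it is the same column ...
exact_match = (child == union_out)
# ... while the nullability-insensitive comparison succeeds.
semantic_match = same_column_ignoring_nullability(child, union_out)
```

Under this model, verifying the pushed-down filter with exact equality rejects a filter that is actually safe, which matches the unexpected exception described in the issue.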
[jira] [Resolved] (SPARK-47076) Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with empty storeDir
[ https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47076. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45143 [https://github.com/apache/spark/pull/45143] > Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with > empty storeDir > > > Key: SPARK-47076 > URL: https://issues.apache.org/jira/browse/SPARK-47076 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This has been observed multiple times. > {code:java} > [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 > milliseconds) > [info] The code passed to eventually never returned normally. Attempted 43 > times over 10.22918722 seconds. Last failure message: 0 did not equal 4. > (HistoryServerSuite.scala:564) {code}
[jira] [Updated] (SPARK-47076) Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with empty storeDir
[ https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47076: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Bug) > Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with > empty storeDir > > > Key: SPARK-47076 > URL: https://issues.apache.org/jira/browse/SPARK-47076 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > This has been observed multiple times. > {code:java} > [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 > milliseconds) > [info] The code passed to eventually never returned normally. Attempted 43 > times over 10.22918722 seconds. Last failure message: 0 did not equal 4. > (HistoryServerSuite.scala:564) {code}
[jira] [Updated] (SPARK-47076) Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with empty storeDir
[ https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47076: -- Summary: Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with empty storeDir (was: Flaky test: HistoryServerSuite - incomplete apps get refreshed) > Fix HistoryServerSuite.`incomplete apps get refreshed` test to start with > empty storeDir > > > Key: SPARK-47076 > URL: https://issues.apache.org/jira/browse/SPARK-47076 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > This has been observed multiple times. > {code:java} > [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 > milliseconds) > [info] The code passed to eventually never returned normally. Attempted 43 > times over 10.22918722 seconds. Last failure message: 0 did not equal 4. > (HistoryServerSuite.scala:564) {code}
[jira] [Assigned] (SPARK-47076) Flaky test: HistoryServerSuite - incomplete apps get refreshed
[ https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47076: - Assignee: Dongjoon Hyun > Flaky test: HistoryServerSuite - incomplete apps get refreshed > -- > > Key: SPARK-47076 > URL: https://issues.apache.org/jira/browse/SPARK-47076 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > This has been observed multiple times. > {code:java} > [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 > milliseconds) > [info] The code passed to eventually never returned normally. Attempted 43 > times over 10.22918722 seconds. Last failure message: 0 did not equal 4. > (HistoryServerSuite.scala:564) {code}
[jira] [Updated] (SPARK-47076) Flaky test: HistoryServerSuite - incomplete apps get refreshed
[ https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47076: --- Labels: pull-request-available (was: ) > Flaky test: HistoryServerSuite - incomplete apps get refreshed > -- > > Key: SPARK-47076 > URL: https://issues.apache.org/jira/browse/SPARK-47076 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > This has been observed multiple times. > {code:java} > [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 > milliseconds) > [info] The code passed to eventually never returned normally. Attempted 43 > times over 10.22918722 seconds. Last failure message: 0 did not equal 4. > (HistoryServerSuite.scala:564) {code}
[jira] [Resolved] (SPARK-47075) Add `derby-provided` profile
[ https://issues.apache.org/jira/browse/SPARK-47075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47075. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45138 [https://github.com/apache/spark/pull/45138] > Add `derby-provided` profile > > > Key: SPARK-47075 > URL: https://issues.apache.org/jira/browse/SPARK-47075 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47074) Fix outdated comments in GitHub Action scripts
[ https://issues.apache.org/jira/browse/SPARK-47074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47074: - Assignee: Dongjoon Hyun > Fix outdated comments in GitHub Action scripts > -- > > Key: SPARK-47074 > URL: https://issues.apache.org/jira/browse/SPARK-47074 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available
[jira] [Resolved] (SPARK-47074) Fix outdated comments in GitHub Action scripts
[ https://issues.apache.org/jira/browse/SPARK-47074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47074. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45137 [https://github.com/apache/spark/pull/45137] > Fix outdated comments in GitHub Action scripts > -- > > Key: SPARK-47074 > URL: https://issues.apache.org/jira/browse/SPARK-47074 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47032) Create API for 'analyze' method to send input column(s) to output table unchanged
[ https://issues.apache.org/jira/browse/SPARK-47032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47032: --- Labels: pull-request-available (was: ) > Create API for 'analyze' method to send input column(s) to output table > unchanged > - > > Key: SPARK-47032 > URL: https://issues.apache.org/jira/browse/SPARK-47032 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47076) Flaky test: HistoryServerSuite - incomplete apps get refreshed
[ https://issues.apache.org/jira/browse/SPARK-47076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47076: -- Description: This has been observed multiple times. {code:java} [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 milliseconds) [info] The code passed to eventually never returned normally. Attempted 43 times over 10.22918722 seconds. Last failure message: 0 did not equal 4. (HistoryServerSuite.scala:564) {code} > Flaky test: HistoryServerSuite - incomplete apps get refreshed > -- > > Key: SPARK-47076 > URL: https://issues.apache.org/jira/browse/SPARK-47076 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > This has been observed multiple times. > {code:java} > [info] - incomplete apps get refreshed *** FAILED *** (15 seconds, 450 > milliseconds) > [info] The code passed to eventually never returned normally. Attempted 43 > times over 10.22918722 seconds. Last failure message: 0 did not equal 4. > (HistoryServerSuite.scala:564) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
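The companion fix (SPARK-47080, mentioned in this thread) moves the `getNumJobs` read inside the `eventually` block so the assertion retries against a fresh value instead of a stale one. As a rough illustration of that pattern, here is a plain-Python stand-in, not Spark's actual test code; `eventually` and `FakeHistoryServer` are hypothetical names invented for this sketch:

```python
import time

def eventually(condition, timeout=10.0, interval=0.05):
    """Re-evaluate `condition` until it holds or `timeout` elapses.

    A minimal plain-Python stand-in for ScalaTest's `eventually`: the
    essential property is that the condition (and any value it reads,
    such as a job count) is sampled fresh on every attempt, instead of
    being captured once before the retry loop.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition never held within timeout")
        time.sleep(interval)

class FakeHistoryServer:
    """Hypothetical server whose job count catches up over several polls."""
    def __init__(self):
        self._jobs = 0

    def get_num_jobs(self):
        self._jobs = min(self._jobs + 1, 4)  # simulates gradual refresh
        return self._jobs

server = FakeHistoryServer()

# Flaky shape: `n = server.get_num_jobs()` sampled once *outside* the retry
# loop would pin the assertion to a stale value ("0 did not equal 4").
# Robust shape: read the counter inside the retried condition, so every
# attempt sees the server's current state.
eventually(lambda: server.get_num_jobs() == 4)
```

The same principle applies to any polled assertion: everything the condition depends on must be recomputed per attempt, or the retries only re-check a snapshot.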
[jira] [Created] (SPARK-47076) Flaky test: HistoryServerSuite - incomplete apps get refreshed
Dongjoon Hyun created SPARK-47076: - Summary: Flaky test: HistoryServerSuite - incomplete apps get refreshed Key: SPARK-47076 URL: https://issues.apache.org/jira/browse/SPARK-47076 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47070) Subquery rewrite inside an aggregation makes an aggregation invalid
[ https://issues.apache.org/jira/browse/SPARK-47070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47070: --- Labels: pull-request-available (was: ) > Subquery rewrite inside an aggregation makes an aggregation invalid > --- > > Key: SPARK-47070 > URL: https://issues.apache.org/jira/browse/SPARK-47070 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Anton Lykov >Priority: Major > Labels: pull-request-available > > When an in/exists-subquery appears inside an aggregate expression within a > top-level GROUP BY, it gets rewritten and a new `exists` variable is > introduced. However, this variable is incorrectly handled in aggregation. For > example, consider the following query: > ``` > SELECT > CASE > WHEN t1.id IN (SELECT id FROM t2) THEN 10 > ELSE -10 > END AS v1 > FROM t1 > GROUP BY t1.id; > ``` > > Executing it leads to the following error: > ``` > java.lang.IllegalArgumentException: Cannot find column index for attribute > 'exists#844' in: Map() > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45357) Maven test `SparkConnectProtoSuite` failed
[ https://issues.apache.org/jira/browse/SPARK-45357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818057#comment-17818057 ] Dongjoon Hyun commented on SPARK-45357: --- This is backported to branch-3.5 via [https://github.com/apache/spark/pull/45141] > Maven test `SparkConnectProtoSuite` failed > -- > > Key: SPARK-45357 > URL: https://issues.apache.org/jira/browse/SPARK-45357 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > > build/mvn clean install -pl connector/connect/server -am -DskipTests > mvn test -pl connector/connect/server > > {code:java} > - Test observe *** FAILED *** > == FAIL: Plans do not match === > !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, > sum(id#0) AS sum(id)#0L], 0 CollectMetrics my_metric, [min(id#0) AS > min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 53 > +- LocalRelation , [id#0, name#0] > +- LocalRelation , [id#0, name#0] > (PlanTest.scala:179) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45357) Maven test `SparkConnectProtoSuite` failed
[ https://issues.apache.org/jira/browse/SPARK-45357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45357: -- Fix Version/s: 3.5.2 > Maven test `SparkConnectProtoSuite` failed > -- > > Key: SPARK-45357 > URL: https://issues.apache.org/jira/browse/SPARK-45357 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > > build/mvn clean install -pl connector/connect/server -am -DskipTests > mvn test -pl connector/connect/server > > {code:java} > - Test observe *** FAILED *** > == FAIL: Plans do not match === > !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, > sum(id#0) AS sum(id)#0L], 0 CollectMetrics my_metric, [min(id#0) AS > min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 53 > +- LocalRelation , [id#0, name#0] > +- LocalRelation , [id#0, name#0] > (PlanTest.scala:179) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-47019) AQE dynamic cache partitioning causes SortMergeJoin to result in data loss
[ https://issues.apache.org/jira/browse/SPARK-47019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ridvan Appa Bugis closed SPARK-47019. - closing > AQE dynamic cache partitioning causes SortMergeJoin to result in data loss > -- > > Key: SPARK-47019 > URL: https://issues.apache.org/jira/browse/SPARK-47019 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core >Affects Versions: 3.5.0 > Environment: Tested in 3.5.0 > Reproduced on, so far: > * kubernetes deployment > * docker cluster deployment > Local Cluster: > * master > * worker1 (2/2G) > * worker2 (1/1G) >Reporter: Ridvan Appa Bugis >Priority: Blocker > Labels: DAG, caching, correctness, data-loss, > dynamic_allocation, inconsistency, partitioning > Fix For: 3.5.1 > > Attachments: Screenshot 2024-02-07 at 20.09.44.png, Screenshot > 2024-02-07 at 20.10.07.png, eventLogs-app-20240207175940-0023.zip, > testdata.zip > > > It seems like we have encountered an issue with Spark AQE's dynamic cache > partitioning which causes incorrect *count* output values and data loss. > A similar issue could not be found, so i am creating this ticket to raise > awareness. > > Preconditions: > - Setup a cluster as per environment specification > - Prepare test data (or a data large enough to trigger read by both > executors) > Steps to reproduce: > - Read parent > - Self join parent > - cache + materialize parent > - Join parent with child > > Performing a self-join over a parentDF, then caching + materialising the DF, > and then joining it with a childDF results in *incorrect* count value and > {*}missing data{*}. > > Performing a *repartition* seems to fix the issue, most probably due to > rearrangement of the underlying partitions and statistic update. > > This behaviour is observed over a multi-worker cluster with a job running 2 > executors (1 per worker), when reading a large enough data file by both > executors. > Not reproducible in local mode. 
> > Circumvention: > So far, by disabling > _spark.sql.optimizer.canChangeCachedPlanOutputPartitioning_ or performing > repartition this can be alleviated, but it does not fix the root cause. > > This issue is dangerous considering that data loss is occurring silently and > in the absence of proper checks can lead to wrong behaviour/results down the > line. So we have labeled it as a blocker. > > There seems to be a file-size threshold after which data loss is observed > (possibly implying that it happens when both executors start reading the data > file) > > Minimal example: > {code:java} > // Read parent > val parentData = session.read.format("avro").load("/data/shared/test/parent") > // Self join parent and cache + materialize > val parent = parentData.join(parentData, Seq("PID")).cache() > parent.count() > // Read child > val child = session.read.format("avro").load("/data/shared/test/child") > // Basic join > val resultBasic = child.join( > parent, > parent("PID") === child("PARENT_ID") > ) > // Count: 16479 (Wrong) > println(s"Count no repartition: ${resultBasic.count()}") > // Repartition parent join > val resultRepartition = child.join( > parent.repartition(), > parent("PID") === child("PARENT_ID") > ) > // Count: 50094 (Correct) > println(s"Count with repartition: ${resultRepartition.count()}") {code} > > Invalid count-only DAG: > !Screenshot 2024-02-07 at 20.10.07.png|width=519,height=853! > Valid repartition DAG: > !Screenshot 2024-02-07 at 20.09.44.png|width=368,height=1219!
> > Spark submit for this job: > {code:java} > spark-submit > --class ExampleApp > --packages org.apache.spark:spark-avro_2.12:3.5.0 > --deploy-mode cluster > --master spark://spark-master:6066 > --conf spark.sql.autoBroadcastJoinThreshold=-1 > --conf spark.cores.max=3 > --driver-cores 1 > --driver-memory 1g > --executor-cores 1 > --executor-memory 1g > /path/to/test.jar > {code} > The cluster should be setup to the following (worker1(m+e) worker2(e)) as to > split the executors onto two workers. > I have prepared a simple github repository which contains the compilable > above example. > [https://github.com/ridvanappabugis/spark-3.5-issue] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47073) Upgrade several Maven plugins to the latest versions
[ https://issues.apache.org/jira/browse/SPARK-47073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-47073. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45136 [https://github.com/apache/spark/pull/45136] > Upgrade several Maven plugins to the latest versions > > > Key: SPARK-47073 > URL: https://issues.apache.org/jira/browse/SPARK-47073 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > * {{versions-maven-plugin}} from 2.16.0 to 2.16.2. > * {{maven-enforcer-plugin}} from 3.3.0 to 3.4.1. > * {{maven-compiler-plugin}} from 3.11.0 to 3.12.1. > * {{maven-surefire-plugin}} from 3.1.2 to 3.2.5. > * {{maven-clean-plugin}} from 3.3.1 to 3.3.2. > * {{maven-javadoc-plugin}} from 3.5.0 to 3.6.3. > * {{maven-shade-plugin}} from 3.5.0 to 3.5.1. > * {{maven-dependency-plugin}} from 3.6.0 to 3.6.1. > * {{maven-checkstyle-plugin}} from 3.3.0 to 3.3.1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44027) create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-44027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817946#comment-17817946 ] Ahmed Sobeh commented on SPARK-44027: - is it ok if I pick this up? Is it actually newbie level? > create *permanent* Spark View from DataFrame via PySpark & Scala DataFrame API > -- > > Key: SPARK-44027 > URL: https://issues.apache.org/jira/browse/SPARK-44027 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Martin Bode >Priority: Major > Labels: features, newbie > > currently only *_temporary_ Spark Views* can be created from a DataFrame: > * > [DataFrame.createTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView] > * > [DataFrame.createOrReplaceTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceTempView.html#pyspark.sql.DataFrame.createOrReplaceTempView] > * > [DataFrame.createGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createGlobalTempView.html#pyspark.sql.DataFrame.createGlobalTempView] > * > [DataFrame.createOrReplaceGlobalTempView|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html#pyspark.sql.DataFrame.createOrReplaceGlobalTempView] > When a user needs a _*permanent*_ *Spark View* he has to fall back to Spark > SQL ({{{}CREATE VIEW AS SELECT...{}}}). > Sometimes it is easier and more readable to specify the desired logic of the > view through {_}Scala/PySpark DataFrame API{_}. > Therefore, I'd like to suggest to implement a new PySpark method that allows > creating a _*permanent*_ *Spark View* from a DataFrame (e.g. > {{{}DataFrame.createOrReplaceView{}}}). 
> see also: > * > [https://community.databricks.com/s/question/0D53f1PANVgCAP/is-there-a-way-to-create-a-nontemporary-spark-view-with-pyspark] > * [https://lists.apache.org/thread/jzkznvt7cfjhmo77w1tlksxkwyvmvvfb] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
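For contrast with the proposed DataFrame API, the SQL fallback described in this ticket looks roughly like the following sketch; the schema, table, and column names are illustrative, not taken from the ticket:

{code:sql}
-- Today a *permanent* view must be written in SQL, restating logic that
-- may already exist as a DataFrame pipeline:
CREATE OR REPLACE VIEW sales_summary AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;
{code}

A hypothetical `DataFrame.createOrReplaceView("sales_summary")`, as suggested above, would let the same view definition stay in DataFrame code.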
[jira] [Resolved] (SPARK-47072) Wrong error message for incorrect ANSI intervals
[ https://issues.apache.org/jira/browse/SPARK-47072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-47072. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45127 [https://github.com/apache/spark/pull/45127] > Wrong error message for incorrect ANSI intervals > > > Key: SPARK-47072 > URL: https://issues.apache.org/jira/browse/SPARK-47072 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 3.5.0, 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > When Spark SQL cannot recognise ANSI interval, it outputs wrong pattern for > particular ANSI interval. For example, it cannot recognise year-month > interval, but says about day-time interval: > {code:sql} > spark-sql (default)> select interval '-\t2-2\t' year to month; > Interval string does not match year-month format of `[+|-]d h`, `INTERVAL > [+|-]'[+|-]d h' DAY TO HOUR` when cast to interval year to month: - 2-2 . > (line 1, pos 16) > == SQL == > select interval '-\t2-2\t' year to month > ^^^ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47075) Add `derby-provided` profile
[ https://issues.apache.org/jira/browse/SPARK-47075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47075: - Assignee: Dongjoon Hyun > Add `derby-provided` profile > > > Key: SPARK-47075 > URL: https://issues.apache.org/jira/browse/SPARK-47075 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47075) Add `derby-provided` profile
[ https://issues.apache.org/jira/browse/SPARK-47075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47075: --- Labels: pull-request-available (was: ) > Add `derby-provided` profile > > > Key: SPARK-47075 > URL: https://issues.apache.org/jira/browse/SPARK-47075 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47075) Add `derby-provided` profile
Dongjoon Hyun created SPARK-47075: - Summary: Add `derby-provided` profile Key: SPARK-47075 URL: https://issues.apache.org/jira/browse/SPARK-47075 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47074) Fix outdated comments in GitHub Action scripts
[ https://issues.apache.org/jira/browse/SPARK-47074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47074: -- Summary: Fix outdated comments in GitHub Action scripts (was: Update comments in GitHub Action scripts) > Fix outdated comments in GitHub Action scripts > -- > > Key: SPARK-47074 > URL: https://issues.apache.org/jira/browse/SPARK-47074 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47074) Fix outdated comments in GitHub Action scripts
[ https://issues.apache.org/jira/browse/SPARK-47074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47074: --- Labels: pull-request-available (was: ) > Fix outdated comments in GitHub Action scripts > -- > > Key: SPARK-47074 > URL: https://issues.apache.org/jira/browse/SPARK-47074 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47074) Update comments in GitHub Action scripts
Dongjoon Hyun created SPARK-47074: - Summary: Update comments in GitHub Action scripts Key: SPARK-47074 URL: https://issues.apache.org/jira/browse/SPARK-47074 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47072) Wrong error message for incorrect ANSI intervals
[ https://issues.apache.org/jira/browse/SPARK-47072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-47072: - Affects Version/s: 3.5.0 > Wrong error message for incorrect ANSI intervals > > > Key: SPARK-47072 > URL: https://issues.apache.org/jira/browse/SPARK-47072 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 3.5.0, 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Minor > Labels: pull-request-available > > When Spark SQL cannot recognise ANSI interval, it outputs wrong pattern for > particular ANSI interval. For example, it cannot recognise year-month > interval, but says about day-time interval: > {code:sql} > spark-sql (default)> select interval '-\t2-2\t' year to month; > Interval string does not match year-month format of `[+|-]d h`, `INTERVAL > [+|-]'[+|-]d h' DAY TO HOUR` when cast to interval year to month: - 2-2 . > (line 1, pos 16) > == SQL == > select interval '-\t2-2\t' year to month > ^^^ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47072) Wrong error message for incorrect ANSI intervals
[ https://issues.apache.org/jira/browse/SPARK-47072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-47072: - Affects Version/s: 3.4.2 > Wrong error message for incorrect ANSI intervals > > > Key: SPARK-47072 > URL: https://issues.apache.org/jira/browse/SPARK-47072 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Minor > Labels: pull-request-available > > When Spark SQL cannot recognise ANSI interval, it outputs wrong pattern for > particular ANSI interval. For example, it cannot recognise year-month > interval, but says about day-time interval: > {code:sql} > spark-sql (default)> select interval '-\t2-2\t' year to month; > Interval string does not match year-month format of `[+|-]d h`, `INTERVAL > [+|-]'[+|-]d h' DAY TO HOUR` when cast to interval year to month: - 2-2 . > (line 1, pos 16) > == SQL == > select interval '-\t2-2\t' year to month > ^^^ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47060) Check SparkIllegalArgumentException instead of IllegalArgumentException in catalyst
[ https://issues.apache.org/jira/browse/SPARK-47060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-47060. -- Resolution: Fixed Issue resolved by pull request 45118 [https://github.com/apache/spark/pull/45118] > Check SparkIllegalArgumentException instead of IllegalArgumentException in > catalyst > --- > > Key: SPARK-47060 > URL: https://issues.apache.org/jira/browse/SPARK-47060 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Use checkError() to test the SparkIllegalArgumentException exception instead > of IllegalArgumentException in the Catalyst project. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47073) Upgrade `versions-maven-plugin` to 2.16.2
[ https://issues.apache.org/jira/browse/SPARK-47073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47073: --- Labels: pull-request-available (was: ) > Upgrade `versions-maven-plugin` to 2.16.2 > - > > Key: SPARK-47073 > URL: https://issues.apache.org/jira/browse/SPARK-47073 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47071) inline With expression if it contains special expression
[ https://issues.apache.org/jira/browse/SPARK-47071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47071. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45134 [https://github.com/apache/spark/pull/45134] > inline With expression if it contains special expression > > > Key: SPARK-47071 > URL: https://issues.apache.org/jira/browse/SPARK-47071 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47057) Reenable MyPy data test
[ https://issues.apache.org/jira/browse/SPARK-47057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47057: --- Labels: pull-request-available (was: ) > Reenable MyPy data test > --- > > Key: SPARK-47057 > URL: https://issues.apache.org/jira/browse/SPARK-47057 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org