[jira] [Commented] (SPARK-43031) Enable tests for Python streaming spark-connect
[ https://issues.apache.org/jira/browse/SPARK-43031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710401#comment-17710401 ]

Hudson commented on SPARK-43031:
--------------------------------

User 'WweiL' has created a pull request for this issue:
https://github.com/apache/spark/pull/40691

> Enable tests for Python streaming spark-connect
> -----------------------------------------------
>
> Key: SPARK-43031
> URL: https://issues.apache.org/jira/browse/SPARK-43031
> Project: Spark
> Issue Type: Task
> Components: Connect, Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Raghu Angadi
> Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43093) Test case "Add a directory when spark.sql.legacy.addSingleFileInAddFile set to false" should use random directories for testing
Yang Jie created SPARK-43093:
-----------------------------

    Summary: Test case "Add a directory when spark.sql.legacy.addSingleFileInAddFile set to false" should use random directories for testing
    Key: SPARK-43093
    URL: https://issues.apache.org/jira/browse/SPARK-43093
    Project: Spark
    Issue Type: Bug
    Components: SQL, Tests
    Affects Versions: 3.3.2, 3.2.3, 3.4.0, 3.5.0
    Reporter: Yang Jie
[jira] [Created] (SPARK-43092) Clean up unsupported function `dropDuplicatesWithinWatermark` from `Dataset`
Yang Jie created SPARK-43092:
-----------------------------

    Summary: Clean up unsupported function `dropDuplicatesWithinWatermark` from `Dataset`
    Key: SPARK-43092
    URL: https://issues.apache.org/jira/browse/SPARK-43092
    Project: Spark
    Issue Type: Improvement
    Components: Connect
    Affects Versions: 3.5.0
    Reporter: Yang Jie
[jira] [Commented] (SPARK-43088) Respect RequiresDistributionAndOrdering in CTAS/RTAS
[ https://issues.apache.org/jira/browse/SPARK-43088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710386#comment-17710386 ]

Snoot.io commented on SPARK-43088:
----------------------------------

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40734

> Respect RequiresDistributionAndOrdering in CTAS/RTAS
> ----------------------------------------------------
>
> Key: SPARK-43088
> URL: https://issues.apache.org/jira/browse/SPARK-43088
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Anton Okolnychyi
> Priority: Major
>
> We must respect {{RequiresDistributionAndOrdering}} writes constructed for
> CTAS/RTAS.
[jira] [Commented] (SPARK-43033) Avoid task retries due to AssertNotNull checks
[ https://issues.apache.org/jira/browse/SPARK-43033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710385#comment-17710385 ]

Snoot.io commented on SPARK-43033:
----------------------------------

User 'clownxc' has created a pull request for this issue:
https://github.com/apache/spark/pull/40707

> Avoid task retries due to AssertNotNull checks
> ----------------------------------------------
>
> Key: SPARK-43033
> URL: https://issues.apache.org/jira/browse/SPARK-43033
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Anton Okolnychyi
> Priority: Major
>
> As discussed
> [here|https://github.com/apache/spark/pull/40655#discussion_r1156693696],
> tasks that failed because of exceptions generated by {{AssertNotNull}} should
> not be retried.
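The idea behind SPARK-43033 can be illustrated outside Spark: a failure caused by a null-check violation (what {{AssertNotNull}} raises) is deterministic, so retrying the task with the same input can never succeed and only wastes cluster time. A minimal sketch in plain Python, where `NonRetryableError` and `run_with_retries` are hypothetical names, not Spark APIs:

```python
class NonRetryableError(Exception):
    """Marker for deterministic failures (e.g. a null value in a
    non-nullable column): rerunning the same input cannot succeed."""


def run_with_retries(task, max_retries=3):
    """Retry transient failures up to max_retries times,
    but fail fast on deterministic (non-retryable) errors."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except NonRetryableError:
            raise  # a data error: retries would only waste time
        except Exception:
            if attempts > max_retries:
                raise
```

A scheduler using this classification would surface the `AssertNotNull`-style error to the user immediately instead of rerunning the task several times first.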
[jira] [Commented] (SPARK-43089) Redact debug string in UI
[ https://issues.apache.org/jira/browse/SPARK-43089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710384#comment-17710384 ]

Snoot.io commented on SPARK-43089:
----------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40733

> Redact debug string in UI
> -------------------------
>
> Key: SPARK-43089
> URL: https://issues.apache.org/jira/browse/SPARK-43089
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.4.1
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 3.5.0
>
> https://github.com/apache/spark/pull/40603 exposes all data without
> redaction. We should redact it.
[jira] [Resolved] (SPARK-43089) Redact debug string in UI
[ https://issues.apache.org/jira/browse/SPARK-43089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-43089.
----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 40733
https://github.com/apache/spark/pull/40733

> Redact debug string in UI
> -------------------------
>
> Key: SPARK-43089
> URL: https://issues.apache.org/jira/browse/SPARK-43089
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.4.1
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 3.5.0
>
> https://github.com/apache/spark/pull/40603 exposes all data without
> redaction. We should redact it.
[jira] [Assigned] (SPARK-43089) Redact debug string in UI
[ https://issues.apache.org/jira/browse/SPARK-43089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-43089:
------------------------------------
    Assignee: Hyukjin Kwon

> Redact debug string in UI
> -------------------------
>
> Key: SPARK-43089
> URL: https://issues.apache.org/jira/browse/SPARK-43089
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.4.1
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
>
> https://github.com/apache/spark/pull/40603 exposes all data without
> redaction. We should redact it.
[jira] [Commented] (SPARK-42916) JDBCCatalog Keep Char/Varchar meta information on the read-side
[ https://issues.apache.org/jira/browse/SPARK-42916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710382#comment-17710382 ]

Snoot.io commented on SPARK-42916:
----------------------------------

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/40543

> JDBCCatalog Keep Char/Varchar meta information on the read-side
> ---------------------------------------------------------------
>
> Key: SPARK-42916
> URL: https://issues.apache.org/jira/browse/SPARK-42916
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Kent Yao
> Priority: Major
>
> Fix error like:
> string cannot be cast to varchar(20)
[jira] [Commented] (SPARK-43039) Support custom fields in the file source _metadata column
[ https://issues.apache.org/jira/browse/SPARK-43039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710381#comment-17710381 ]

Snoot.io commented on SPARK-43039:
----------------------------------

User 'ryan-johnson-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/40677

> Support custom fields in the file source _metadata column
> ---------------------------------------------------------
>
> Key: SPARK-43039
> URL: https://issues.apache.org/jira/browse/SPARK-43039
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Ryan Johnson
> Priority: Major
>
> Today, the schema of the file source _metadata column depends on the file
> format (e.g. the parquet file format supports {{_metadata.row_index}}), but
> this is hard-wired into the {{FileFormat}} itself. Not only is this an ugly
> design, it also prevents custom file formats from adding their own fields to
> the {{_metadata}} column.
[jira] [Resolved] (SPARK-43077) Improve the error message of UNRECOGNIZED_SQL_TYPE
[ https://issues.apache.org/jira/browse/SPARK-43077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-43077.
------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 40718
https://github.com/apache/spark/pull/40718

> Improve the error message of UNRECOGNIZED_SQL_TYPE
> --------------------------------------------------
>
> Key: SPARK-43077
> URL: https://issues.apache.org/jira/browse/SPARK-43077
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.5.0
>
> UNRECOGNIZED_SQL_TYPE currently prints the raw JDBC type id in the error
> message. This makes it difficult for Spark users to understand the meaning of
> this kind of error, especially when the type id comes from a vendor extension.
> For example,
> {code:java}
> org.apache.spark.SparkSQLException: Unrecognized SQL type -102{code}
[jira] [Assigned] (SPARK-43077) Improve the error message of UNRECOGNIZED_SQL_TYPE
[ https://issues.apache.org/jira/browse/SPARK-43077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-43077:
--------------------------------
    Assignee: Kent Yao

> Improve the error message of UNRECOGNIZED_SQL_TYPE
> --------------------------------------------------
>
> Key: SPARK-43077
> URL: https://issues.apache.org/jira/browse/SPARK-43077
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
>
> UNRECOGNIZED_SQL_TYPE currently prints the raw JDBC type id in the error
> message. This makes it difficult for Spark users to understand the meaning of
> this kind of error, especially when the type id comes from a vendor extension.
> For example,
> {code:java}
> org.apache.spark.SparkSQLException: Unrecognized SQL type -102{code}
[jira] [Created] (SPARK-43091) Support overloading UDF
Hang Wu created SPARK-43091:
----------------------------

    Summary: Support overloading UDF
    Key: SPARK-43091
    URL: https://issues.apache.org/jira/browse/SPARK-43091
    Project: Spark
    Issue Type: Improvement
    Components: SQL
    Affects Versions: 3.0.2
    Reporter: Hang Wu

Spark SQL has not supported overloading UDFs for a long while. If we register two functions with the same name, Spark complains that "the function replaced a previously registered function". The solution is either to enhance the org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry class to support multiple functions with the same name, or to let users extend and plug in their own FunctionRegistry class. Should you have any comments, please let me know.
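Spark's registry keeps one builder per function name, which is why a second registration replaces the first. The direction the ticket suggests — keeping multiple functions per name and resolving among them — can be sketched in plain Python, resolving overloads by argument count. The class and method names here are illustrative, not Spark's actual `FunctionRegistry` API:

```python
class OverloadingRegistry:
    """Toy function registry that, unlike a one-builder-per-name map,
    keeps every overload of a name and resolves by argument count."""

    def __init__(self):
        # name -> {arity: callable}; a second overload with a new arity
        # is added alongside the first instead of replacing it
        self._funcs = {}

    def register(self, name, func, arity):
        self._funcs.setdefault(name, {})[arity] = func

    def lookup_and_call(self, name, args):
        overloads = self._funcs.get(name, {})
        func = overloads.get(len(args))
        if func is None:
            raise LookupError(
                f"no overload of {name!r} takes {len(args)} argument(s)")
        return func(*args)
```

A real implementation would also need a resolution rule for overloads that share an arity but differ in parameter types, which is where most of the design work in the ticket would lie.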
[jira] [Resolved] (SPARK-43090) Move withTable from RemoteSparkSession to SQLHelper
[ https://issues.apache.org/jira/browse/SPARK-43090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-43090.
-----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 40723
https://github.com/apache/spark/pull/40723

> Move withTable from RemoteSparkSession to SQLHelper
> ---------------------------------------------------
>
> Key: SPARK-43090
> URL: https://issues.apache.org/jira/browse/SPARK-43090
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Tests
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
> Fix For: 3.5.0
[jira] [Assigned] (SPARK-43090) Move withTable from RemoteSparkSession to SQLHelper
[ https://issues.apache.org/jira/browse/SPARK-43090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-43090:
-------------------------------------
    Assignee: Yang Jie

> Move withTable from RemoteSparkSession to SQLHelper
> ---------------------------------------------------
>
> Key: SPARK-43090
> URL: https://issues.apache.org/jira/browse/SPARK-43090
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Tests
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
[jira] [Created] (SPARK-43090) Move withTable from RemoteSparkSession to SQLHelper
Yang Jie created SPARK-43090:
-----------------------------

    Summary: Move withTable from RemoteSparkSession to SQLHelper
    Key: SPARK-43090
    URL: https://issues.apache.org/jira/browse/SPARK-43090
    Project: Spark
    Issue Type: Improvement
    Components: Connect, Tests
    Affects Versions: 3.5.0
    Reporter: Yang Jie
[jira] [Updated] (SPARK-43089) Redact debug string in UI
[ https://issues.apache.org/jira/browse/SPARK-43089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-43089:
---------------------------------
    Affects Version/s: 3.4.1
                 (was: 3.4.0)

> Redact debug string in UI
> -------------------------
>
> Key: SPARK-43089
> URL: https://issues.apache.org/jira/browse/SPARK-43089
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.4.1
> Reporter: Hyukjin Kwon
> Priority: Major
>
> https://github.com/apache/spark/pull/40603 exposes all data without
> redaction. We should redact it.
[jira] [Created] (SPARK-43089) Redact debug string in UI
Hyukjin Kwon created SPARK-43089:
---------------------------------

    Summary: Redact debug string in UI
    Key: SPARK-43089
    URL: https://issues.apache.org/jira/browse/SPARK-43089
    Project: Spark
    Issue Type: Improvement
    Components: Connect, PySpark
    Affects Versions: 3.4.0
    Reporter: Hyukjin Kwon

https://github.com/apache/spark/pull/40603 exposes all data without redaction. We should redact it.
[jira] [Created] (SPARK-43088) Respect RequiresDistributionAndOrdering in CTAS/RTAS
Anton Okolnychyi created SPARK-43088:
-------------------------------------

    Summary: Respect RequiresDistributionAndOrdering in CTAS/RTAS
    Key: SPARK-43088
    URL: https://issues.apache.org/jira/browse/SPARK-43088
    Project: Spark
    Issue Type: Improvement
    Components: SQL
    Affects Versions: 3.5.0
    Reporter: Anton Okolnychyi

We must respect {{RequiresDistributionAndOrdering}} writes constructed for CTAS/RTAS.
[jira] [Updated] (SPARK-43085) Fix bug in column DEFAULT assignment for target tables with multi-part names
[ https://issues.apache.org/jira/browse/SPARK-43085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel updated SPARK-43085:
---------------------------
    Summary: Fix bug in column DEFAULT assignment for target tables with multi-part names
      (was: Fix bug in column DEFAULT assignment for target tables with three-part names)

> Fix bug in column DEFAULT assignment for target tables with multi-part names
> ----------------------------------------------------------------------------
>
> Key: SPARK-43085
> URL: https://issues.apache.org/jira/browse/SPARK-43085
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Daniel
> Priority: Major
>
> To reproduce:
> {{CREATE DATABASE If NOT EXISTS main.codydemos;}}
> {{CREATE OR REPLACE TABLE main.codydemos.test_ts (Id INT, ts timestamp);}}
> {{CREATE OR REPLACE TABLE main.codydemos.test_ts_other (ts timestamp);}}
> {{INSERT INTO main.codydemos.test_ts(ts) VALUES (current_timestamp());}}
> {{SELECT * FROM main.codydemos.test_s}}
[jira] [Updated] (SPARK-43085) Fix bug in column DEFAULT assignment for target tables with multi-part names
[ https://issues.apache.org/jira/browse/SPARK-43085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel updated SPARK-43085:
---------------------------
    Description:
      (was: To reproduce:
      {{CREATE DATABASE If NOT EXISTS main.codydemos;}}
      {{CREATE OR REPLACE TABLE main.codydemos.test_ts (Id INT, ts timestamp);}}
      {{CREATE OR REPLACE TABLE main.codydemos.test_ts_other (ts timestamp);}}
      {{INSERT INTO main.codydemos.test_ts(ts) VALUES (current_timestamp());}}
      {{SELECT * FROM main.codydemos.test_s}})

> Fix bug in column DEFAULT assignment for target tables with multi-part names
> ----------------------------------------------------------------------------
>
> Key: SPARK-43085
> URL: https://issues.apache.org/jira/browse/SPARK-43085
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Daniel
> Priority: Major
[jira] [Commented] (SPARK-42382) Upgrade `cyclonedx-maven-plugin` to 2.7.6
[ https://issues.apache.org/jira/browse/SPARK-42382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710355#comment-17710355 ]

Mike K commented on SPARK-42382:
--------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40726

> Upgrade `cyclonedx-maven-plugin` to 2.7.6
> -----------------------------------------
>
> Key: SPARK-42382
> URL: https://issues.apache.org/jira/browse/SPARK-42382
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
> Fix For: 3.5.0
>
> [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.4]
> [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.5]
[jira] [Assigned] (SPARK-42951) Spark Connect: Streaming DataStreamReader API except table()
[ https://issues.apache.org/jira/browse/SPARK-42951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-42951:
------------------------------------
    Assignee: Wei Liu

> Spark Connect: Streaming DataStreamReader API except table()
> ------------------------------------------------------------
>
> Key: SPARK-42951
> URL: https://issues.apache.org/jira/browse/SPARK-42951
> Project: Spark
> Issue Type: Task
> Components: Connect, Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
[jira] [Resolved] (SPARK-42951) Spark Connect: Streaming DataStreamReader API except table()
[ https://issues.apache.org/jira/browse/SPARK-42951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-42951.
----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 40689
https://github.com/apache/spark/pull/40689

> Spark Connect: Streaming DataStreamReader API except table()
> ------------------------------------------------------------
>
> Key: SPARK-42951
> URL: https://issues.apache.org/jira/browse/SPARK-42951
> Project: Spark
> Issue Type: Task
> Components: Connect, Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Fix For: 3.5.0
[jira] [Resolved] (SPARK-42382) Upgrade `cyclonedx-maven-plugin` to 2.7.6
[ https://issues.apache.org/jira/browse/SPARK-42382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-42382.
----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 40726
https://github.com/apache/spark/pull/40726

> Upgrade `cyclonedx-maven-plugin` to 2.7.6
> -----------------------------------------
>
> Key: SPARK-42382
> URL: https://issues.apache.org/jira/browse/SPARK-42382
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
> Fix For: 3.5.0
>
> [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.4]
> [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.5]
[jira] [Updated] (SPARK-43083) Mark `*StateStoreSuite` as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-43083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-43083:
----------------------------------
    Affects Version/s: 3.4.0
                 (was: 3.5.0)

> Mark `*StateStoreSuite` as `ExtendedSQLTest`
> --------------------------------------------
>
> Key: SPARK-43083
> URL: https://issues.apache.org/jira/browse/SPARK-43083
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.4.1
[jira] [Updated] (SPARK-43083) Mark `*StateStoreSuite` as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-43083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-43083:
----------------------------------
    Fix Version/s: 3.4.1
             (was: 3.4.0)

> Mark `*StateStoreSuite` as `ExtendedSQLTest`
> --------------------------------------------
>
> Key: SPARK-43083
> URL: https://issues.apache.org/jira/browse/SPARK-43083
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.4.1
[jira] [Resolved] (SPARK-43083) Mark `*StateStoreSuite` as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-43083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-43083.
-----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 40727
https://github.com/apache/spark/pull/40727

> Mark `*StateStoreSuite` as `ExtendedSQLTest`
> --------------------------------------------
>
> Key: SPARK-43083
> URL: https://issues.apache.org/jira/browse/SPARK-43083
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.4.0
[jira] [Assigned] (SPARK-43083) Mark `*StateStoreSuite` as `ExtendedSQLTest`
[ https://issues.apache.org/jira/browse/SPARK-43083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-43083:
-------------------------------------
    Assignee: Dongjoon Hyun

> Mark `*StateStoreSuite` as `ExtendedSQLTest`
> --------------------------------------------
>
> Key: SPARK-43083
> URL: https://issues.apache.org/jira/browse/SPARK-43083
> Project: Spark
> Issue Type: Test
> Components: SQL, Tests
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
[jira] [Created] (SPARK-43087) Support coalesce buckets in join in AQE
Yuming Wang created SPARK-43087:
--------------------------------

    Summary: Support coalesce buckets in join in AQE
    Key: SPARK-43087
    URL: https://issues.apache.org/jira/browse/SPARK-43087
    Project: Spark
    Issue Type: Improvement
    Components: SQL
    Affects Versions: 3.1.0
    Reporter: Yuming Wang
[jira] [Resolved] (SPARK-43071) Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation
[ https://issues.apache.org/jira/browse/SPARK-43071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-43071.
------------------------------------
    Fix Version/s: 3.4.1
       Resolution: Fixed

Issue resolved by pull request 40710
https://github.com/apache/spark/pull/40710

> Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation
> ------------------------------------------------------------------------------
>
> Key: SPARK-43071
> URL: https://issues.apache.org/jira/browse/SPARK-43071
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.1
> Reporter: Daniel
> Assignee: Daniel
> Priority: Major
> Fix For: 3.4.1
[jira] [Assigned] (SPARK-43071) Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation
[ https://issues.apache.org/jira/browse/SPARK-43071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang reassigned SPARK-43071:
--------------------------------------
    Assignee: Daniel

> Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation
> ------------------------------------------------------------------------------
>
> Key: SPARK-43071
> URL: https://issues.apache.org/jira/browse/SPARK-43071
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.1
> Reporter: Daniel
> Assignee: Daniel
> Priority: Major
[jira] [Created] (SPARK-43086) Support bin pack task scheduling on executors
Zhongwei Zhu created SPARK-43086:
---------------------------------

    Summary: Support bin pack task scheduling on executors
    Key: SPARK-43086
    URL: https://issues.apache.org/jira/browse/SPARK-43086
    Project: Spark
    Issue Type: Improvement
    Components: Spark Core
    Affects Versions: 3.3.2
    Reporter: Zhongwei Zhu

Dynamic allocation can only remove or decommission an idle executor, and the default task scheduler assigns tasks to executors round-robin. For example, with 4 tasks to run and 4 executors (each with 4 CPU cores), default scheduling assigns 1 task per executor. With bin packing, one executor could be assigned all 4 tasks, and dynamic allocation could then remove the other 3 executors to reduce resource waste.
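The difference described above can be sketched in a few lines of plain Python (a conceptual model, not Spark's scheduler code): round-robin spreads one task onto each executor, while a first-fit bin-pack fills the first executor and leaves the rest idle for dynamic allocation to reclaim.

```python
def round_robin(num_tasks, core_counts):
    """Assign tasks one per executor in turn; core_counts[i] is the
    number of CPU cores on executor i. Returns tasks per executor."""
    assignment = [0] * len(core_counts)
    i = 0
    for _ in range(num_tasks):
        # skip executors whose cores are already fully occupied
        while assignment[i] >= core_counts[i]:
            i = (i + 1) % len(core_counts)
        assignment[i] += 1
        i = (i + 1) % len(core_counts)
    return assignment


def bin_pack(num_tasks, core_counts):
    """First-fit bin packing: fill the first executor that still has
    a free core before touching the next one."""
    assignment = [0] * len(core_counts)
    for _ in range(num_tasks):
        for j, cores in enumerate(core_counts):
            if assignment[j] < cores:
                assignment[j] += 1
                break
    return assignment
```

With the ticket's example of 4 tasks on four 4-core executors, `round_robin` yields one task per executor (no executor ever idles), while `bin_pack` places all four tasks on the first executor, leaving three executors with zero tasks.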
[jira] [Updated] (SPARK-43085) Fix bug in column DEFAULT assignment for target tables with three-part names
[ https://issues.apache.org/jira/browse/SPARK-43085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel updated SPARK-43085:
---------------------------
    Description:
      To reproduce:
      {{CREATE DATABASE If NOT EXISTS main.codydemos;}}
      {{CREATE OR REPLACE TABLE main.codydemos.test_ts (Id INT, ts timestamp);}}
      {{CREATE OR REPLACE TABLE main.codydemos.test_ts_other (ts timestamp);}}
      {{INSERT INTO main.codydemos.test_ts(ts) VALUES (current_timestamp());}}
      {{SELECT * FROM main.codydemos.test_s}}

> Fix bug in column DEFAULT assignment for target tables with three-part names
> ----------------------------------------------------------------------------
>
> Key: SPARK-43085
> URL: https://issues.apache.org/jira/browse/SPARK-43085
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Daniel
> Priority: Major
>
> To reproduce:
> {{CREATE DATABASE If NOT EXISTS main.codydemos;}}
> {{CREATE OR REPLACE TABLE main.codydemos.test_ts (Id INT, ts timestamp);}}
> {{CREATE OR REPLACE TABLE main.codydemos.test_ts_other (ts timestamp);}}
> {{INSERT INTO main.codydemos.test_ts(ts) VALUES (current_timestamp());}}
> {{SELECT * FROM main.codydemos.test_s}}
[jira] [Created] (SPARK-43085) Fix bug in column DEFAULT assignment for target tables with three-part names
Daniel created SPARK-43085:
---------------------------

    Summary: Fix bug in column DEFAULT assignment for target tables with three-part names
    Key: SPARK-43085
    URL: https://issues.apache.org/jira/browse/SPARK-43085
    Project: Spark
    Issue Type: Sub-task
    Components: SQL
    Affects Versions: 3.4.0
    Reporter: Daniel

To reproduce:

```
{{CREATE DATABASE If NOT EXISTS main.codydemos;}}
{{CREATE OR REPLACE TABLE main.codydemos.test_ts (Id INT, ts timestamp);}}
{{CREATE OR REPLACE TABLE main.codydemos.test_ts_other (ts timestamp);}}
{{INSERT INTO main.codydemos.test_ts(ts) VALUES (current_timestamp());}}
{{SELECT * FROM main.codydemos.test_s}}
```
[jira] [Updated] (SPARK-43085) Fix bug in column DEFAULT assignment for target tables with three-part names
[ https://issues.apache.org/jira/browse/SPARK-43085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel updated SPARK-43085:
---------------------------
    Description:
      To reproduce:
      {{CREATE DATABASE If NOT EXISTS main.codydemos;}}
      {{CREATE OR REPLACE TABLE main.codydemos.test_ts (Id INT, ts timestamp);}}
      {{CREATE OR REPLACE TABLE main.codydemos.test_ts_other (ts timestamp);}}
      {{INSERT INTO main.codydemos.test_ts(ts) VALUES (current_timestamp());}}
      {{SELECT * FROM main.codydemos.test_s}}
      (was: To reproduce:
      ```
      {{CREATE DATABASE If NOT EXISTS main.codydemos;}}
      {{CREATE OR REPLACE TABLE main.codydemos.test_ts (Id INT, ts timestamp);}}
      {{CREATE OR REPLACE TABLE main.codydemos.test_ts_other (ts timestamp);}}
      {{INSERT INTO main.codydemos.test_ts(ts) VALUES (current_timestamp());}}
      {{SELECT * FROM main.codydemos.test_s}}
      ```)

> Fix bug in column DEFAULT assignment for target tables with three-part names
> ----------------------------------------------------------------------------
>
> Key: SPARK-43085
> URL: https://issues.apache.org/jira/browse/SPARK-43085
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Daniel
> Priority: Major
>
> To reproduce:
> {{CREATE DATABASE If NOT EXISTS main.codydemos;}}
> {{CREATE OR REPLACE TABLE main.codydemos.test_ts (Id INT, ts timestamp);}}
> {{CREATE OR REPLACE TABLE main.codydemos.test_ts_other (ts timestamp);}}
> {{INSERT INTO main.codydemos.test_ts(ts) VALUES (current_timestamp());}}
> {{SELECT * FROM main.codydemos.test_s}}
[jira] [Created] (SPARK-43084) Add Python state API (applyInPandasWithState) and verify UDFs
Raghu Angadi created SPARK-43084: Summary: Add Python state API (applyInPandasWithState) and verify UDFs Key: SPARK-43084 URL: https://issues.apache.org/jira/browse/SPARK-43084 Project: Spark Issue Type: Task Components: Connect, Structured Streaming Affects Versions: 3.5.0 Environment: * Add Python state API (applyInPandasWithState) to streaming Spark-connect. * verify the UDFs work (it may not need any code changes). Reporter: Raghu Angadi -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43083) Mark `*StateStoreSuite` as `ExtendedSQLTest`
Dongjoon Hyun created SPARK-43083: - Summary: Mark `*StateStoreSuite` as `ExtendedSQLTest` Key: SPARK-43083 URL: https://issues.apache.org/jira/browse/SPARK-43083 Project: Spark Issue Type: Test Components: SQL, Tests Affects Versions: 3.5.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42382) Upgrade `cyclonedx-maven-plugin` to 2.7.6
[ https://issues.apache.org/jira/browse/SPARK-42382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42382: -- Summary: Upgrade `cyclonedx-maven-plugin` to 2.7.6 (was: Upgrade `cyclonedx-maven-plugin` to 2.7.5) > Upgrade `cyclonedx-maven-plugin` to 2.7.6 > - > > Key: SPARK-42382 > URL: https://issues.apache.org/jira/browse/SPARK-42382 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.4] > [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.5] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42382) Upgrade `cyclonedx-maven-plugin` to 2.7.5
[ https://issues.apache.org/jira/browse/SPARK-42382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710273#comment-17710273 ] Dongjoon Hyun commented on SPARK-42382: --- Since [~LuciferYang] has been investigating this so far, I made a PR with him as the main author. - https://github.com/apache/spark/pull/40726 > Upgrade `cyclonedx-maven-plugin` to 2.7.5 > - > > Key: SPARK-42382 > URL: https://issues.apache.org/jira/browse/SPARK-42382 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.4] > [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.5] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-42382) Upgrade `cyclonedx-maven-plugin` to 2.7.5
[ https://issues.apache.org/jira/browse/SPARK-42382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reopened SPARK-42382: --- Assignee: Yang Jie > Upgrade `cyclonedx-maven-plugin` to 2.7.5 > - > > Key: SPARK-42382 > URL: https://issues.apache.org/jira/browse/SPARK-42382 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.4] > [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.5] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42382) Upgrade `cyclonedx-maven-plugin` to 2.7.5
[ https://issues.apache.org/jira/browse/SPARK-42382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710270#comment-17710270 ] Dongjoon Hyun commented on SPARK-42382: --- Shall we reopen this because 2.7.6 was released a week ago? > Upgrade `cyclonedx-maven-plugin` to 2.7.5 > - > > Key: SPARK-42382 > URL: https://issues.apache.org/jira/browse/SPARK-42382 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.4] > [https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.5] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43082) Arrow-optimized Python UDFs in Spark Connect
Xinrong Meng created SPARK-43082: Summary: Arrow-optimized Python UDFs in Spark Connect Key: SPARK-43082 URL: https://issues.apache.org/jira/browse/SPARK-43082 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.5.0 Reporter: Xinrong Meng Implement Arrow-optimized Python UDFs in Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43061) Introduce PartitionEvaluator for SQL operator execution
[ https://issues.apache.org/jira/browse/SPARK-43061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-43061: Summary: Introduce PartitionEvaluator for SQL operator execution (was: Introduce TaskEvaluator for SQL operator execution) > Introduce PartitionEvaluator for SQL operator execution > --- > > Key: SPARK-43061 > URL: https://issues.apache.org/jira/browse/SPARK-43061 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43061) Introduce TaskEvaluator for SQL operator execution
[ https://issues.apache.org/jira/browse/SPARK-43061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-43061. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40697 [https://github.com/apache/spark/pull/40697] > Introduce TaskEvaluator for SQL operator execution > -- > > Key: SPARK-43061 > URL: https://issues.apache.org/jira/browse/SPARK-43061 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43061) Introduce TaskEvaluator for SQL operator execution
[ https://issues.apache.org/jira/browse/SPARK-43061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-43061: --- Assignee: Wenchen Fan > Introduce TaskEvaluator for SQL operator execution > -- > > Key: SPARK-43061 > URL: https://issues.apache.org/jira/browse/SPARK-43061 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43033) Avoid task retries due to AssertNotNull checks
[ https://issues.apache.org/jira/browse/SPARK-43033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710174#comment-17710174 ] xiaochen zhou commented on SPARK-43033: --- Thanks for your reply. I have opened the PR, please help to review it, thanks a lot. [https://github.com/apache/spark/pull/40707] > Avoid task retries due to AssertNotNull checks > -- > > Key: SPARK-43033 > URL: https://issues.apache.org/jira/browse/SPARK-43033 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Priority: Major > > As discussed > [here|https://github.com/apache/spark/pull/40655#discussion_r1156693696], > tasks that failed because of exceptions generated by {{AssertNotNull}} should > not be retried. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43063) `df.show` handle null should print NULL instead of null
[ https://issues.apache.org/jira/browse/SPARK-43063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710133#comment-17710133 ] GridGain Integration commented on SPARK-43063: -- User 'Yikf' has created a pull request for this issue: https://github.com/apache/spark/pull/40699 > `df.show` handle null should print NULL instead of null > --- > > Key: SPARK-43063 > URL: https://issues.apache.org/jira/browse/SPARK-43063 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: yikaifei >Priority: Trivial > > `df.show` should print NULL instead of null when handling null values, for consistent > behavior; > {code:java} > For example, the following behavior is currently inconsistent: > ``` shell > scala> spark.sql("select decode(6, 1, 'Southlake', 2, 'San Francisco', 3, > 'New Jersey', 4, 'Seattle') as result").show(false) > +--+ > |result| > +--+ > |null | > +--+ > ``` > ``` shell > spark-sql> DESC FUNCTION EXTENDED decode; > function_desc > Function: decode > Class: org.apache.spark.sql.catalyst.expressions.Decode > Usage: > decode(bin, charset) - Decodes the first argument using the second > argument character set. > decode(expr, search, result [, search, result ] ... [, default]) - > Compares expr > to each search value in order. If expr is equal to a search value, > decode returns > the corresponding result. If no match is found, then it returns > default. If default > is omitted, it returns null.
> Extended Usage: > Examples: > > SELECT decode(encode('abc', 'utf-8'), 'utf-8'); >abc > > SELECT decode(2, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', > 4, 'Seattle', 'Non domestic'); >San Francisco > > SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', > 4, 'Seattle', 'Non domestic'); >Non domestic > > SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', > 4, 'Seattle'); >NULL > Since: 3.2.0 > Time taken: 0.074 seconds, Fetched 4 row(s) > ``` > ``` shell > spark-sql> select decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New > Jersey', 4, 'Seattle'); > NULL > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
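The inconsistency above boils down to one rendering rule: a SQL NULL cell should always display as `NULL`, as `spark-sql` already does. A minimal pure-Python sketch of that rule (the helper names here are illustrative, not the actual Spark implementation):

```python
def format_cell(value, null_text="NULL"):
    # Render a single cell the way the issue proposes: SQL NULL prints as "NULL".
    return null_text if value is None else str(value)

def show_rows(header, rows):
    # Minimal one-column text-table renderer mimicking the shape of df.show() output.
    cells = [format_cell(r) for r in rows]
    width = max(len(c) for c in [header] + cells)
    sep = "+" + "-" * (width + 2) + "+"
    lines = [sep, "| " + header.ljust(width) + " |", sep]
    lines += ["| " + c.ljust(width) + " |" for c in cells]
    lines.append(sep)
    return "\n".join(lines)

# A None row now renders as NULL, matching the spark-sql CLI output.
print(show_rows("result", [None]))
```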
[jira] [Commented] (SPARK-43081) Add torch distributor data loader that loads data from spark partition data
[ https://issues.apache.org/jira/browse/SPARK-43081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710127#comment-17710127 ] Ignite TC Bot commented on SPARK-43081: --- User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/40724 > Add torch distributor data loader that loads data from spark partition data > --- > > Key: SPARK-43081 > URL: https://issues.apache.org/jira/browse/SPARK-43081 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > > Add torch distributor data loader that loads data from spark partition data. > > We can add 2 APIs like: > Adds a `TorchDistributor` method API : > {code:java} > def train_on_dataframe(self, train_function, spark_dataframe, *args, > **kwargs): > """ > Runs distributed training using provided spark DataFrame as input > data. > You should ensure the input spark DataFrame has evenly divided > partitions, > and this method starts a barrier spark job in which each spark task in > the job > processes one partition of the input spark DataFrame. > Parameters > -- > train_function : > Either a PyTorch function or a PyTorch Lightning function that > launches distributed > training. Note that inside the function, you can call > `pyspark.ml.torch.distributor.get_spark_partition_data_loader` > API to get a torch > data loader, the data loader loads data from the corresponding > partition of the > input spark DataFrame. > spark_dataframe : > An input spark DataFrame that can be used in PyTorch > `train_function` function. > See `train_function` argument doc for details. > args : > `args` need to be the input parameters to `train_function` > function. It would look like > >>> model = distributor.run(train, 1e-3, 64) > where train is a function and 1e-3 and 64 are regular numeric > inputs to the function.
> kwargs : > `kwargs` need to be the keyword input parameters to > `train_function` function. > It would look like > >>> model = distributor.run(train, tol=1e-3, max_iter=64) > where train is a function that has 2 arguments `tol` and > `max_iter`. > Returns > --- > Returns the output of `train_function` called with args inside > spark rank 0 task. > """{code} > > Adds a loader API: > > {code:java} > def get_spark_partition_data_loader(num_samples, batch_size, prefetch=2): > """ > This function must be called inside the `train_function` where > `train_function` > is the input argument of `TorchDistributor.train_on_dataframe`. > The function returns a pytorch data loader that loads data from > the corresponding spark partition data. > Parameters > -- > num_samples : > Number of samples to generate per epoch. If `num_samples` is less > than the number of > rows in the spark partition, it generates the first `num_samples` rows > of > the spark partition; if `num_samples` is greater than the number of > rows in the spark partition, then after the iterator has loaded all rows > from the partition, > it wraps around to the first row. > batch_size: > How many samples per batch to load. > prefetch: > Number of batches loaded in advance. > """{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43081) Add torch distributor data loader that loads data from spark partition data
[ https://issues.apache.org/jira/browse/SPARK-43081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu reassigned SPARK-43081: -- Assignee: Weichen Xu > Add torch distributor data loader that loads data from spark partition data > --- > > Key: SPARK-43081 > URL: https://issues.apache.org/jira/browse/SPARK-43081 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > > Add torch distributor data loader that loads data from spark partition data. > > We can add 2 APIs like: > Adds a `TorchDistributor` method API : > {code:java} > def train_on_dataframe(self, train_function, spark_dataframe, *args, > **kwargs): > """ > Runs distributed training using provided spark DataFrame as input > data. > You should ensure the input spark DataFrame has evenly divided > partitions, > and this method starts a barrier spark job in which each spark task in > the job > processes one partition of the input spark DataFrame. > Parameters > -- > train_function : > Either a PyTorch function or a PyTorch Lightning function that > launches distributed > training. Note that inside the function, you can call > `pyspark.ml.torch.distributor.get_spark_partition_data_loader` > API to get a torch > data loader, the data loader loads data from the corresponding > partition of the > input spark DataFrame. > spark_dataframe : > An input spark DataFrame that can be used in PyTorch > `train_function` function. > See `train_function` argument doc for details. > args : > `args` need to be the input parameters to `train_function` > function. It would look like > >>> model = distributor.run(train, 1e-3, 64) > where train is a function and 1e-3 and 64 are regular numeric > inputs to the function. > kwargs : > `kwargs` need to be the keyword input parameters to > `train_function` function.
> It would look like > >>> model = distributor.run(train, tol=1e-3, max_iter=64) > where train is a function that has 2 arguments `tol` and > `max_iter`. > Returns > --- > Returns the output of `train_function` called with args inside > spark rank 0 task. > """{code} > > Adds a loader API: > > {code:java} > def get_spark_partition_data_loader(num_samples, batch_size, prefetch=2): > """ > This function must be called inside the `train_function` where > `train_function` > is the input argument of `TorchDistributor.train_on_dataframe`. > The function returns a pytorch data loader that loads data from > the corresponding spark partition data. > Parameters > -- > num_samples : > Number of samples to generate per epoch. If `num_samples` is less > than the number of > rows in the spark partition, it generates the first `num_samples` rows > of > the spark partition; if `num_samples` is greater than the number of > rows in the spark partition, then after the iterator has loaded all rows > from the partition, > it wraps around to the first row. > batch_size: > How many samples per batch to load. > prefetch: > Number of batches loaded in advance. > """{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43081) Add torch distributor data loader that loads data from spark partition data
[ https://issues.apache.org/jira/browse/SPARK-43081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-43081: --- Description: Add torch distributor data loader that loads data from spark partition data. We can add 2 APIs like: Adds a `TorchDistributor` method API : ``` def train_on_dataframe(self, train_function, spark_dataframe, *args, **kwargs): """ Runs distributed training using provided spark DataFrame as input data. You should ensure the input spark DataFrame has evenly divided partitions, and this method starts a barrier spark job in which each spark task in the job processes one partition of the input spark DataFrame. Parameters -- train_function : Either a PyTorch function or a PyTorch Lightning function that launches distributed training. Note that inside the function, you can call `pyspark.ml.torch.distributor.get_spark_partition_data_loader` API to get a torch data loader, the data loader loads data from the corresponding partition of the input spark DataFrame. spark_dataframe : An input spark DataFrame that can be used in PyTorch `train_function` function. See `train_function` argument doc for details. args : `args` need to be the input parameters to `train_function` function. It would look like >>> model = distributor.run(train, 1e-3, 64) where train is a function and 1e-3 and 64 are regular numeric inputs to the function. kwargs : `kwargs` need to be the keyword input parameters to `train_function` function. It would look like >>> model = distributor.run(train, tol=1e-3, max_iter=64) where train is a function that has 2 arguments `tol` and `max_iter`. Returns --- Returns the output of `train_function` called with args inside spark rank 0 task. """ ``` Adds a loader API: ``` def get_spark_partition_data_loader(num_samples, batch_size, prefetch=2): """ This function must be called inside the `train_function` where `train_function` is the input argument of `TorchDistributor.train_on_dataframe`.
The function returns a pytorch data loader that loads data from the corresponding spark partition data. Parameters -- num_samples : Number of samples to generate per epoch. If `num_samples` is less than the number of rows in the spark partition, it generates the first `num_samples` rows of the spark partition; if `num_samples` is greater than the number of rows in the spark partition, then after the iterator has loaded all rows from the partition, it wraps around to the first row. batch_size: How many samples per batch to load. prefetch: Number of batches loaded in advance. """ ``` was:Add torch distributor data loader that loads data from spark partition data. > Add torch distributor data loader that loads data from spark partition data > --- > > Key: SPARK-43081 > URL: https://issues.apache.org/jira/browse/SPARK-43081 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Priority: Major > > Add torch distributor data loader that loads data from spark partition data. > > We can add 2 APIs like: > > Adds a `TorchDistributor` method API : > ``` > def train_on_dataframe(self, train_function, spark_dataframe, *args, > **kwargs): > """ > Runs distributed training using provided spark DataFrame as input > data. > You should ensure the input spark DataFrame has evenly divided > partitions, > and this method starts a barrier spark job in which each spark task in > the job > processes one partition of the input spark DataFrame. > Parameters > -- > train_function : > Either a PyTorch function or a PyTorch Lightning function that > launches distributed > training. Note that inside the function, you can call > `pyspark.ml.torch.distributor.get_spark_partition_data_loader` > API to get a torch > data loader, the data loader loads data from the corresponding > partition of the > input spark DataFrame. > spark_dataframe : > An input spark DataFrame that can be used in PyTorch > `train_function` function.
> See `train_function` argument doc for details. > args : > `args` need to be the input parameters to `train_function` > function. It would look like > >>> model =
[jira] [Updated] (SPARK-43081) Add torch distributor data loader that loads data from spark partition data
[ https://issues.apache.org/jira/browse/SPARK-43081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-43081: --- Description: Add torch distributor data loader that loads data from spark partition data. We can add 2 APIs like: Adds a `TorchDistributor` method API : {code:java} def train_on_dataframe(self, train_function, spark_dataframe, *args, **kwargs): """ Runs distributed training using provided spark DataFrame as input data. You should ensure the input spark DataFrame has evenly divided partitions, and this method starts a barrier spark job in which each spark task in the job processes one partition of the input spark DataFrame. Parameters -- train_function : Either a PyTorch function or a PyTorch Lightning function that launches distributed training. Note that inside the function, you can call `pyspark.ml.torch.distributor.get_spark_partition_data_loader` API to get a torch data loader, the data loader loads data from the corresponding partition of the input spark DataFrame. spark_dataframe : An input spark DataFrame that can be used in PyTorch `train_function` function. See `train_function` argument doc for details. args : `args` need to be the input parameters to `train_function` function. It would look like >>> model = distributor.run(train, 1e-3, 64) where train is a function and 1e-3 and 64 are regular numeric inputs to the function. kwargs : `kwargs` need to be the keyword input parameters to `train_function` function. It would look like >>> model = distributor.run(train, tol=1e-3, max_iter=64) where train is a function that has 2 arguments `tol` and `max_iter`. Returns --- Returns the output of `train_function` called with args inside spark rank 0 task. """{code} Adds a loader API: {code:java} def get_spark_partition_data_loader(num_samples, batch_size, prefetch=2): """ This function must be called inside the `train_function` where `train_function` is the input argument of `TorchDistributor.train_on_dataframe`.
The function returns a pytorch data loader that loads data from the corresponding spark partition data. Parameters -- num_samples : Number of samples to generate per epoch. If `num_samples` is less than the number of rows in the spark partition, it generates the first `num_samples` rows of the spark partition; if `num_samples` is greater than the number of rows in the spark partition, then after the iterator has loaded all rows from the partition, it wraps around to the first row. batch_size: How many samples per batch to load. prefetch: Number of batches loaded in advance. """{code} was: Add torch distributor data loader that loads data from spark partition data. We can add 2 APIs like: Adds a `TorchDistributor` method API : ``` def train_on_dataframe(self, train_function, spark_dataframe, *args, **kwargs): """ Runs distributed training using provided spark DataFrame as input data. You should ensure the input spark DataFrame has evenly divided partitions, and this method starts a barrier spark job in which each spark task in the job processes one partition of the input spark DataFrame. Parameters -- train_function : Either a PyTorch function or a PyTorch Lightning function that launches distributed training. Note that inside the function, you can call `pyspark.ml.torch.distributor.get_spark_partition_data_loader` API to get a torch data loader, the data loader loads data from the corresponding partition of the input spark DataFrame. spark_dataframe : An input spark DataFrame that can be used in PyTorch `train_function` function. See `train_function` argument doc for details. args : `args` need to be the input parameters to `train_function` function. It would look like >>> model = distributor.run(train, 1e-3, 64) where train is a function and 1e-3 and 64 are regular numeric inputs to the function. kwargs : `kwargs` need to be the keyword input parameters to `train_function` function.
It would look like >>> model = distributor.run(train, tol=1e-3, max_iter=64) where train is a function that has 2 arguments `tol` and `max_iter`. Returns --- Returns the output of `train_function` called with args inside spark rank 0 task. """ ``` Adds a loader API:
[jira] [Created] (SPARK-43081) Add torch distributor data loader that loads data from spark partition data
Weichen Xu created SPARK-43081: -- Summary: Add torch distributor data loader that loads data from spark partition data Key: SPARK-43081 URL: https://issues.apache.org/jira/browse/SPARK-43081 Project: Spark Issue Type: Sub-task Components: Connect, ML, PySpark Affects Versions: 3.5.0 Reporter: Weichen Xu Add torch distributor data loader that loads data from spark partition data. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
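The `num_samples` truncate-or-wrap semantics spelled out in the docstrings above can be sketched in plain Python. This is a hypothetical stand-in for `get_spark_partition_data_loader`, ignoring torch integration and the `prefetch` knob:

```python
from itertools import islice

def partition_data_loader(partition_rows, num_samples, batch_size):
    # Yield batches totalling exactly `num_samples` rows from one partition.
    # If the partition has more rows than `num_samples`, only the first
    # `num_samples` are used; if it has fewer, iteration wraps back to the
    # first row until `num_samples` rows have been produced.
    rows = list(partition_rows)

    def samples():
        for i in range(num_samples):
            yield rows[i % len(rows)]  # wrap around a short partition

    it = samples()
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# A 3-row partition, 5 samples per epoch, batches of 2:
print(list(partition_data_loader([10, 20, 30], num_samples=5, batch_size=2)))
# -> [[10, 20], [30, 10], [20]]
```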
[jira] [Commented] (SPARK-43077) Improve the error message of UNRECOGNIZED_SQL_TYPE
[ https://issues.apache.org/jira/browse/SPARK-43077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710093#comment-17710093 ] ASF GitHub Bot commented on SPARK-43077: User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/40718 > Improve the error message of UNRECOGNIZED_SQL_TYPE > -- > > Key: SPARK-43077 > URL: https://issues.apache.org/jira/browse/SPARK-43077 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Kent Yao >Priority: Major > > UNRECOGNIZED_SQL_TYPE prints the jdbc type id in the error message currently. > This makes it difficult for spark users to understand the meaning of this kind of > error, especially when the type id is from a vendor extension. > For example, > {code:java} > org.apache.spark.SparkSQLException: Unrecognized SQL type -102{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43080) Upgrade zstd-jni to 1.5.5-1
[ https://issues.apache.org/jira/browse/SPARK-43080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710091#comment-17710091 ] ASF GitHub Bot commented on SPARK-43080: User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40721 > Upgrade zstd-jni to 1.5.5-1 > --- > > Key: SPARK-43080 > URL: https://issues.apache.org/jira/browse/SPARK-43080 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > * > [luben/zstd-jni@{{{}v1.5.4-2...v1.5.5-1{}}}|https://github.com/luben/zstd-jni/compare/v1.5.4-2...v1.5.5-1] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43080) Upgrade zstd-jni to 1.5.5-1
Yang Jie created SPARK-43080: Summary: Upgrade zstd-jni to 1.5.5-1 Key: SPARK-43080 URL: https://issues.apache.org/jira/browse/SPARK-43080 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Yang Jie * [luben/zstd-jni@{{{}v1.5.4-2...v1.5.5-1{}}}|https://github.com/luben/zstd-jni/compare/v1.5.4-2...v1.5.5-1] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43076) Removing the dependency on `grpcio` when remote session is not used.
[ https://issues.apache.org/jira/browse/SPARK-43076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710083#comment-17710083 ] GridGain Integration commented on SPARK-43076: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40722 > Removing the dependency on `grpcio` when remote session is not used. > > > Key: SPARK-43076 > URL: https://issues.apache.org/jira/browse/SPARK-43076 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > We should not force users to install `grpcio` when a remote session is not used for > pandas API on Spark. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
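A common way to drop a hard dependency like this is to defer the import until the remote-session code path actually runs, raising a helpful error only then. A generic sketch of that pattern (the helper name is illustrative, not the actual PySpark change):

```python
import importlib

def require_package(name, feature):
    # Import `name` only when `feature` is actually exercised, so the
    # package stops being an install-time requirement for everyone else.
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"'{name}' is required for {feature}; install it to use this feature."
        ) from exc

# The remote code path would then call, e.g.,
# require_package("grpc", "Spark Connect remote sessions")
# instead of importing grpcio at module top level.
```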
[jira] [Commented] (SPARK-40609) Casts types according to bucket info for Equality expression
[ https://issues.apache.org/jira/browse/SPARK-40609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710080#comment-17710080 ] Yuming Wang commented on SPARK-40609: - {code:scala} import org.apache.spark.benchmark.Benchmark val numRows = 1024 * 1024 * 40 spark.sql(s"CREATE TABLE t using parquet AS SELECT id as a, cast(id as decimal(18, 0)) as b FROM range(${numRows}L)") val benchmark = new Benchmark("Benchmark equal with cast", numRows, minNumIters = 2) benchmark.addCase("default") { _ => spark.sql("SELECT * FROM t t1 join t t2 on t1.a = t2.b").write.format("noop").mode("Overwrite").save() } benchmark.addCase("cast to bigint") { _ => spark.sql("SELECT * FROM t t1 join t t2 on cast(t1.a as bigint) = cast(t2.b as bigint)").write.format("noop").mode("Overwrite").save() } benchmark.addCase("cast to decimal") { _ => spark.sql("SELECT * FROM t t1 join t t2 on cast(t1.a as decimal(18, 0)) = cast(t2.b as decimal(18, 0))").write.format("noop").mode("Overwrite").save() } benchmark.run() {code} {noformat} OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Mac OS X 13.2.1 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz Benchmark equal with cast: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative default 34594 35381 1113 1.2 824.8 1.0X cast to bigint29056 29367 440 1.4 692.7 1.2X cast to decimal 32528 33081 783 1.3 775.5 1.1X {noformat} > Casts types according to bucket info for Equality expression > > > Key: SPARK-40609 > URL: https://issues.apache.org/jira/browse/SPARK-40609 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43079) Add bloom filter details in spark history server plans/SVGs
Rajesh Balamohan created SPARK-43079:
-------------------------------------

             Summary: Add bloom filter details in spark history server plans/SVGs
                 Key: SPARK-43079
                 URL: https://issues.apache.org/jira/browse/SPARK-43079
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.3.2
            Reporter: Rajesh Balamohan

The Spark bloom filter can be enabled via "spark.sql.optimizer.runtimeFilter.semiJoinReduction.enabled=true" and "spark.sql.optimizer.runtime.bloomFilter.enabled=true".

The Spark history server's SVG doesn't render the bloom filter details; it would be good to include this detail in the plan (as of now, it only shows up in the explain plan's text output).
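For reference, the two configuration keys quoted in the issue can be set on a running session like this (a config fragment only; it assumes an existing `SparkSession` named `spark` and is not runnable on its own):

```
# Config keys taken verbatim from the issue text; assumes an existing
# SparkSession named `spark`.
spark.conf.set("spark.sql.optimizer.runtimeFilter.semiJoinReduction.enabled", "true")
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", "true")
```

Once enabled, the injected runtime bloom filter currently appears only in the textual `EXPLAIN` output, which is the gap the issue asks to close in the history server's rendered plan.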
[jira] [Created] (SPARK-43078) Separate test into `pyspark-connect-pandas` and `pyspark-connect-pandas-slow`
Haejoon Lee created SPARK-43078:
--------------------------------

             Summary: Separate test into `pyspark-connect-pandas` and `pyspark-connect-pandas-slow`
                 Key: SPARK-43078
                 URL: https://issues.apache.org/jira/browse/SPARK-43078
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, Pandas API on Spark
    Affects Versions: 3.5.0
            Reporter: Haejoon Lee

The `pyspark-connect` test module takes 2~3 hours due to the recently added pandas API on Spark tests, so we should separate the pandas API on Spark tests into a different test module to reduce the overhead on `pyspark-connect`.
[jira] [Assigned] (SPARK-43065) Set job description for tpcds queries
[ https://issues.apache.org/jira/browse/SPARK-43065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-43065:
------------------------------------

    Assignee: caican

> Set job description for tpcds queries
> --------------------------------------
>
>                 Key: SPARK-43065
>                 URL: https://issues.apache.org/jira/browse/SPARK-43065
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.2, 3.2.0, 3.3.0
>            Reporter: caican
>            Assignee: caican
>            Priority: Major
>
> When using Spark's TPCDSQueryBenchmark to run tpcds, the spark ui does not
> display the sql information
> !https://user-images.githubusercontent.com/94670132/230567550-9bb2842c-aecc-41a5-acb6-0ff8ea765df1.png|width=1694,height=523!
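The usual fix for this kind of UI gap is to set a job description around each benchmark query so the Spark UI shows something meaningful per query. The sketch below demonstrates the save-and-restore pattern with a stand-in context object (`FakeSparkContext` is hypothetical; the real `SparkContext` exposes a `setJobDescription` method with the same shape):

```python
from contextlib import contextmanager


class FakeSparkContext:
    """Stand-in for SparkContext, purely to illustrate the pattern."""
    def __init__(self) -> None:
        self.job_description = None

    def setJobDescription(self, desc):
        self.job_description = desc


@contextmanager
def job_description(sc, desc: str):
    # Save and restore the previous description so consecutive
    # benchmark cases do not clobber each other's labels.
    previous = sc.job_description
    sc.setJobDescription(desc)
    try:
        yield
    finally:
        sc.setJobDescription(previous)


sc = FakeSparkContext()
with job_description(sc, "TPCDS q1"):
    pass  # the query would run here, labelled "TPCDS q1" in the UI
```

Wrapping each TPC-DS query in such a context manager makes the SQL tab show a per-query label instead of an anonymous job.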
[jira] [Resolved] (SPARK-43065) Set job description for tpcds queries
[ https://issues.apache.org/jira/browse/SPARK-43065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-43065.
----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 40700
[https://github.com/apache/spark/pull/40700]

> Set job description for tpcds queries
> --------------------------------------
>
>                 Key: SPARK-43065
>                 URL: https://issues.apache.org/jira/browse/SPARK-43065
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.2, 3.2.0, 3.3.0
>            Reporter: caican
>            Assignee: caican
>            Priority: Major
>             Fix For: 3.5.0
>
> When using Spark's TPCDSQueryBenchmark to run tpcds, the spark ui does not
> display the sql information
> !https://user-images.githubusercontent.com/94670132/230567550-9bb2842c-aecc-41a5-acb6-0ff8ea765df1.png|width=1694,height=523!
[jira] [Resolved] (SPARK-43057) Migrate Spark Connect Column errors into error class
[ https://issues.apache.org/jira/browse/SPARK-43057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-43057.
----------------------------------
      Assignee: Haejoon Lee
    Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/40694

> Migrate Spark Connect Column errors into error class
> -----------------------------------------------------
>
>                 Key: SPARK-43057
>                 URL: https://issues.apache.org/jira/browse/SPARK-43057
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.5.0
>            Reporter: Haejoon Lee
>            Assignee: Haejoon Lee
>            Priority: Major
>
> Migrate Spark Connect Column errors into error class
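The migration these tickets track replaces ad-hoc exception messages with named error classes plus message parameters, so errors are machine-identifiable and messages are templated centrally. A simplified, self-contained sketch of that pattern (the registry and class names here are illustrative, not PySpark's actual ones; PySpark keeps its real templates in a JSON registry):

```python
# Hypothetical error-class registry: templates with <placeholder> slots.
ERROR_CLASSES = {
    "NOT_COLUMN": "Argument `<arg_name>` should be a Column, got <arg_type>.",
}


class SketchPySparkTypeError(TypeError):
    """Simplified stand-in for a parameterized, class-tagged error type."""

    def __init__(self, error_class: str, message_parameters: dict):
        self.error_class = error_class
        self.message_parameters = message_parameters
        # Render the template by substituting each <name> placeholder.
        message = ERROR_CLASSES[error_class]
        for name, value in message_parameters.items():
            message = message.replace(f"<{name}>", value)
        super().__init__(f"[{error_class}] {message}")


# Callers raise by class name and parameters, never a hand-written string.
err = SketchPySparkTypeError(
    "NOT_COLUMN", {"arg_name": "col", "arg_type": "str"}
)
```

The benefit is that tests and users can match on `error_class` instead of fragile message text, and every message follows one reviewed template.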
[jira] [Resolved] (SPARK-43059) Migrate TypeError from DataFrame(Reader|Writer) into error class
[ https://issues.apache.org/jira/browse/SPARK-43059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-43059.
----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 40706
[https://github.com/apache/spark/pull/40706]

> Migrate TypeError from DataFrame(Reader|Writer) into error class
> -----------------------------------------------------------------
>
>                 Key: SPARK-43059
>                 URL: https://issues.apache.org/jira/browse/SPARK-43059
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.5.0
>            Reporter: Haejoon Lee
>            Assignee: Haejoon Lee
>            Priority: Major
>             Fix For: 3.5.0
>
> Migrate TypeError from DataFrame(Reader|Writer) into error class
[jira] [Assigned] (SPARK-43059) Migrate TypeError from DataFrame(Reader|Writer) into error class
[ https://issues.apache.org/jira/browse/SPARK-43059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-43059:
------------------------------------

    Assignee: Haejoon Lee

> Migrate TypeError from DataFrame(Reader|Writer) into error class
> -----------------------------------------------------------------
>
>                 Key: SPARK-43059
>                 URL: https://issues.apache.org/jira/browse/SPARK-43059
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.5.0
>            Reporter: Haejoon Lee
>            Assignee: Haejoon Lee
>            Priority: Major
>
> Migrate TypeError from DataFrame(Reader|Writer) into error class