[jira] [Created] (SPARK-43248) Unnecessary serialize/deserialize of Path on parallel gather partition stats
Cheng Pan created SPARK-43248:
----------------------------------

Summary: Unnecessary serialize/deserialize of Path on parallel gather partition stats
Key: SPARK-43248
URL: https://issues.apache.org/jira/browse/SPARK-43248
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Cheng Pan
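The ticket has no description, but the summary suggests the parallel partition-stats job ships Hadoop Path objects to the executors and pays an avoidable serialization round trip. Below is a minimal sketch of the usual way to avoid that, assuming a plain RDD job; the helper name and structure are illustrative, not the Spark code this ticket touches:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

// Illustrative helper: sums the bytes under each partition location in parallel.
// The locations travel to the executors as plain strings and the Path is rebuilt
// there, so no Path object is ever serialized or deserialized.
def gatherPartitionSizes(spark: SparkSession, locations: Seq[String]): Seq[Long] = {
  if (locations.isEmpty) {
    Seq.empty
  } else {
    spark.sparkContext
      .parallelize(locations, math.min(locations.size, 1000))
      .map { location =>
        val path = new Path(location)
        // Assumption: the default Hadoop configuration resolves the filesystem.
        val fs = path.getFileSystem(new Configuration())
        fs.getContentSummary(path).getLength
      }
      .collect()
      .toSeq
  }
}
```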
[jira] [Updated] (SPARK-43229) Introduce Barrier Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng updated SPARK-43229:
----------------------------------
Summary: Introduce Barrier Python UDF  (was: Support Barrier Python UDF)

> Introduce Barrier Python UDF
> ----------------------------
>
> Key: SPARK-43229
> URL: https://issues.apache.org/jira/browse/SPARK-43229
> Project: Spark
> Issue Type: New Feature
> Components: Connect, ML, PySpark
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
[jira] [Created] (SPARK-43249) df.sql() should send metrics back
Martin Grund created SPARK-43249:
---------------------------------

Summary: df.sql() should send metrics back
Key: SPARK-43249
URL: https://issues.apache.org/jira/browse/SPARK-43249
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund

df.sql() does not return the metrics to the client when executed as a command.
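A minimal illustration of the two execution paths involved, assuming the Spark Connect Scala client and placeholder table/endpoint names: spark.sql() runs a command such as CREATE or INSERT eagerly at call time, so its metrics must be carried back in the command response itself (the path this ticket fixes), while a plain query reports metrics when its result is fetched.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a Spark Connect server is reachable at sc://localhost:15002.
val spark = SparkSession.builder().remote("sc://localhost:15002").getOrCreate()

// Executed eagerly as commands when spark.sql() is called; before this fix,
// the client received no metrics for this path.
spark.sql("CREATE TABLE demo (id INT) USING parquet")
spark.sql("INSERT INTO demo VALUES (1), (2)")

// Executed lazily as a query; metrics accompany the fetched result.
val counts = spark.sql("SELECT count(*) FROM demo").collect()
```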
[jira] [Resolved] (SPARK-42317) Assign name to _LEGACY_ERROR_TEMP_2247
[ https://issues.apache.org/jira/browse/SPARK-42317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-42317.
------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 40810
[https://github.com/apache/spark/pull/40810]

> Assign name to _LEGACY_ERROR_TEMP_2247
> --------------------------------------
>
> Key: SPARK-42317
> URL: https://issues.apache.org/jira/browse/SPARK-42317
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Assignee: Koray Beyaz
> Priority: Major
> Fix For: 3.5.0
>
[jira] [Assigned] (SPARK-42317) Assign name to _LEGACY_ERROR_TEMP_2247
[ https://issues.apache.org/jira/browse/SPARK-42317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-42317:
--------------------------------
Assignee: Koray Beyaz

> Assign name to _LEGACY_ERROR_TEMP_2247
> --------------------------------------
>
> Key: SPARK-42317
> URL: https://issues.apache.org/jira/browse/SPARK-42317
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Assignee: Koray Beyaz
> Priority: Major
>
[jira] [Assigned] (SPARK-43249) df.sql() should send metrics back
[ https://issues.apache.org/jira/browse/SPARK-43249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-43249:
-------------------------------------
Assignee: Martin Grund

> df.sql() should send metrics back
> ---------------------------------
>
> Key: SPARK-43249
> URL: https://issues.apache.org/jira/browse/SPARK-43249
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Assignee: Martin Grund
> Priority: Major
>
> df.sql() does not return the metrics to the client when executed as a command.
[jira] [Resolved] (SPARK-43249) df.sql() should send metrics back
[ https://issues.apache.org/jira/browse/SPARK-43249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-43249.
-----------------------------------
Fix Version/s: 3.5.0, 3.4.1
Resolution: Fixed

Issue resolved by pull request 40899
[https://github.com/apache/spark/pull/40899]

> df.sql() should send metrics back
> ---------------------------------
>
> Key: SPARK-43249
> URL: https://issues.apache.org/jira/browse/SPARK-43249
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Assignee: Martin Grund
> Priority: Major
> Fix For: 3.5.0, 3.4.1
>
> df.sql() does not return the metrics to the client when executed as a command.
[jira] [Updated] (SPARK-43250) Assign a name to the error class _LEGACY_ERROR_TEMP_2014
[ https://issues.apache.org/jira/browse/SPARK-43250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43250:
-----------------------------
Description:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2013* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2014
> --------------------------------------------------------
>
> Key: SPARK-43250
> URL: https://issues.apache.org/jira/browse/SPARK-43250
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
[jira] [Created] (SPARK-43250) Assign a name to the error class _LEGACY_ERROR_TEMP_2014
Max Gekk created SPARK-43250:
---------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2014
Key: SPARK-43250
URL: https://issues.apache.org/jira/browse/SPARK-43250
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2013* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
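For readers new to this recurring task, here is a sketch of the kind of test the description asks for, assuming a suite built on Spark's QueryTest and SharedSparkSession (which supply sql(), intercept, and checkError()). The query, the chosen error class name, and the parameter map are hypothetical stand-ins; the real values depend on what the legacy error guards:

```scala
import org.apache.spark.SparkException
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.test.SharedSparkSession

class RenamedErrorClassSuite extends QueryTest with SharedSparkSession {

  test("renamed error class is triggered from user code") {
    checkError(
      exception = intercept[SparkException] {
        sql("SELECT some_failing_query()").collect()  // stand-in reproducing query
      },
      errorClass = "NEW_DESCRIPTIVE_NAME",           // name replacing _LEGACY_ERROR_TEMP_xxxx
      parameters = Map("someField" -> "someValue"))  // only structured fields are asserted
  }
}
```

Because checkError() asserts only the error class and its structured parameters, the message template in error-classes.json can be reworded later without touching this test.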
[jira] [Created] (SPARK-43252) Assign a name to the error class _LEGACY_ERROR_TEMP_2016
Max Gekk created SPARK-43252:
---------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2016
Key: SPARK-43252
URL: https://issues.apache.org/jira/browse/SPARK-43252
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
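For the branch of the workflow where no SQL query can reach the error, the description points at SparkException.internalError(). A one-line sketch, with the helper name, message, and call site invented for illustration:

```scala
import org.apache.spark.SparkException

// Replaces a legacy error that user code cannot trigger: an internal error tells
// users this is a Spark bug to report, not a mistake in their query.
def unexpectedAggregationState(state: String): SparkException =
  SparkException.internalError(s"Unexpected aggregation state: $state")

// At the unreachable branch:  throw unexpectedAggregationState(state.toString)
```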
[jira] [Created] (SPARK-43251) Assign a name to the error class _LEGACY_ERROR_TEMP_2015
Max Gekk created SPARK-43251:
---------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2015
Key: SPARK-43251
URL: https://issues.apache.org/jira/browse/SPARK-43251
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-43252) Assign a name to the error class _LEGACY_ERROR_TEMP_2016
[ https://issues.apache.org/jira/browse/SPARK-43252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43252:
-----------------------------
Description:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2016* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2016
> --------------------------------------------------------
>
> Key: SPARK-43252
> URL: https://issues.apache.org/jira/browse/SPARK-43252
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
[jira] [Updated] (SPARK-43251) Assign a name to the error class _LEGACY_ERROR_TEMP_2015
[ https://issues.apache.org/jira/browse/SPARK-43251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43251:
-----------------------------
Description:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2015
> --------------------------------------------------------
>
> Key: SPARK-43251
> URL: https://issues.apache.org/jira/browse/SPARK-43251
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
[jira] [Updated] (SPARK-43254) Assign a name to the error class _LEGACY_ERROR_TEMP_2018
[ https://issues.apache.org/jira/browse/SPARK-43254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43254:
-----------------------------
Description:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2018* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2017* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2018
> --------------------------------------------------------
>
> Key: SPARK-43254
> URL: https://issues.apache.org/jira/browse/SPARK-43254
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
[jira] [Created] (SPARK-43254) Assign a name to the error class _LEGACY_ERROR_TEMP_2018
Max Gekk created SPARK-43254:
---------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2018
Key: SPARK-43254
URL: https://issues.apache.org/jira/browse/SPARK-43254
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2017* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Created] (SPARK-43253) Assign a name to the error class _LEGACY_ERROR_TEMP_2017
Max Gekk created SPARK-43253:
---------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2017
Key: SPARK-43253
URL: https://issues.apache.org/jira/browse/SPARK-43253
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2016* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-43253) Assign a name to the error class _LEGACY_ERROR_TEMP_2017
[ https://issues.apache.org/jira/browse/SPARK-43253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43253:
-----------------------------
Description:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2017* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2016* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2017
> --------------------------------------------------------
>
> Key: SPARK-43253
> URL: https://issues.apache.org/jira/browse/SPARK-43253
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
[jira] [Created] (SPARK-43256) Assign a name to the error class _LEGACY_ERROR_TEMP_2021
Max Gekk created SPARK-43256:
---------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2021
Key: SPARK-43256
URL: https://issues.apache.org/jira/browse/SPARK-43256
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Created] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020
Max Gekk created SPARK-43255:
---------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2020
Key: SPARK-43255
URL: https://issues.apache.org/jira/browse/SPARK-43255
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2018* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020
[ https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43255:
-----------------------------
Description:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2018* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> --------------------------------------------------------
>
> Key: SPARK-43255
> URL: https://issues.apache.org/jira/browse/SPARK-43255
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
[jira] [Updated] (SPARK-43256) Assign a name to the error class _LEGACY_ERROR_TEMP_2021
[ https://issues.apache.org/jira/browse/SPARK-43256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43256:
-----------------------------
Description:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2021* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2021
> --------------------------------------------------------
>
> Key: SPARK-43256
> URL: https://issues.apache.org/jira/browse/SPARK-43256
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
[jira] [Created] (SPARK-43258) Assign a name to the error class _LEGACY_ERROR_TEMP_2023
Max Gekk created SPARK-43258:
---------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2023
Key: SPARK-43258
URL: https://issues.apache.org/jira/browse/SPARK-43258
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2022* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Created] (SPARK-43257) Assign a name to the error class _LEGACY_ERROR_TEMP_2022
Max Gekk created SPARK-43257:
---------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2022
Key: SPARK-43257
URL: https://issues.apache.org/jira/browse/SPARK-43257
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2021* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-43257) Assign a name to the error class _LEGACY_ERROR_TEMP_2022
[ https://issues.apache.org/jira/browse/SPARK-43257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43257:
-----------------------------
Description:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2022* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2021* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2022
> --------------------------------------------------------
>
> Key: SPARK-43257
> URL: https://issues.apache.org/jira/browse/SPARK-43257
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
[jira] [Updated] (SPARK-43258) Assign a name to the error class _LEGACY_ERROR_TEMP_2023
[ https://issues.apache.org/jira/browse/SPARK-43258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43258:
-----------------------------
Description:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2023* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was:

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2022* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the essential error fields and avoids depending on the error message text, so tech editors can change the message format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (via a SQL query), replace it with an internal error; see *SparkException.internalError()*.

Improve the error message format in error-classes.json if the current one is unclear, and propose to users how to avoid and fix such errors. Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2023
> --------------------------------------------------------
>
> Key: SPARK-43258
> URL: https://issues.apache.org/jira/browse/SPARK-43258
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
[jira] [Created] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
Max Gekk created SPARK-43259: Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2024 Key: SPARK-43259 URL: https://issues.apache.org/jira/browse/SPARK-43259 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Max Gekk Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2023* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error text message. In this way, tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError(). If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is not clear. Propose to users a way to avoid and fix such kinds of errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-43259: - Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error text message. In this way, tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError(). If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is not clear. Propose to users a way to avoid and fix such kinds of errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] was: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2023* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error text message. In this way, tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError(). If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is not clear. Propose to users a way to avoid and fix such kinds of errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the examples in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > exist yet. Check the exception fields by using {*}checkError(){*}. That > function checks only the valuable error fields and avoids depending on the > error text message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate > other tests that might trigger the error to checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error, see {*}SparkException.internalError(){*}. 
> Improve the error message format in error-classes.json if the current one is > not clear. Propose to users a way to avoid and fix such kinds of errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
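The tickets above all follow the same checklist, whose key step is asserting exceptions with checkError(). A minimal sketch of such a test, assuming a Scala suite with SparkFunSuite's checkError() helper and ScalaTest's intercept in scope; the error class name, query, and parameter names below are hypothetical placeholders, not taken from these tickets:

{code:java}
// Minimal sketch, assuming SparkFunSuite's checkError() helper is available.
// The error class and parameters are hypothetical examples.
val e = intercept[AnalysisException] {
  sql("SELECT * FROM non_existing_table")
}
checkError(
  exception = e,
  errorClass = "TABLE_OR_VIEW_NOT_FOUND",  // the new descriptive name, not _LEGACY_ERROR_TEMP_XXXX
  parameters = Map("relationName" -> "`non_existing_table`"))
{code}

Because only the error class and its parameters are asserted, the human-readable message in error-classes.json can later be reworded without breaking the test.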
[jira] [Assigned] (SPARK-43178) Migrate UDF errors into error class
[ https://issues.apache.org/jira/browse/SPARK-43178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43178: - Assignee: Haejoon Lee > Migrate UDF errors into error class > --- > > Key: SPARK-43178 > URL: https://issues.apache.org/jira/browse/SPARK-43178 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Migrate pyspark/sql/udf.py errors into error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43178) Migrate UDF errors into error class
[ https://issues.apache.org/jira/browse/SPARK-43178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43178. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40866 [https://github.com/apache/spark/pull/40866] > Migrate UDF errors into error class > --- > > Key: SPARK-43178 > URL: https://issues.apache.org/jira/browse/SPARK-43178 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > > Migrate pyspark/sql/udf.py errors into error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43214) Post driver-side metrics for LocalTableScanExec/CommandResultExec
[ https://issues.apache.org/jira/browse/SPARK-43214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-43214. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40875 [https://github.com/apache/spark/pull/40875] > Post driver-side metrics for LocalTableScanExec/CommandResultExec > - > > Key: SPARK-43214 > URL: https://issues.apache.org/jira/browse/SPARK-43214 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Fu Chen >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43214) Post driver-side metrics for LocalTableScanExec/CommandResultExec
[ https://issues.apache.org/jira/browse/SPARK-43214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-43214: --- Assignee: Fu Chen > Post driver-side metrics for LocalTableScanExec/CommandResultExec > - > > Key: SPARK-43214 > URL: https://issues.apache.org/jira/browse/SPARK-43214 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Fu Chen >Assignee: Fu Chen >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43260) Migrate the Spark SQL pandas arrow type errors into error class.
Haejoon Lee created SPARK-43260: --- Summary: Migrate the Spark SQL pandas arrow type errors into error class. Key: SPARK-43260 URL: https://issues.apache.org/jira/browse/SPARK-43260 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.0 Reporter: Haejoon Lee from pyspark/sql/pandas/types.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43246) Make `CheckConnectJvmClientCompatibility` filter the check on `private[packageName]` scope member as default
[ https://issues.apache.org/jira/browse/SPARK-43246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715748#comment-17715748 ] Nikita Awasthi commented on SPARK-43246: User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40925 > Make `CheckConnectJvmClientCompatibility` filter the check on > `private[packageName]` scope member as default > - > > Key: SPARK-43246 > URL: https://issues.apache.org/jira/browse/SPARK-43246 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43261) Migrate `TypeError` from Spark SQL types into error class
Haejoon Lee created SPARK-43261: --- Summary: Migrate `TypeError` from Spark SQL types into error class Key: SPARK-43261 URL: https://issues.apache.org/jira/browse/SPARK-43261 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.0 Reporter: Haejoon Lee from pyspark/sql/types.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43262) Migrate Spark Connect Structured Streaming errors into error class
Haejoon Lee created SPARK-43262: --- Summary: Migrate Spark Connect Structured Streaming errors into error class Key: SPARK-43262 URL: https://issues.apache.org/jira/browse/SPARK-43262 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Structured Streaming Affects Versions: 3.5.0 Reporter: Haejoon Lee from pyspark/sql/connect/streaming -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43201) Inconsistency between from_avro and from_json function
[ https://issues.apache.org/jira/browse/SPARK-43201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715796#comment-17715796 ] karthik kadiyam commented on SPARK-43201: - +1 Would appreciate it if someone could take a look at this request from [~pkadetiloye] > Inconsistency between from_avro and from_json function > -- > > Key: SPARK-43201 > URL: https://issues.apache.org/jira/browse/SPARK-43201 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Philip Adetiloye >Priority: Major > > Spark's from_avro function does not allow the schema parameter to be a dataframe > column; it takes only a String schema: > {code:java} > def from_avro(col: Column, jsonFormatSchema: String): Column {code} > This makes it impossible to deserialize rows of Avro records with different > schemas, since only one schema string can be passed externally. > > Here is what I would expect, like the from_json function: > {code:java} > def from_avro(col: Column, jsonFormatSchema: Column): Column {code} > code example: > {code:java} > import org.apache.spark.sql.avro.functions.from_avro > val avroSchema1 = > """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" > > val avroSchema2 = > """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" > val df = Seq( > (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1), > (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2) > ).toDF("binaryData", "schema") > val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData")) > parsed.show() > // Output: > // +-------------+ > // | parsedData| > // +-------------+ > // |[apple1, 1.0]| > // |[apple2, 2.0]| > // +-------------+ > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43263) Upgrade FasterXML / jackson-dataformats-text to 2.15.0
Bjørn Jørgensen created SPARK-43263: --- Summary: Upgrade FasterXML / jackson-dataformats-text to 2.15.0 Key: SPARK-43263 URL: https://issues.apache.org/jira/browse/SPARK-43263 Project: Spark Issue Type: Dependency upgrade Components: Build Affects Versions: 3.5.0 Reporter: Bjørn Jørgensen * #390: (yaml) Upgrade to Snakeyaml 2.0 (resolves [CVE-2022-1471|https://nvd.nist.gov/vuln/detail/CVE-2022-1471]) (contributed by @pjfanning) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43264) Avoid allocation of unwritten ColumnVector in VectorizedReader
Zamil Majdy created SPARK-43264: --- Summary: Avoid allocation of unwritten ColumnVector in VectorizedReader Key: SPARK-43264 URL: https://issues.apache.org/jira/browse/SPARK-43264 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.4.1, 3.5.0 Reporter: Zamil Majdy The Spark vectorized reader allocates an array for every field for each value count, even when the array ends up empty. This causes high memory consumption when reading a table with a large struct+array or many columns with sparse values. One way to fix this is to allocate the column vector lazily, creating the array only when it is needed (i.e., when the array is actually written). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
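The direction described in the ticket can be sketched as follows. This is a minimal, hypothetical illustration of lazy allocation, not Spark's actual ColumnVector implementation: the backing array is created only on the first write, so a vector that is never written costs almost no memory.

{code:java}
// Hypothetical sketch of lazy allocation; not the real OnHeapColumnVector.
class LazyIntVector(capacity: Int) {
  private var data: Array[Int] = null  // nothing allocated up front

  def putInt(rowId: Int, value: Int): Unit = {
    if (data == null) data = new Array[Int](capacity)  // allocate on first write
    data(rowId) = value
  }

  // An unwritten vector never allocates; reads fall back to the default value.
  def getInt(rowId: Int): Int = if (data == null) 0 else data(rowId)
}
{code}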
[jira] [Created] (SPARK-43265) Move Error framework to a common utils module
Rui Wang created SPARK-43265: Summary: Move Error framework to a common utils module Key: SPARK-43265 URL: https://issues.apache.org/jira/browse/SPARK-43265 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.5.0 Reporter: Rui Wang Assignee: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43266) Move MergeScalarSubqueries to spark-sql
Peter Toth created SPARK-43266: -- Summary: Move MergeScalarSubqueries to spark-sql Key: SPARK-43266 URL: https://issues.apache.org/jira/browse/SPARK-43266 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.5.0 Reporter: Peter Toth This is a step to make SPARK-40193 easier. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField
[ https://issues.apache.org/jira/browse/SPARK-43217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-43217: --- Assignee: Johan Lasperas > Correctly recurse into maps of maps and arrays of arrays in > StructType.findNestedField > -- > > Key: SPARK-43217 > URL: https://issues.apache.org/jira/browse/SPARK-43217 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Minor > > [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] > is unable to reach nested fields below two directly nested maps or arrays. > Whenever it reaches a map or an array, it'll throw an `invalidFieldName` > exception if the child is not a struct. > The following throws '{{Field name `a`.`element`.`element`.`i` is invalid: > `a`.`element`.`element` is not a struct.}}', even though the access path is > valid: > {code:java} > val schema = new StructType() > .add("a", ArrayType(ArrayType( > new StructType().add("i", "int")))) > > findNestedField(Seq("a", "element", "element", "i"), schema) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField
[ https://issues.apache.org/jira/browse/SPARK-43217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-43217. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40879 [https://github.com/apache/spark/pull/40879] > Correctly recurse into maps of maps and arrays of arrays in > StructType.findNestedField > -- > > Key: SPARK-43217 > URL: https://issues.apache.org/jira/browse/SPARK-43217 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Minor > Fix For: 3.5.0 > > > [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] > is unable to reach nested fields below two directly nested maps or arrays. > Whenever it reaches a map or an array, it'll throw an `invalidFieldName` > exception if the child is not a struct. > The following throws '{{Field name `a`.`element`.`element`.`i` is invalid: > `a`.`element`.`element` is not a struct.}}', even though the access path is > valid: > {code:java} > val schema = new StructType() > .add("a", ArrayType(ArrayType( > new StructType().add("i", "int")))) > > findNestedField(Seq("a", "element", "element", "i"), schema) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43267) Support creating data frame from a Postgres table that contains user-defined array column
Sifan Huang created SPARK-43267: --- Summary: Support creating data frame from a Postgres table that contains user-defined array column Key: SPARK-43267 URL: https://issues.apache.org/jira/browse/SPARK-43267 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.3.2, 2.4.0 Reporter: Sifan Huang Spark SQL currently doesn't support creating a data frame from a Postgres table that contains a user-defined array column. However, it used to allow such types before this Postgres JDBC commit (https://github.com/pgjdbc/pgjdbc/commit/375cb3795c3330f9434cee9353f0791b86125914). The previous behavior was to handle user-defined array columns as Strings. Given: * a Postgres table with a user-defined array column * Function: DataFrameReader.jdbc - https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/DataFrameReader.html#jdbc-java.lang.String-java.lang.String-java.util.Properties- Results: * Exception “java.sql.SQLException: Unsupported type ARRAY” is thrown Expectation after the change: * The function call succeeds * The user-defined array is converted to a string in the Spark DataFrame Suggested fix: * Update the “getCatalystType” function in “PostgresDialect” as ** {code:java} val catalystType = toCatalystType(typeName.drop(1), size, scale).map(ArrayType(_)) if (catalystType.isEmpty) Some(StringType) else catalystType{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
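For reference, a minimal sketch of the call that currently fails, assuming a running SparkSession named spark; the connection URL, credentials, and table name are hypothetical placeholders:

{code:java}
import java.util.Properties

val props = new Properties()
props.setProperty("user", "postgres")    // placeholder credentials
props.setProperty("password", "secret")

// Currently throws java.sql.SQLException: Unsupported type ARRAY when
// my_table contains a column whose type is an array of a user-defined type;
// with the suggested PostgresDialect fallback it would load as a string column.
val df = spark.read.jdbc("jdbc:postgresql://localhost:5432/mydb", "my_table", props)
{code}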
[jira] [Created] (SPARK-43268) Use proper error classes when exceptions are constructed with a message
Anton Okolnychyi created SPARK-43268: Summary: Use proper error classes when exceptions are constructed with a message Key: SPARK-43268 URL: https://issues.apache.org/jira/browse/SPARK-43268 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: Anton Okolnychyi As discussed [here|https://github.com/apache/spark/pull/40679/files#r1159264585]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
Neil Jonkers created SPARK-43269: Summary: Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true Key: SPARK-43269 URL: https://issues.apache.org/jira/browse/SPARK-43269 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.0 Reporter: Neil Jonkers Hello, With spark.sql.files.ignoreMissingFiles=true we notice [readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438] can still encounter FileNotFoundException. I notice the function readParquetFootersInParallel handles the scenario where spark.sql.files.ignoreCorruptFiles=true. Would it be feasible to support the scenario where spark.sql.files.ignoreMissingFiles=true when readParquetFootersInParallel is called as well, to prevent application failure due to FileNotFoundException? Thank you -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
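A minimal, self-contained sketch of the requested behavior; the helper below is hypothetical and only mirrors the shape of the existing ignoreCorruptFiles handling, it is not the actual Spark code path:

{code:java}
import java.io.FileNotFoundException

// Hypothetical helper: read a footer, skipping files that have gone missing
// when ignoreMissingFiles is enabled instead of failing the whole job.
def readFooterIfPresent[T](path: String, ignoreMissingFiles: Boolean)(read: String => T): Option[T] = {
  try {
    Some(read(path))
  } catch {
    case e: FileNotFoundException if ignoreMissingFiles =>
      println(s"Skipped missing file: $path ($e)")  // log and continue
      None
  }
}
{code}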
[jira] [Updated] (SPARK-43263) Upgrade FasterXML jackson to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-43263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bjørn Jørgensen updated SPARK-43263: Summary: Upgrade FasterXML jackson to 2.15.0 (was: Upgrade FasterXML / jackson-dataformats-text to 2.15.0) > Upgrade FasterXML jackson to 2.15.0 > --- > > Key: SPARK-43263 > URL: https://issues.apache.org/jira/browse/SPARK-43263 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Priority: Major > > * #390: (yaml) Upgrade to Snakeyaml 2.0 (resolves > [CVE-2022-1471|https://nvd.nist.gov/vuln/detail/CVE-2022-1471]) > (contributed by @pjfanning) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43263) Upgrade FasterXML jackson to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-43263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715949#comment-17715949 ] PJ Fanning commented on SPARK-43263: This is a duplicate of SPARK-42854, and it is not a good idea to disregard the points made in SPARK-42854. > Upgrade FasterXML jackson to 2.15.0 > --- > > Key: SPARK-43263 > URL: https://issues.apache.org/jira/browse/SPARK-43263 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Priority: Major > > * #390: (yaml) Upgrade to Snakeyaml 2.0 (resolves > [CVE-2022-1471|https://nvd.nist.gov/vuln/detail/CVE-2022-1471]) > (contributed by @pjfanning) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
[ https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Jonkers updated SPARK-43269: - Issue Type: Improvement (was: Bug) > Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true > --- > > Key: SPARK-43269 > URL: https://issues.apache.org/jira/browse/SPARK-43269 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Neil Jonkers >Priority: Minor > > Hello, > With spark.sql.files.ignoreMissingFiles=true we notice > [readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438] > can still encounter FileNotFoundException. > I notice the function readParquetFootersInParallel handles the scenario where > spark.sql.files.ignoreCorruptFiles=true. > Would it be feasible to support the scenario where > spark.sql.files.ignoreMissingFiles=true when readParquetFootersInParallel is > called as well, to prevent application failure due to FileNotFoundException? > > Thank you -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
Beishao Cao created SPARK-43270: --- Summary: Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns Key: SPARK-43270 URL: https://issues.apache.org/jira/browse/SPARK-43270 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Beishao Cao Currently, {{df.|}} will only suggest the methods of the dataframe (see the attached screenshot of a Databricks notebook), but {{df.column_name}} is also legal. Hence we should override the parent {{__dir__}} method on the Python {{DataFrame}} class to include column names. The benefit of this is that engines that use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, Databricks Notebooks) will suggest column names on the completion {{df.|}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
[ https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beishao Cao updated SPARK-43270: Attachment: Screenshot 2023-04-23 at 6.48.46 PM-1.png > Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns > - > > Key: SPARK-43270 > URL: https://issues.apache.org/jira/browse/SPARK-43270 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Beishao Cao >Priority: Major > Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, {{df.|}} will only suggest the method of dataframe(see attached > photo of databricks notebook), but {{df.column_name}} is also legal. > !image-2023-04-24-13-44-33-716.png|width=389,height=248! > So we should override the parent {{__dir__}} method on Python {{DataFrame}} > class to include column names. And the benefit of this is engine that uses > {{dir()}} to generate autocomplete suggestions (e.g. IPython kernel, > Databricks Notebooks) will suggest column names on the completion {{df.|}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
[ https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beishao Cao updated SPARK-43270: Attachment: (was: Screenshot 2023-04-23 at 6.48.46 PM-1.png) > Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns > - > > Key: SPARK-43270 > URL: https://issues.apache.org/jira/browse/SPARK-43270 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Beishao Cao >Priority: Major > Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, {{df.|}} will only suggest the method of dataframe(see attached > photo of databricks notebook), but {{df.column_name}} is also legal. > > > So we should override the parent {{_{_}dir{_}_}} method on Python > {{DataFrame}} class to include column names. And the benefit of this is > engine that uses {{dir()}} to generate autocomplete suggestions (e.g. IPython > kernel, Databricks Notebooks) will suggest column names on the completion > {{df.|}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
[ https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beishao Cao updated SPARK-43270: Description: Currently, {{df.|}} will only suggest the method of dataframe(see attached Screenshot of databricks notebook), but {{df.column_name}} is also legal. Hence we should override the parent {{__dir__}} method on Python {{DataFrame}} class to include column names. And the benefit of this is engine that uses {{dir()}} to generate autocomplete suggestions (e.g. IPython kernel, Databricks Notebooks) will suggest column names on the completion {{df.|}} was: Currently, {{df.|}} will only suggest the method of dataframe(see attached photo of databricks notebook), but {{df.column_name}} is also legal. So we should override the parent {{__dir__}} method on Python {{DataFrame}} class to include column names. And the benefit of this is engine that uses {{dir()}} to generate autocomplete suggestions (e.g. IPython kernel, Databricks Notebooks) will suggest column names on the completion {{df.|}} > Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns > - > > Key: SPARK-43270 > URL: https://issues.apache.org/jira/browse/SPARK-43270 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Beishao Cao >Priority: Major > Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, {{df.|}} will only suggest the method of dataframe(see attached > Screenshot of databricks notebook), but {{df.column_name}} is also legal. > Hence we should override the parent {{__dir__}} method on Python > {{DataFrame}} class to include column names. And the benefit of this is > engine that uses {{dir()}} to generate autocomplete suggestions (e.g. IPython > kernel, Databricks Notebooks) will suggest column names on the completion > {{df.|}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
[ https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beishao Cao updated SPARK-43270: Description: Currently, {{df.|}} will only suggest the method of dataframe(see attached photo of databricks notebook), but {{df.column_name}} is also legal. So we should override the parent {{__dir__}} method on Python {{DataFrame}} class to include column names. And the benefit of this is engine that uses {{dir()}} to generate autocomplete suggestions (e.g. IPython kernel, Databricks Notebooks) will suggest column names on the completion {{df.|}} was: Currently, {{df.|}} will only suggest the method of dataframe(see attached photo of databricks notebook), but {{df.column_name}} is also legal. !image-2023-04-24-13-44-33-716.png|width=389,height=248! So we should override the parent {{__dir__}} method on Python {{DataFrame}} class to include column names. And the benefit of this is engine that uses {{dir()}} to generate autocomplete suggestions (e.g. IPython kernel, Databricks Notebooks) will suggest column names on the completion {{df.|}} > Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns > - > > Key: SPARK-43270 > URL: https://issues.apache.org/jira/browse/SPARK-43270 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Beishao Cao >Priority: Major > Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, {{df.|}} will only suggest the method of dataframe(see attached > photo of databricks notebook), but {{df.column_name}} is also legal. > > > So we should override the parent {{__dir__}} method on Python > {{DataFrame}} class to include column names. And the benefit of this is > engine that uses {{dir()}} to generate autocomplete suggestions (e.g. IPython > kernel, Databricks Notebooks) will suggest column names on the completion > {{df.|}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
[ https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beishao Cao updated SPARK-43270: Attachment: Screenshot 2023-04-23 at 6.48.46 PM.png > Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns > - > > Key: SPARK-43270 > URL: https://issues.apache.org/jira/browse/SPARK-43270 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Beishao Cao >Priority: Major > Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, {{df.|}} will only suggest the method of dataframe(see attached > photo of databricks notebook), but {{df.column_name}} is also legal. > !image-2023-04-24-13-44-33-716.png|width=389,height=248! > So we should override the parent {{__dir__}} method on Python {{DataFrame}} > class to include column names. And the benefit of this is engine that uses > {{dir()}} to generate autocomplete suggestions (e.g. IPython kernel, > Databricks Notebooks) will suggest column names on the completion {{df.|}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
[ https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beishao Cao updated SPARK-43270: Description: Currently, given {{df.|}}, the Databricks notebook will only suggest the methods of the DataFrame (see the attached screenshot of a Databricks notebook). However, {{df.column_name}} is also legal and runnable. Hence we should override the parent {{__dir__()}} method on the Python {{DataFrame}} class to include column names. The benefit is that any engine that uses {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, Databricks Notebooks) will suggest column names on the completion {{df.|}} was: Currently, {{df.|}} will only suggest the method of dataframe(see attached Screenshot of databricks notebook), but {{df.column_name}} is also legal. Hence we should override the parent {{__dir__}} method on Python {{DataFrame}} class to include column names. And the benefit of this is engine that uses {{dir()}} to generate autocomplete suggestions (e.g. IPython kernel, Databricks Notebooks) will suggest column names on the completion {{df.|}} > Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns > - > > Key: SPARK-43270 > URL: https://issues.apache.org/jira/browse/SPARK-43270 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Beishao Cao >Priority: Major > Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, given {{df.|}}, the Databricks notebook will only suggest the > methods of the DataFrame (see the attached screenshot of a Databricks notebook). > However, {{df.column_name}} is also legal and runnable. > Hence we should override the parent {{__dir__()}} method on the Python > {{DataFrame}} class to include column names. The benefit is that any > engine that uses {{dir()}} to generate autocomplete suggestions (e.g. the IPython > kernel, Databricks Notebooks) will suggest column names on the completion > {{df.|}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43250) Assign a name to the error class _LEGACY_ERROR_TEMP_2014
[ https://issues.apache.org/jira/browse/SPARK-43250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716006#comment-17716006 ] Atour Mousavi Gourabi commented on SPARK-43250: --- I'd like to take this one if you guys don't mind. Seems like a nice way to get to know the codebase. > Assign a name to the error class _LEGACY_ERROR_TEMP_2014 > > > Key: SPARK-43250 > URL: https://issues.apache.org/jira/browse/SPARK-43250 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the examples in error-classes.json). > Add a test that triggers the error from user code if such a test doesn't > already exist. Check the exception fields by using {*}checkError(){*}. That > function checks only the valuable error fields and avoids depending on the > error's text message, so tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is > not clear, and propose to users a way to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43268) Use proper error classes when exceptions are constructed with a message
[ https://issues.apache.org/jira/browse/SPARK-43268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-43268. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40934 [https://github.com/apache/spark/pull/40934] > Use proper error classes when exceptions are constructed with a message > --- > > Key: SPARK-43268 > URL: https://issues.apache.org/jira/browse/SPARK-43268 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.5.0 > > > As discussed > [here|https://github.com/apache/spark/pull/40679/files#r1159264585]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43268) Use proper error classes when exceptions are constructed with a message
[ https://issues.apache.org/jira/browse/SPARK-43268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-43268: Assignee: Anton Okolnychyi > Use proper error classes when exceptions are constructed with a message > --- > > Key: SPARK-43268 > URL: https://issues.apache.org/jira/browse/SPARK-43268 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > > As discussed > [here|https://github.com/apache/spark/pull/40679/files#r1159264585]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43233) Before batch reading from Kafka, log topic partition, offset range, etc., for debugging
[ https://issues.apache.org/jira/browse/SPARK-43233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-43233. -- Fix Version/s: 3.5.0 Assignee: Siying Dong Resolution: Fixed Issue resolved via https://github.com/apache/spark/pull/40905 > Before batch reading from Kafka, log topic partition, offset range, etc., for > debugging > - > > Key: SPARK-43233 > URL: https://issues.apache.org/jira/browse/SPARK-43233 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Trivial > Fix For: 3.5.0 > > > When debugging a slowness issue in Structured Streaming, it is hard to map > a Kafka topic and partition to the task that reads it. Adding some logging in > the executor might make this easier. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43262) Migrate Spark Connect Structured Streaming errors into error class
[ https://issues.apache.org/jira/browse/SPARK-43262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43262. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40928 [https://github.com/apache/spark/pull/40928] > Migrate Spark Connect Structured Streaming errors into error class > -- > > Key: SPARK-43262 > URL: https://issues.apache.org/jira/browse/SPARK-43262 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > > from pyspark/sql/connect/streaming -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43262) Migrate Spark Connect Structured Streaming errors into error class
[ https://issues.apache.org/jira/browse/SPARK-43262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43262: - Assignee: Haejoon Lee > Migrate Spark Connect Structured Streaming errors into error class > -- > > Key: SPARK-43262 > URL: https://issues.apache.org/jira/browse/SPARK-43262 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > from pyspark/sql/connect/streaming -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
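For context, these PySpark error-migration subtasks replace bare built-in exceptions with error classes from pyspark.errors. A hedged before-and-after sketch (the error class name, parameter names, and the validated argument are illustrative, not taken from the actual PR):
{code:python}
from pyspark.errors import PySparkTypeError

def validate_query_name(query_name):
    # Before the migration, code like this raised a bare built-in error:
    #     raise TypeError(f"queryName must be a str, got {type(query_name)}")
    # After the migration, it raises an error class instead, so the message
    # format is centralized in one place and can be asserted on in tests.
    if not isinstance(query_name, str):
        raise PySparkTypeError(
            error_class="NOT_STR",  # illustrative error class name
            message_parameters={
                "arg_name": "queryName",
                "arg_type": type(query_name).__name__,
            },
        )
    return query_name
{code}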
[jira] [Created] (SPARK-43271) Fix test DataFrameTests.test_reindex with specifying `index`.
Haejoon Lee created SPARK-43271: --- Summary: Fix test DataFrameTests.test_reindex with specifying `index`. Key: SPARK-43271 URL: https://issues.apache.org/jira/browse/SPARK-43271 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee Re-enable the pandas 2.0.0 test in DataFrameTests.test_reindex in a proper way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
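As a rough illustration of the API under test, a minimal pandas-on-Spark sketch is below (it assumes a running Spark session and does not reproduce the exact pandas 2.0.0 behavior difference being tracked):
{code:python}
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
# Reindexing with an explicit `index` keeps matching labels and introduces
# missing labels (here, 4) as NaN rows; SPARK-43271 tracks aligning this
# path with pandas 2.0.0 semantics.
print(psdf.reindex(index=[0, 2, 4]))
{code}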
[jira] [Updated] (SPARK-43271) Match behavior with DataFrame.reindex with specifying `index`.
[ https://issues.apache.org/jira/browse/SPARK-43271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43271: Summary: Match behavior with DataFrame.reindex with specifying `index`. (was: Fix test DataFrameTests.test_reindex with specifying `index`.) > Match behavior with DataFrame.reindex with specifying `index`. > -- > > Key: SPARK-43271 > URL: https://issues.apache.org/jira/browse/SPARK-43271 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > Re-enable the pandas 2.0.0 test in DataFrameTests.test_reindex in a proper way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42419) Migrate `TypeError` into error framework for Spark Connect column API.
[ https://issues.apache.org/jira/browse/SPARK-42419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716071#comment-17716071 ] Snoot.io commented on SPARK-42419: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40927 > Migrate `TypeError` into error framework for Spark Connect column API. > -- > > Key: SPARK-42419 > URL: https://issues.apache.org/jira/browse/SPARK-42419 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > We should migrate all errors into the PySpark error framework. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
[ https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716072#comment-17716072 ] Snoot.io commented on SPARK-43270: -- User 'alexanderwu-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40907 > Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns > - > > Key: SPARK-43270 > URL: https://issues.apache.org/jira/browse/SPARK-43270 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Beishao Cao >Priority: Major > Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, given {{df.|}}, the Databricks notebook will only suggest the > methods of the DataFrame (see the attached screenshot of a Databricks notebook). > However, {{df.column_name}} is also legal and runnable. > Hence we should override the parent {{__dir__()}} method on the Python > {{DataFrame}} class to include column names. The benefit is that any > engine that uses {{dir()}} to generate autocomplete suggestions (e.g. the IPython > kernel, Databricks Notebooks) will suggest column names on the completion > {{df.|}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43144) Scala: DataStreamReader table() API
[ https://issues.apache.org/jira/browse/SPARK-43144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43144. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40887 [https://github.com/apache/spark/pull/40887] > Scala: DataStreamReader table() API > --- > > Key: SPARK-43144 > URL: https://issues.apache.org/jira/browse/SPARK-43144 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43260) Migrate the Spark SQL pandas arrow type errors into error class.
[ https://issues.apache.org/jira/browse/SPARK-43260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43260: - Assignee: Haejoon Lee > Migrate the Spark SQL pandas arrow type errors into error class. > > > Key: SPARK-43260 > URL: https://issues.apache.org/jira/browse/SPARK-43260 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > from pyspark/sql/pandas/types.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43260) Migrate the Spark SQL pandas arrow type errors into error class.
[ https://issues.apache.org/jira/browse/SPARK-43260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43260. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40924 [https://github.com/apache/spark/pull/40924] > Migrate the Spark SQL pandas arrow type errors into error class. > > > Key: SPARK-43260 > URL: https://issues.apache.org/jira/browse/SPARK-43260 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > > from pyspark/sql/pandas/types.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43272) Replace reflection w/ direct calling for `SparkHadoopUtil#createFile`
Yang Jie created SPARK-43272: Summary: Replace reflection w/ direct calling for `SparkHadoopUtil#createFile` Key: SPARK-43272 URL: https://issues.apache.org/jira/browse/SPARK-43272 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.5.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43250) Assign a name to the error class _LEGACY_ERROR_TEMP_2014
[ https://issues.apache.org/jira/browse/SPARK-43250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716118#comment-17716118 ] Max Gekk commented on SPARK-43250: -- [~amousavigourabi] Sure, go ahead. > Assign a name to the error class _LEGACY_ERROR_TEMP_2014 > > > Key: SPARK-43250 > URL: https://issues.apache.org/jira/browse/SPARK-43250 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the examples in error-classes.json). > Add a test that triggers the error from user code if such a test doesn't > already exist. Check the exception fields by using {*}checkError(){*}. That > function checks only the valuable error fields and avoids depending on the > error's text message, so tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is > not clear, and propose to users a way to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org