[jira] [Created] (SPARK-43248) Unnecessary serialize/deserialize of Path when gathering partition stats in parallel

2023-04-24 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-43248:
-

 Summary: Unnecessary serialize/deserialize of Path when gathering 
partition stats in parallel
 Key: SPARK-43248
 URL: https://issues.apache.org/jira/browse/SPARK-43248
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Cheng Pan









[jira] [Updated] (SPARK-43229) Introduce Barrier Python UDF

2023-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-43229:
--
Summary: Introduce Barrier Python UDF  (was: Support Barrier Python UDF)

> Introduce Barrier Python UDF
> 
>
> Key: SPARK-43229
> URL: https://issues.apache.org/jira/browse/SPARK-43229
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-43249) df.sql() should send metrics back

2023-04-24 Thread Martin Grund (Jira)
Martin Grund created SPARK-43249:


 Summary: df.sql() should send metrics back
 Key: SPARK-43249
 URL: https://issues.apache.org/jira/browse/SPARK-43249
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund


df.sql() does not return the metrics to the client when executed as a command.






[jira] [Resolved] (SPARK-42317) Assign name to _LEGACY_ERROR_TEMP_2247

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-42317.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40810
[https://github.com/apache/spark/pull/40810]

> Assign name to _LEGACY_ERROR_TEMP_2247
> --
>
> Key: SPARK-42317
> URL: https://issues.apache.org/jira/browse/SPARK-42317
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Koray Beyaz
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-42317) Assign name to _LEGACY_ERROR_TEMP_2247

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-42317:


Assignee: Koray Beyaz

> Assign name to _LEGACY_ERROR_TEMP_2247
> --
>
> Key: SPARK-42317
> URL: https://issues.apache.org/jira/browse/SPARK-42317
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Koray Beyaz
>Priority: Major
>







[jira] [Assigned] (SPARK-43249) df.sql() should send metrics back

2023-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43249:
-

Assignee: Martin Grund

> df.sql() should send metrics back
> ---
>
> Key: SPARK-43249
> URL: https://issues.apache.org/jira/browse/SPARK-43249
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>
> df.sql() does not return the metrics to the client when executed as a command.






[jira] [Resolved] (SPARK-43249) df.sql() should send metrics back

2023-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43249.
---
Fix Version/s: 3.5.0, 3.4.1
   Resolution: Fixed

Issue resolved by pull request 40899
[https://github.com/apache/spark/pull/40899]

> df.sql() should send metrics back
> ---
>
> Key: SPARK-43249
> URL: https://issues.apache.org/jira/browse/SPARK-43249
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.5.0, 3.4.1
>
>
> df.sql() does not return the metrics to the client when executed as a command.






[jira] [Updated] (SPARK-43250) Assign a name to the error class _LEGACY_ERROR_TEMP_2014

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-43250:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2013* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2014
> 
>
> Key: SPARK-43250
> URL: https://issues.apache.org/jira/browse/SPARK-43250
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test does not
> already exist. Check the exception fields by using {*}checkError(){*}, which
> checks only the relevant error fields and avoids depending on the error
> message text. This way, tech editors can modify the error format in
> error-classes.json without worrying about Spark's internal tests. Migrate
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose to users a way to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
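For readers picking up these starter tasks, a minimal sketch of the requested test follows. It assumes the template has been renamed and can be triggered from user code; the suite name, the new error name, the failing query, and the parameters are all placeholders, while checkError() is the helper from Spark's test utilities that the description refers to.

{code:scala}
// A sketch only, not Spark's actual test. In error-classes.json the entry
// would first be renamed, e.g. (illustrative name):
//   "_LEGACY_ERROR_TEMP_2014"  ->  "SOME_DESCRIPTIVE_NAME"
import org.apache.spark.SparkRuntimeException
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.test.SharedSparkSession

class RenamedErrorSuite extends QueryTest with SharedSparkSession {
  // Placeholder: a user-space query that reaches the renamed error path.
  private val failingQuery = "SELECT ..."

  test("renamed error class carries the expected fields") {
    checkError(
      exception = intercept[SparkRuntimeException] {
        sql(failingQuery).collect()
      },
      errorClass = "SOME_DESCRIPTIVE_NAME",   // hypothetical new name
      parameters = Map("param" -> "value"))   // message-template parameters
  }
}
{code}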






[jira] [Created] (SPARK-43250) Assign a name to the error class _LEGACY_ERROR_TEMP_2014

2023-04-24 Thread Max Gekk (Jira)
Max Gekk created SPARK-43250:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2014
 Key: SPARK-43250
 URL: https://issues.apache.org/jira/browse/SPARK-43250
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2013* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]






[jira] [Created] (SPARK-43252) Assign a name to the error class _LEGACY_ERROR_TEMP_2016

2023-04-24 Thread Max Gekk (Jira)
Max Gekk created SPARK-43252:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2016
 Key: SPARK-43252
 URL: https://issues.apache.org/jira/browse/SPARK-43252
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]






[jira] [Created] (SPARK-43251) Assign a name to the error class _LEGACY_ERROR_TEMP_2015

2023-04-24 Thread Max Gekk (Jira)
Max Gekk created SPARK-43251:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2015
 Key: SPARK-43251
 URL: https://issues.apache.org/jira/browse/SPARK-43251
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]






[jira] [Updated] (SPARK-43252) Assign a name to the error class _LEGACY_ERROR_TEMP_2016

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-43252:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2016* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2016
> 
>
> Key: SPARK-43252
> URL: https://issues.apache.org/jira/browse/SPARK-43252
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2016* defined in
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test does not
> already exist. Check the exception fields by using {*}checkError(){*}, which
> checks only the relevant error fields and avoids depending on the error
> message text. This way, tech editors can modify the error format in
> error-classes.json without worrying about Spark's internal tests. Migrate
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose to users a way to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
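For the case where the error cannot be reproduced from user space, the fallback the description mentions looks roughly like the sketch below; the method name and message are illustrative, while SparkException.internalError() is the actual helper named in the text.

{code:scala}
import org.apache.spark.SparkException

// Sketch: replace a raise site of an unreachable legacy error class with an
// internal error instead of assigning it a user-facing name.
def unreachablePlanStateError(details: String): Throwable =
  SparkException.internalError(
    s"Unexpected plan state: $details. This is likely a bug in Spark, " +
      "please file a bug report.")
{code}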






[jira] [Updated] (SPARK-43251) Assign a name to the error class _LEGACY_ERROR_TEMP_2015

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-43251:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2015
> 
>
> Key: SPARK-43251
> URL: https://issues.apache.org/jira/browse/SPARK-43251
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test does not
> already exist. Check the exception fields by using {*}checkError(){*}, which
> checks only the relevant error fields and avoids depending on the error
> message text. This way, tech editors can modify the error format in
> error-classes.json without worrying about Spark's internal tests. Migrate
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose to users a way to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Updated] (SPARK-43254) Assign a name to the error class _LEGACY_ERROR_TEMP_2018

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-43254:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2018* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2017* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2018
> 
>
> Key: SPARK-43254
> URL: https://issues.apache.org/jira/browse/SPARK-43254
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2018* defined in
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test does not
> already exist. Check the exception fields by using {*}checkError(){*}, which
> checks only the relevant error fields and avoids depending on the error
> message text. This way, tech editors can modify the error format in
> error-classes.json without worrying about Spark's internal tests. Migrate
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose to users a way to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Created] (SPARK-43254) Assign a name to the error class _LEGACY_ERROR_TEMP_2018

2023-04-24 Thread Max Gekk (Jira)
Max Gekk created SPARK-43254:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2018
 Key: SPARK-43254
 URL: https://issues.apache.org/jira/browse/SPARK-43254
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2017* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]






[jira] [Created] (SPARK-43253) Assign a name to the error class _LEGACY_ERROR_TEMP_2017

2023-04-24 Thread Max Gekk (Jira)
Max Gekk created SPARK-43253:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2017
 Key: SPARK-43253
 URL: https://issues.apache.org/jira/browse/SPARK-43253
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2016* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]






[jira] [Updated] (SPARK-43253) Assign a name to the error class _LEGACY_ERROR_TEMP_2017

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-43253:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2017* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2016* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2017
> 
>
> Key: SPARK-43253
> URL: https://issues.apache.org/jira/browse/SPARK-43253
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2017* defined in
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test does not
> already exist. Check the exception fields by using {*}checkError(){*}, which
> checks only the relevant error fields and avoids depending on the error
> message text. This way, tech editors can modify the error format in
> error-classes.json without worrying about Spark's internal tests. Migrate
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose to users a way to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Created] (SPARK-43256) Assign a name to the error class _LEGACY_ERROR_TEMP_2021

2023-04-24 Thread Max Gekk (Jira)
Max Gekk created SPARK-43256:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2021
 Key: SPARK-43256
 URL: https://issues.apache.org/jira/browse/SPARK-43256
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]






[jira] [Created] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020

2023-04-24 Thread Max Gekk (Jira)
Max Gekk created SPARK-43255:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2020
 Key: SPARK-43255
 URL: https://issues.apache.org/jira/browse/SPARK-43255
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2018* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]






[jira] [Updated] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-43255:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2018* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> 
>
> Key: SPARK-43255
> URL: https://issues.apache.org/jira/browse/SPARK-43255
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test does not
> already exist. Check the exception fields by using {*}checkError(){*}, which
> checks only the relevant error fields and avoids depending on the error
> message text. This way, tech editors can modify the error format in
> error-classes.json without worrying about Spark's internal tests. Migrate
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose to users a way to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Updated] (SPARK-43256) Assign a name to the error class _LEGACY_ERROR_TEMP_2021

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-43256:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2021* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2021
> 
>
> Key: SPARK-43256
> URL: https://issues.apache.org/jira/browse/SPARK-43256
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2021* defined in
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test does not
> already exist. Check the exception fields by using {*}checkError(){*}, which
> checks only the relevant error fields and avoids depending on the error
> message text. This way, tech editors can modify the error format in
> error-classes.json without worrying about Spark's internal tests. Migrate
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose to users a way to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Created] (SPARK-43258) Assign a name to the error class _LEGACY_ERROR_TEMP_2023

2023-04-24 Thread Max Gekk (Jira)
Max Gekk created SPARK-43258:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2023
 Key: SPARK-43258
 URL: https://issues.apache.org/jira/browse/SPARK-43258
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2022* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]






[jira] [Created] (SPARK-43257) Assign a name to the error class _LEGACY_ERROR_TEMP_2022

2023-04-24 Thread Max Gekk (Jira)
Max Gekk created SPARK-43257:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2022
 Key: SPARK-43257
 URL: https://issues.apache.org/jira/browse/SPARK-43257
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2021* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]






[jira] [Updated] (SPARK-43257) Assign a name to the error class _LEGACY_ERROR_TEMP_2022

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-43257:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2022* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2021* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2022
> 
>
> Key: SPARK-43257
> URL: https://issues.apache.org/jira/browse/SPARK-43257
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2022* defined in
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be
> short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test does not
> already exist. Check the exception fields by using {*}checkError(){*}, which
> checks only the relevant error fields and avoids depending on the error
> message text. This way, tech editors can modify the error format in
> error-classes.json without worrying about Spark's internal tests. Migrate
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose to users a way to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Updated] (SPARK-43258) Assign a name to the error class _LEGACY_ERROR_TEMP_2023

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-43258:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2023* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2022* defined in
{*}core/src/main/resources/error/error-classes.json{*}. The name should be
short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test does not
already exist. Check the exception fields by using {*}checkError(){*}, which
checks only the relevant error fields and avoids depending on the error
message text. This way, tech editors can modify the error format in
error-classes.json without worrying about Spark's internal tests. Migrate
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is
unclear, and propose to users a way to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2023
> 
>
> Key: SPARK-43258
> URL: https://issues.apache.org/jira/browse/SPARK-43258
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2023* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error, see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear, and propose a solution so that users know how to avoid and fix such 
> errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024

2023-04-24 Thread Max Gekk (Jira)
Max Gekk created SPARK-43259:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2024
 Key: SPARK-43259
 URL: https://issues.apache.org/jira/browse/SPARK-43259
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (using a SQL query), replace 
the error with an internal error, see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear, and propose a solution so that users know how to avoid and fix such 
errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024

2023-04-24 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-43259:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (using a SQL query), replace 
the error with an internal error, see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear, and propose a solution so that users know how to avoid and fix such 
errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2023* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error to checkError().

If you cannot reproduce the error from user space (using a SQL query), replace 
the error with an internal error, see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear, and propose a solution so that users know how to avoid and fix such 
errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2024
> 
>
> Key: SPARK-43259
> URL: https://issues.apache.org/jira/browse/SPARK-43259
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error, see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear, and propose a solution so that users know how to avoid and fix such 
> errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43178) Migrate UDF errors into error class

2023-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43178:
-

Assignee: Haejoon Lee

> Migrate UDF errors into error class
> ---
>
> Key: SPARK-43178
> URL: https://issues.apache.org/jira/browse/SPARK-43178
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Migrate pyspark/sql/udf.py errors into error class



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43178) Migrate UDF errors into error class

2023-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43178.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40866
[https://github.com/apache/spark/pull/40866]

> Migrate UDF errors into error class
> ---
>
> Key: SPARK-43178
> URL: https://issues.apache.org/jira/browse/SPARK-43178
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> Migrate pyspark/sql/udf.py errors into error class



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43214) Post driver-side metrics for LocalTableScanExec/CommandResultExec

2023-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-43214.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40875
[https://github.com/apache/spark/pull/40875]

> Post driver-side metrics for LocalTableScanExec/CommandResultExec
> -
>
> Key: SPARK-43214
> URL: https://issues.apache.org/jira/browse/SPARK-43214
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Fu Chen
>Priority: Minor
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43214) Post driver-side metrics for LocalTableScanExec/CommandResultExec

2023-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-43214:
---

Assignee: Fu Chen

> Post driver-side metrics for LocalTableScanExec/CommandResultExec
> -
>
> Key: SPARK-43214
> URL: https://issues.apache.org/jira/browse/SPARK-43214
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Fu Chen
>Assignee: Fu Chen
>Priority: Minor
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43260) Migrate the Spark SQL pandas arrow type errors into error class.

2023-04-24 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-43260:
---

 Summary: Migrate the Spark SQL pandas arrow type errors into error 
class.
 Key: SPARK-43260
 URL: https://issues.apache.org/jira/browse/SPARK-43260
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Haejoon Lee


from pyspark/sql/pandas/types.py



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43246) Make `CheckConnectJvmClientCompatibility` filter the check on `private[packageName]` scope member as default

2023-04-24 Thread Nikita Awasthi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715748#comment-17715748
 ] 

Nikita Awasthi commented on SPARK-43246:


User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40925

> Make `CheckConnectJvmClientCompatibility`  filter the check on 
> `private[packageName]` scope member as default
> -
>
> Key: SPARK-43246
> URL: https://issues.apache.org/jira/browse/SPARK-43246
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43246) Make `CheckConnectJvmClientCompatibility` filter the check on `private[packageName]` scope member as default

2023-04-24 Thread Nikita Awasthi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715746#comment-17715746
 ] 

Nikita Awasthi commented on SPARK-43246:


User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40925

> Make `CheckConnectJvmClientCompatibility`  filter the check on 
> `private[packageName]` scope member as default
> -
>
> Key: SPARK-43246
> URL: https://issues.apache.org/jira/browse/SPARK-43246
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43261) Migrate `TypeError` from Spark SQL types into error class

2023-04-24 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-43261:
---

 Summary: Migrate `TypeError` from Spark SQL types into error class
 Key: SPARK-43261
 URL: https://issues.apache.org/jira/browse/SPARK-43261
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Haejoon Lee


from pyspark/sql/types.py



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43262) Migrate Spark Connect Structured Streaming errors into error class

2023-04-24 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-43262:
---

 Summary: Migrate Spark Connect Structured Streaming errors into 
error class
 Key: SPARK-43262
 URL: https://issues.apache.org/jira/browse/SPARK-43262
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, Structured Streaming
Affects Versions: 3.5.0
Reporter: Haejoon Lee


from pyspark/sql/connect/streaming



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43201) Inconsistency between from_avro and from_json function

2023-04-24 Thread karthik kadiyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715796#comment-17715796
 ] 

karthik kadiyam commented on SPARK-43201:
-

+1 

Would appreciate it if someone could take a look at this request from [~pkadetiloye].

> Inconsistency between from_avro and from_json function
> --
>
> Key: SPARK-43201
> URL: https://issues.apache.org/jira/browse/SPARK-43201
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Philip Adetiloye
>Priority: Major
>
> Spark's from_avro function does not allow the schema parameter to be a 
> dataframe column; it takes only a String schema:
> {code:java}
> def from_avro(col: Column, jsonFormatSchema: String): Column {code}
> This makes it impossible to deserialize rows of Avro records with different 
> schemas, since only one schema string can be passed externally. 
>  
> Here is what I would expect, similar to the from_json function:
> {code:java}
> def from_avro(col: Column, jsonFormatSchema: Column): Column  {code}
> code example:
> {code:java}
> import org.apache.spark.sql.functions.from_avro
> val avroSchema1 = 
> """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
>  
> val avroSchema2 = 
> """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}"""
> val df = Seq(
>   (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1),
>   (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2)
> ).toDF("binaryData", "schema")
> val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData"))
> parsed.show()
> // Output:
> // +-------------+
> // |   parsedData|
> // +-------------+
> // |[apple1, 1.0]|
> // |[apple2, 2.0]|
> // +-------------+
>  {code}
>  
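
For comparison, from_json already offers exactly the shape requested here: an 
overload that takes the schema as a Column, so it can vary per row. A minimal 
sketch (the jsonDf DataFrame and its column names are illustrative 
assumptions, not from this ticket):

{code:java}
import org.apache.spark.sql.functions.{col, from_json}

// from_json(e: Column, schema: Column) lets each row carry its own schema;
// the request above asks for an equivalent from_avro(col, schemaCol) overload.
val parsed = jsonDf.select(from_json(col("jsonData"), col("schema")).as("parsedData"))
{code}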



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43263) Upgrade FasterXML / jackson-dataformats-text to 2.15.0

2023-04-24 Thread Jira
Bjørn Jørgensen created SPARK-43263:
---

 Summary: Upgrade FasterXML / jackson-dataformats-text to 2.15.0
 Key: SPARK-43263
 URL: https://issues.apache.org/jira/browse/SPARK-43263
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Build
Affects Versions: 3.5.0
Reporter: Bjørn Jørgensen


* #390: (yaml) Upgrade to Snakeyaml 2.0 (resolves 
[CVE-2022-1471|https://nvd.nist.gov/vuln/detail/CVE-2022-1471])
 (contributed by @pjfanning)




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43264) Avoid allocation of unwritten ColumnVector in VectorizedReader

2023-04-24 Thread Zamil Majdy (Jira)
Zamil Majdy created SPARK-43264:
---

 Summary: Avoid allocation of unwritten ColumnVector in 
VectorizedReader
 Key: SPARK-43264
 URL: https://issues.apache.org/jira/browse/SPARK-43264
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 3.4.1, 3.5.0
Reporter: Zamil Majdy


The Spark vectorized reader allocates the arrays for every field, for the full 
value count, even when an array ends up empty. This causes high memory 
consumption when reading a table with a large struct+array column, or many 
columns with sparse values. One way to fix this is to allocate the column 
vector lazily, i.e. allocate the array only when it is actually written.
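
A minimal sketch of the lazy-allocation idea, using a made-up class rather 
than Spark's actual ColumnVector implementation:

{code:java}
// Illustrative only: defer the backing array until the first write, so a
// column that is never written costs no heap beyond the wrapper object.
class LazyIntVector(capacity: Int) {
  private var data: Array[Int] = null

  def putInt(rowId: Int, value: Int): Unit = {
    if (data == null) data = new Array[Int](capacity) // allocate on first write
    data(rowId) = value
  }

  def getInt(rowId: Int): Int =
    if (data == null) 0 else data(rowId) // unwritten vector reads as default
}
{code}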



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43265) Move Error framework to a common utils module

2023-04-24 Thread Rui Wang (Jira)
Rui Wang created SPARK-43265:


 Summary: Move Error framework to a common utils module
 Key: SPARK-43265
 URL: https://issues.apache.org/jira/browse/SPARK-43265
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43266) Move MergeScalarSubqueries to spark-sql

2023-04-24 Thread Peter Toth (Jira)
Peter Toth created SPARK-43266:
--

 Summary: Move MergeScalarSubqueries to spark-sql
 Key: SPARK-43266
 URL: https://issues.apache.org/jira/browse/SPARK-43266
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Peter Toth


This is a step to make SPARK-40193 easier.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField

2023-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-43217:
---

Assignee: Johan Lasperas

> Correctly recurse into maps of maps and arrays of arrays in 
> StructType.findNestedField
> --
>
> Key: SPARK-43217
> URL: https://issues.apache.org/jira/browse/SPARK-43217
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Johan Lasperas
>Assignee: Johan Lasperas
>Priority: Minor
>
> [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
>  is unable to reach nested fields below two directly nested maps or arrays. 
> Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
> exception if the child is not a struct.
> The following throws '{{{}Field name `a`.`element`.`element`.`i` is invalid: 
> `a`.`element`.`element` is not a struct.'{}}}, even though the access path is 
> valid:
> {code:java}
> import org.apache.spark.sql.types._
> val schema = new StructType()
>   .add("a", ArrayType(ArrayType(
>     new StructType().add("i", "int"))))
> findNestedField(Seq("a", "element", "element", "i"), schema) {code}
>  
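
As context for the fix, a hedged sketch of the recursion such a lookup needs; 
the helper below is illustrative, not Spark's internals. Arrays are traversed 
via an "element" step and maps via "key"/"value" steps, so directly nested 
collections simply consume one step per wrapper:

{code:java}
import org.apache.spark.sql.types._

// Illustrative resolver: walks a path like Seq("a", "element", "element", "i"),
// unwrapping ArrayType/MapType wrappers before requiring a StructType.
def resolve(dt: DataType, path: Seq[String]): Option[DataType] = path match {
  case Nil => Some(dt)
  case "element" +: rest => dt match {
    case ArrayType(elementType, _) => resolve(elementType, rest)
    case _ => None
  }
  case "key" +: rest => dt match {
    case MapType(keyType, _, _) => resolve(keyType, rest)
    case _ => None
  }
  case "value" +: rest => dt match {
    case MapType(_, valueType, _) => resolve(valueType, rest)
    case _ => None
  }
  case name +: rest => dt match {
    case st: StructType => st.fields.find(_.name == name).flatMap(f => resolve(f.dataType, rest))
    case _ => None
  }
}
{code}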



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField

2023-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-43217.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40879
[https://github.com/apache/spark/pull/40879]

> Correctly recurse into maps of maps and arrays of arrays in 
> StructType.findNestedField
> --
>
> Key: SPARK-43217
> URL: https://issues.apache.org/jira/browse/SPARK-43217
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Johan Lasperas
>Assignee: Johan Lasperas
>Priority: Minor
> Fix For: 3.5.0
>
>
> [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325]
>  is unable to reach nested fields below two directly nested maps or arrays. 
> Whenever it reaches a map or an array, it'll throw an `invalidFieldName` 
> exception if the child is not a struct.
> The following throws '{{{}Field name `a`.`element`.`element`.`i` is invalid: 
> `a`.`element`.`element` is not a struct.'{}}}, even though the access path is 
> valid:
> {code:java}
> import org.apache.spark.sql.types._
> val schema = new StructType()
>   .add("a", ArrayType(ArrayType(
>     new StructType().add("i", "int"))))
> findNestedField(Seq("a", "element", "element", "i"), schema) {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43267) Support creating data frame from a Postgres table that contains user-defined array column

2023-04-24 Thread Sifan Huang (Jira)
Sifan Huang created SPARK-43267:
---

 Summary: Support creating data frame from a Postgres table that 
contains user-defined array column
 Key: SPARK-43267
 URL: https://issues.apache.org/jira/browse/SPARK-43267
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.3.2, 2.4.0
Reporter: Sifan Huang


Spark SQL currently doesn’t support creating a data frame from a Postgres 
table that contains a user-defined array column. However, it used to allow 
such types before the Postgres JDBC commit 
(https://github.com/pgjdbc/pgjdbc/commit/375cb3795c3330f9434cee9353f0791b86125914).
The previous behavior was to handle a user-defined array column as a String.

Given:
 * Postgres table with user-defined array column
 * Function: DataFrameReader.jdbc - 
https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/DataFrameReader.html#jdbc-java.lang.String-java.lang.String-java.util.Properties-

Results:
 * Exception “java.sql.SQLException: Unsupported type ARRAY” is thrown

Expectation after the change:
 * Function call succeeds
 * User-defined array is converted as a string in Spark DataFrame

Suggested fix:
 * Update “getCatalystType” function in “PostgresDialect” as
 ** 
{code:java}
val catalystType =
  toCatalystType(typeName.drop(1), size, scale).map(ArrayType(_))
if (catalystType.isEmpty) Some(StringType) else catalystType{code}
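
A hedged reproduction sketch; the connection URL, credentials, and table name 
are placeholders, and it assumes a Postgres table with a column of a 
user-defined array type:

{code:java}
import java.util.Properties

val props = new Properties()
props.setProperty("user", "postgres")

// Currently fails with: java.sql.SQLException: Unsupported type ARRAY.
// With the suggested fix, the user-defined array column is read back as a string.
val df = spark.read.jdbc("jdbc:postgresql://localhost:5432/mydb", "my_table", props)
{code}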



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43268) Use proper error classes when exceptions are constructed with a message

2023-04-24 Thread Anton Okolnychyi (Jira)
Anton Okolnychyi created SPARK-43268:


 Summary: Use proper error classes when exceptions are constructed 
with a message
 Key: SPARK-43268
 URL: https://issues.apache.org/jira/browse/SPARK-43268
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: Anton Okolnychyi


As discussed 
[here|https://github.com/apache/spark/pull/40679/files#r1159264585].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)
Neil Jonkers created SPARK-43269:


 Summary: Adding support for MissingFiles when 
spark.sql.parquet.mergeSchema=true
 Key: SPARK-43269
 URL: https://issues.apache.org/jira/browse/SPARK-43269
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Neil Jonkers


Hello,

With `spark.sql.files.ignoreMissingFiles=true` we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter `FileNotFoundException`. 

I notice the function `readParquetFootersInParallel` can handle the scenario 
where `spark.sql.files.ignoreCorruptFiles=true`.

 

Would it be feasible to support the scenario where 
`spark.sql.files.ignoreMissingFiles=true` when `readParquetFootersInParallel` 
is called?

 

Thank you
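
A minimal sketch of the requested handling, with stand-in helpers rather than 
the actual ParquetFileFormat code:

{code:java}
import java.io.FileNotFoundException

// Stand-in for the real footer reader; throws if the file has disappeared.
def readFooter(path: String): String = s"footer-of-$path"

def readFooterSafely(path: String, ignoreMissingFiles: Boolean): Option[String] =
  try {
    Some(readFooter(path))
  } catch {
    // Mirror the existing ignoreCorruptFiles handling: skip instead of failing.
    case _: FileNotFoundException if ignoreMissingFiles => None
  }
{code}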



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Description: 
Hello,

With `spark.sql.files.ignoreMissingFiles=true` we notice 
[readParquetFootersInParallel|#L438]] can still encounter 
`FileNotFoundException`. 

I notice the function `readParquetFootersInParallel` can handle the scenario 
where `spark.sql.files.ignoreCorruptFiles=true`.

 

Would it be feasible to support the scenario where 
`spark.sql.files.ignoreMissingFiles=true` when `readParquetFootersInParallel` 
is called?

 

Thank you

  was:
Hello,

With `spark.sql.files.ignoreMissingFiles=true` we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter `FileNotFoundException`. 

I notice the function `readParquetFootersInParallel` can handle the scenario 
where `spark.sql.files.ignoreCorruptFiles=true`.

 

Would it be feasible to support the scenario where 
`spark.sql.files.ignoreMissingFiles=true` when `readParquetFootersInParallel` 
is called?

 

Thank you


> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> With `spark.sql.files.ignoreMissingFiles=true` we notice 
> [readParquetFootersInParallel|#L438]] can still encounter 
> `FileNotFoundException`. 
> I notice the function `readParquetFootersInParallel` can handle the scenario 
> where `spark.sql.files.ignoreCorruptFiles=true`.
>  
> Would it be feasible to support the scenario where 
> `spark.sql.files.ignoreMissingFiles=true` when `readParquetFootersInParallel` 
> is called?
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Description: 
Hello,

{{With *spark.sql.files.ignoreMissingFiles=true* we notice 
[readParquetFootersInParallel |#L438]can still encounter 
{*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you

  was:
Hello,

{{With }}*spark.sql.files.ignoreMissingFiles=true*{{ we notice 
[readParquetFootersInParallel |#L438]can still encounter 
{*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you


> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> {{With *spark.sql.files.ignoreMissingFiles=true* we notice 
> [readParquetFootersInParallel |#L438]can still encounter 
> {*}FileNotFoundException{*}. }}
> I notice the function readParquetFootersInParallel handles the scenario where 
> {{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}
> My question: would it be feasible to support the scenario where 
> *spark.sql.files.ignoreMissingFiles=true* in the function 
> readParquetFootersInParallel as well? This would prevent application failure 
> due to {{*FileNotFoundException.*}}
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Description: 
Hello,

{{With }}*spark.sql.files.ignoreMissingFiles=true*{{ we notice 
[readParquetFootersInParallel |#L438]can still encounter 
{*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you

  was:
Hello,

With `spark.sql.files.ignoreMissingFiles=true` we notice 
[readParquetFootersInParallel|#L438]] can still encounter 
`FileNotFoundException`. 

I notice the function `readParquetFootersInParallel` can handle the scenario 
where `spark.sql.files.ignoreCorruptFiles=true`.

 

Would it be feasible to support the scenario where 
`spark.sql.files.ignoreMissingFiles=true` when `readParquetFootersInParallel` 
is called?

 

Thank you


> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> {{With }}*spark.sql.files.ignoreMissingFiles=true*{{ we notice 
> [readParquetFootersInParallel |#L438]can still encounter 
> {*}FileNotFoundException{*}. }}
> I notice the function readParquetFootersInParallel handles the scenario where 
> {{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}
> My question: would it be feasible to support the scenario where 
> *spark.sql.files.ignoreMissingFiles=true* in the function 
> readParquetFootersInParallel as well? This would prevent application failure 
> due to {{*FileNotFoundException.*}}
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Description: 
Hello,

{{With *spark.sql.files.ignoreMissingFiles=true* we notice [link 
title|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]can
 still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you

  was:
Hello,

{{With *spark.sql.files.ignoreMissingFiles=true* we notice 
[readParquetFootersInParallel |#L438]can still encounter 
{*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you


> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> {{With *spark.sql.files.ignoreMissingFiles=true* we notice [link 
> title|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]can
>  still encounter {*}FileNotFoundException{*}. }}
> I notice the function readParquetFootersInParallel handles the scenario where 
> {{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}
> My question: would it be feasible to support the scenario where 
> *spark.sql.files.ignoreMissingFiles=true* in the function 
> readParquetFootersInParallel as well? This would prevent application failure 
> due to {{*FileNotFoundException.*}}
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Description: 
Hello,

{{With *spark.sql.files.ignoreMissingFiles=true* we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]can
 still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you

  was:
Hello,

{{With *spark.sql.files.ignoreMissingFiles=true* we notice [link 
title|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]can
 still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you


> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> {{With *spark.sql.files.ignoreMissingFiles=true* we notice 
> [readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]can
>  still encounter {*}FileNotFoundException{*}. }}
> I notice the function readParquetFootersInParallel handles the scenario where 
> {{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}
> My question: would it be feasible to support the scenario where 
> *spark.sql.files.ignoreMissingFiles=true* in the function 
> readParquetFootersInParallel as well? This would prevent application failure 
> due to {{*FileNotFoundException.*}}
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Description: 
Hello,

{{With `spark.sql.files.ignoreMissingFiles=true` we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you

  was:
Hello,

{{With *spark.sql.files.ignoreMissingFiles=true* we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you


> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> {{With `spark.sql.files.ignoreMissingFiles=true` we notice 
> [readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
>  can still encounter {*}FileNotFoundException{*}. }}
> I notice the function readParquetFootersInParallel handles the scenario where 
> {{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}
> My question: would it be feasible to support the scenario where 
> *spark.sql.files.ignoreMissingFiles=true* in the function 
> readParquetFootersInParallel as well? This would prevent application failure 
> due to {{*FileNotFoundException.*}}
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Description: 
Hello,

{{With *spark.sql.files.ignoreMissingFiles=true* we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you

  was:
Hello,

{{With *spark.sql.files.ignoreMissingFiles=true* we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]can
 still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you


> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> {{With *spark.sql.files.ignoreMissingFiles=true* we notice 
> [readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
>  can still encounter {*}FileNotFoundException{*}. }}
> I notice the function readParquetFootersInParallel handles the scenario where 
> {{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}
> My question: would it be feasible to support the scenario where 
> *spark.sql.files.ignoreMissingFiles=true* in the function 
> readParquetFootersInParallel as well? This would prevent application failure 
> due to {{*FileNotFoundException.*}}
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Description: 
Hello,

{{With spark.sql.files.ignoreMissingFiles=true we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you

  was:
Hello,

{{With {{spark.sql.files.ignoreMissingFiles=true}} we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you


> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> {{With spark.sql.files.ignoreMissingFiles=true we notice 
> [readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
>  can still encounter {*}FileNotFoundException{*}. }}
> I notice the function readParquetFootersInParallel handles the scenario where 
> {{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}
> My question: would it be feasible to support the scenario where 
> *spark.sql.files.ignoreMissingFiles=true* in the function 
> readParquetFootersInParallel as well? This would prevent application failure 
> due to {{*FileNotFoundException.*}}
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Description: 
Hello,

{{With {{spark.sql.files.ignoreMissingFiles=true}} we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you

  was:
Hello,

{{With `spark.sql.files.ignoreMissingFiles=true` we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you


> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> {{With {{spark.sql.files.ignoreMissingFiles=true}} we notice 
> [readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
>  can still encounter {*}FileNotFoundException{*}. }}
> I notice the function readParquetFootersInParallel handles the scenario where 
> {{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}
> My question: would it be feasible to support the scenario where 
> *spark.sql.files.ignoreMissingFiles=true* in the function 
> readParquetFootersInParallel as well? This would prevent application failure 
> due to {{*FileNotFoundException.*}}
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Description: 
Hello,

{{With spark.sql.files.ignoreMissingFiles=true we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter FileNotFoundException. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{FileNotFoundException{*}.{*}}}

 

Thank you

  was:
Hello,

{{With spark.sql.files.ignoreMissingFiles=true we notice 
[readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
 can still encounter {*}FileNotFoundException{*}. }}

I notice the function readParquetFootersInParallel handles the scenario where 
{{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}

My question: would it be feasible to support the scenario where 
*spark.sql.files.ignoreMissingFiles=true* in the function 
readParquetFootersInParallel as well? This would prevent application failure 
due to {{*FileNotFoundException.*}}

 

Thank you


> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> {{With spark.sql.files.ignoreMissingFiles=true we notice 
> [readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
>  can still encounter FileNotFoundException. }}
> I notice the function readParquetFootersInParallel handles the scenario where 
> {{{*}spark.sql.files.ignoreCorruptFiles=true{*}.}}
> My question: would it be feasible to support the scenario where 
> *spark.sql.files.ignoreMissingFiles=true* in the function 
> readParquetFootersInParallel as well? This would prevent application failure 
> due to {{FileNotFoundException{*}.{*}}}
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43263) Upgrade FasterXML jackson to 2.15.0

2023-04-24 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-43263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bjørn Jørgensen updated SPARK-43263:

Summary: Upgrade FasterXML jackson to 2.15.0  (was: Upgrade FasterXML / 
jackson-dataformats-text to 2.15.0)

> Upgrade FasterXML jackson to 2.15.0
> ---
>
> Key: SPARK-43263
> URL: https://issues.apache.org/jira/browse/SPARK-43263
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> * #390: (yaml) Upgrade to Snakeyaml 2.0 (resolves 
> [CVE-2022-1471|https://nvd.nist.gov/vuln/detail/CVE-2022-1471])
>  (contributed by @pjfanning)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43263) Upgrade FasterXML jackson to 2.15.0

2023-04-24 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715949#comment-17715949
 ] 

PJ Fanning commented on SPARK-43263:


This is a duplicate of SPARK-42854, and it is not a good idea to disregard the 
points made in SPARK-42854.

> Upgrade FasterXML jackson to 2.15.0
> ---
>
> Key: SPARK-43263
> URL: https://issues.apache.org/jira/browse/SPARK-43263
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> * #390: (yaml) Upgrade to Snakeyaml 2.0 (resolves 
> [CVE-2022-1471|https://nvd.nist.gov/vuln/detail/CVE-2022-1471])
>  (contributed by @pjfanning)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43269) Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true

2023-04-24 Thread Neil Jonkers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Jonkers updated SPARK-43269:
-
Issue Type: Improvement  (was: Bug)

> Adding support for MissingFiles when spark.sql.parquet.mergeSchema=true
> ---
>
> Key: SPARK-43269
> URL: https://issues.apache.org/jira/browse/SPARK-43269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Neil Jonkers
>Priority: Minor
>
> Hello,
> With {{spark.sql.files.ignoreMissingFiles=true}} we notice 
> [readParquetFootersInParallel|https://github.com/apache/spark/blob/52c1068190803d856959ba563642a3e440cc086c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L438]
> can still encounter {{FileNotFoundException}}.
> I notice the function {{readParquetFootersInParallel}} already handles the 
> scenario where {{spark.sql.files.ignoreCorruptFiles=true}}.
> My question: would it be feasible to also support the scenario where 
> {{spark.sql.files.ignoreMissingFiles=true}} in {{readParquetFootersInParallel}}, 
> to prevent application failure due to {{FileNotFoundException}}?
>  
> Thank you



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)
Beishao Cao created SPARK-43270:
---

 Summary: Implement __dir__() in pyspark.sql.dataframe.DataFrame to 
include columns
 Key: SPARK-43270
 URL: https://issues.apache.org/jira/browse/SPARK-43270
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Beishao Cao


Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
attached photo of a Databricks notebook), but {{df.column_name}} is also legal.

!image-2023-04-24-13-44-33-716.png|width=389,height=248!

So we should override the parent {{__dir__}} method on the Python {{DataFrame}} 
class to include column names. The benefit is that engines that use {{dir()}} 
to generate autocomplete suggestions (e.g. the IPython kernel, Databricks 
Notebooks) will suggest column names on the completion {{df.|}}
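
For reference, a minimal self-contained sketch of the proposed override (the class below is a stand-in for pyspark.sql.dataframe.DataFrame, not the actual implementation):

{code:python}
class DataFrame:
    def __init__(self, columns):
        # Stand-in for pyspark's df.columns (a list of column-name strings).
        self.columns = columns

    def __dir__(self):
        # Merge the default attribute listing with the column names so that
        # dir(df) -- and autocomplete engines built on it -- include columns.
        return sorted(set(super().__dir__()) | set(self.columns))

df = DataFrame(["age", "name"])
assert "age" in dir(df) and "name" in dir(df)
{code}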



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Attachment: Screenshot 2023-04-23 at 6.48.46 PM-1.png

> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
> attached photo of a Databricks notebook), but {{df.column_name}} is also legal.
> !image-2023-04-24-13-44-33-716.png|width=389,height=248!
> So we should override the parent {{__dir__}} method on the Python {{DataFrame}} 
> class to include column names. The benefit is that engines that use {{dir()}} 
> to generate autocomplete suggestions (e.g. the IPython kernel, Databricks 
> Notebooks) will suggest column names on the completion {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Attachment: (was: Screenshot 2023-04-23 at 6.48.46 PM-1.png)

> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
> attached photo of a Databricks notebook), but {{df.column_name}} is also legal.
> So we should override the parent {{__dir__}} method on the Python 
> {{DataFrame}} class to include column names. The benefit is that engines 
> that use {{dir()}} to generate autocomplete suggestions (e.g. the IPython 
> kernel, Databricks Notebooks) will suggest column names on the completion 
> {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Description: 
Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
attached screenshot of a Databricks notebook), but {{df.column_name}} is also legal.

Hence we should override the parent {{__dir__}} method on the Python {{DataFrame}} 
class to include column names. The benefit is that engines that use {{dir()}} 
to generate autocomplete suggestions (e.g. the IPython kernel, Databricks 
Notebooks) will suggest column names on the completion {{df.|}}

  was:
Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
attached photo of a Databricks notebook), but {{df.column_name}} is also legal.

So we should override the parent {{__dir__}} method on the Python {{DataFrame}} 
class to include column names. The benefit is that engines that use {{dir()}} 
to generate autocomplete suggestions (e.g. the IPython kernel, Databricks 
Notebooks) will suggest column names on the completion {{df.|}}


> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
> attached screenshot of a Databricks notebook), but {{df.column_name}} is also legal.
> Hence we should override the parent {{__dir__}} method on the Python 
> {{DataFrame}} class to include column names. The benefit is that engines that 
> use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
> Databricks Notebooks) will suggest column names on the completion {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Description: 
Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
attached photo of a Databricks notebook), but {{df.column_name}} is also legal.

So we should override the parent {{__dir__}} method on the Python {{DataFrame}} 
class to include column names. The benefit is that engines that use {{dir()}} 
to generate autocomplete suggestions (e.g. the IPython kernel, Databricks 
Notebooks) will suggest column names on the completion {{df.|}}

  was:
Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
attached photo of a Databricks notebook), but {{df.column_name}} is also legal.

!image-2023-04-24-13-44-33-716.png|width=389,height=248!

So we should override the parent {{__dir__}} method on the Python {{DataFrame}} 
class to include column names. The benefit is that engines that use {{dir()}} 
to generate autocomplete suggestions (e.g. the IPython kernel, Databricks 
Notebooks) will suggest column names on the completion {{df.|}}


> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
> attached photo of a Databricks notebook), but {{df.column_name}} is also legal.
> So we should override the parent {{__dir__}} method on the Python 
> {{DataFrame}} class to include column names. The benefit is that engines 
> that use {{dir()}} to generate autocomplete suggestions (e.g. the IPython 
> kernel, Databricks Notebooks) will suggest column names on the completion 
> {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Attachment: Screenshot 2023-04-23 at 6.48.46 PM.png

> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
> attached photo of a Databricks notebook), but {{df.column_name}} is also legal.
> !image-2023-04-24-13-44-33-716.png|width=389,height=248!
> So we should override the parent {{__dir__}} method on the Python {{DataFrame}} 
> class to include column names. The benefit is that engines that use {{dir()}} 
> to generate autocomplete suggestions (e.g. the IPython kernel, Databricks 
> Notebooks) will suggest column names on the completion {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Description: 
Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
attached screenshot of a Databricks notebook), but {{df.column_name}} is also legal.

Hence we should override the parent {{__dir__}} method on the Python {{DataFrame}} 
class to include column names. The benefit is that engines that use {{dir()}} 
to generate autocomplete suggestions (e.g. the IPython kernel, Databricks 
Notebooks) will suggest column names on the completion {{df.|}}

  was:
Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
attached screenshot of a Databricks notebook), but {{df.column_name}} is also legal.

Hence we should override the parent {{__dir__}} method on the Python {{DataFrame}} 
class to include column names. The benefit is that engines that use {{dir()}} 
to generate autocomplete suggestions (e.g. the IPython kernel, Databricks 
Notebooks) will suggest column names on the completion {{df.|}}


> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
> attached screenshot of a Databricks notebook), but {{df.column_name}} is also legal.
> Hence we should override the parent {{__dir__}} method on the Python 
> {{DataFrame}} class to include column names. The benefit is that engines that 
> use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
> Databricks Notebooks) will suggest column names on the completion {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Description: 
Currently, given {{df.|}}, the Databricks notebook will only suggest the 
methods of the DataFrame (see the attached screenshot of a Databricks notebook).

However, {{df.column_name}} is also legal and runnable.

Hence we should override the parent {{__dir__}} method on the Python 
{{DataFrame}} class to include column names. The benefit is that engines that 
use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
Databricks Notebooks) will suggest column names on the completion {{df.|}}

  was:
Currently, {{df.|}} will only suggest the methods of the DataFrame (see the 
attached screenshot of a Databricks notebook), but {{df.column_name}} is also legal.

Hence we should override the parent {{__dir__}} method on the Python {{DataFrame}} 
class to include column names. The benefit is that engines that use {{dir()}} 
to generate autocomplete suggestions (e.g. the IPython kernel, Databricks 
Notebooks) will suggest column names on the completion {{df.|}}


> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, given {{df.|}}, the Databricks notebook will only suggest the 
> methods of the DataFrame (see the attached screenshot of a Databricks notebook).
> However, {{df.column_name}} is also legal and runnable.
> Hence we should override the parent {{__dir__}} method on the Python 
> {{DataFrame}} class to include column names. The benefit is that engines that 
> use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
> Databricks Notebooks) will suggest column names on the completion {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Description: 
Currently, given {{df.|}}, the Databricks notebook will only suggest the 
methods of the DataFrame (see the attached screenshot of a Databricks notebook).

However, {{df.column_name}} is also legal and runnable.

Hence we should override the parent {{__dir__()}} method on the Python 
{{DataFrame}} class to include column names. The benefit is that engines that 
use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
Databricks Notebooks) will suggest column names on the completion {{df.|}}

  was:
Currently, given {{df.|}}, the Databricks notebook will only suggest the 
methods of the DataFrame (see the attached screenshot of a Databricks notebook).

However, {{df.column_name}} is also legal and runnable.

Hence we should override the parent {{__dir__()}} method on the Python 
{{DataFrame}} class to include column names. The benefit is that engines that 
use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
Databricks Notebooks) will suggest column names on the completion {{df.|}}


> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, given {{df.|}}, the Databricks notebook will only suggest the 
> methods of the DataFrame (see the attached screenshot of a Databricks notebook).
> However, {{df.column_name}} is also legal and runnable.
> Hence we should override the parent {{__dir__()}} method on the Python 
> {{DataFrame}} class to include column names. The benefit is that engines that 
> use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
> Databricks Notebooks) will suggest column names on the completion {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Description: 
Currently, given {{df.|}}, the Databricks notebook will only suggest the 
methods of the DataFrame (see the attached screenshot of a Databricks notebook).

However, {{df.column_name}} is also legal and runnable.

Hence we should override the parent {{__dir__()}} method on the Python 
{{DataFrame}} class to include column names. The benefit is that engines that 
use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
Databricks Notebooks) will suggest column names on the completion {{df.|}}

  was:
Currently, given {{df.|}}, the Databricks notebook will only suggest the 
methods of the DataFrame (see the attached screenshot of a Databricks notebook).

However, {{df.column_name}} is also legal and runnable.

Hence we should override the parent {{__dir__()}} method on the Python 
{{DataFrame}} class to include column names. The benefit is that engines that 
use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
Databricks Notebooks) will suggest column names on the completion {{df.|}}


> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, given {{df.|}}, the Databricks notebook will only suggest the 
> methods of the DataFrame (see the attached screenshot of a Databricks notebook).
> However, {{df.column_name}} is also legal and runnable.
> Hence we should override the parent {{__dir__()}} method on the Python 
> {{DataFrame}} class to include column names. The benefit is that engines that 
> use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
> Databricks Notebooks) will suggest column names on the completion {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Description: 
Currently, given {{df.|}}, the Databricks notebook will only suggest the 
methods of the DataFrame (see the attached screenshot of a Databricks notebook).

However, {{df.column_name}} is also legal and runnable.

Hence we should override the parent {{__dir__()}} method on the Python 
{{DataFrame}} class to include column names. The benefit is that engines that 
use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
Databricks Notebooks) will suggest column names on the completion {{df.|}}

  was:
Currently, given {{df.|}}, the Databricks notebook will only suggest the 
methods of the DataFrame (see the attached screenshot of a Databricks notebook).

However, {{df.column_name}} is also legal and runnable.

Hence we should override the parent {{__dir__}} method on the Python 
{{DataFrame}} class to include column names. The benefit is that engines that 
use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
Databricks Notebooks) will suggest column names on the completion {{df.|}}


> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, given {{df.|}}, the Databricks notebook will only suggest the 
> methods of the DataFrame (see the attached screenshot of a Databricks notebook).
> However, {{df.column_name}} is also legal and runnable.
> Hence we should override the parent {{__dir__()}} method on the Python 
> {{DataFrame}} class to include column names. The benefit is that engines that 
> use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
> Databricks Notebooks) will suggest column names on the completion {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Beishao Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beishao Cao updated SPARK-43270:

Description: 
Currently, given {{df.|}}, the Databricks notebook will only suggest the 
methods of the DataFrame (see the attached screenshot of a Databricks notebook).

However, {{df.column_name}} is also legal and runnable.

Hence we should override the parent {{__dir__()}} method on the Python 
{{DataFrame}} class to include column names. The benefit is that engines that 
use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
Databricks Notebooks) will suggest column names on the completion {{df.|}}

  was:
Currently, given {{df.|}}, the Databricks notebook will only suggest the 
methods of the DataFrame (see the attached screenshot of a Databricks notebook).

However, {{df.column_name}} is also legal and runnable.

Hence we should override the parent {{__dir__()}} method on the Python 
{{DataFrame}} class to include column names. The benefit is that engines that 
use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
Databricks Notebooks) will suggest column names on the completion {{df.|}}


> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, given {{df.|}}, the Databricks notebook will only suggest the 
> methods of the DataFrame (see the attached screenshot of a Databricks notebook).
> However, {{df.column_name}} is also legal and runnable.
> Hence we should override the parent {{__dir__()}} method on the Python 
> {{DataFrame}} class to include column names. The benefit is that engines that 
> use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
> Databricks Notebooks) will suggest column names on the completion {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43250) Assign a name to the error class _LEGACY_ERROR_TEMP_2014

2023-04-24 Thread Atour Mousavi Gourabi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716006#comment-17716006
 ] 

Atour Mousavi Gourabi commented on SPARK-43250:
---

I'd like to take this one if you guys don't mind. Seems like a nice way to get 
to know the codebase.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2014
> 
>
> Key: SPARK-43250
> URL: https://issues.apache.org/jira/browse/SPARK-43250
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in 
> *core/src/main/resources/error/error-classes.json*. The name should be 
> short but complete (look at the examples in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using *checkError()*. That function 
> checks only the valuable error fields and avoids depending on the error text 
> message; this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see *SparkException.internalError()*.
> Improve the error message format in error-classes.json if the current one is 
> not clear, and propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43268) Use proper error classes when exceptions are constructed with a message

2023-04-24 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-43268.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40934
[https://github.com/apache/spark/pull/40934]

> Use proper error classes when exceptions are constructed with a message
> ---
>
> Key: SPARK-43268
> URL: https://issues.apache.org/jira/browse/SPARK-43268
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.5.0
>
>
> As discussed 
> [here|https://github.com/apache/spark/pull/40679/files#r1159264585].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43268) Use proper error classes when exceptions are constructed with a message

2023-04-24 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-43268:


Assignee: Anton Okolnychyi

> Use proper error classes when exceptions are constructed with a message
> ---
>
> Key: SPARK-43268
> URL: https://issues.apache.org/jira/browse/SPARK-43268
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
>
> As discussed 
> [here|https://github.com/apache/spark/pull/40679/files#r1159264585].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43233) Before batch reading from Kafka, log topic partition, offset range, etc., for debugging

2023-04-24 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-43233.
--
Fix Version/s: 3.5.0
 Assignee: Siying Dong
   Resolution: Fixed

Issue resolved via https://github.com/apache/spark/pull/40905

> Before batch reading from Kafka, log topic partition, offset range, etc., for 
> debugging
> -
>
> Key: SPARK-43233
> URL: https://issues.apache.org/jira/browse/SPARK-43233
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Trivial
> Fix For: 3.5.0
>
>
> When debugging a slowness issue in Structured Streaming, it is hard to map 
> a Kafka topic and partition to the Spark task that reads it. Adding some 
> logging on the executor side might make this easier.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43262) Migrate Spark Connect Structured Streaming errors into error class

2023-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43262.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40928
[https://github.com/apache/spark/pull/40928]

> Migrate Spark Connect Structured Streaming errors into error class
> --
>
> Key: SPARK-43262
> URL: https://issues.apache.org/jira/browse/SPARK-43262
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> from pyspark/sql/connect/streaming
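
For reference, a minimal sketch of the migration pattern this ticket applies (the error class name and parameters are illustrative):

{code:python}
from pyspark.errors import PySparkTypeError

def require_str(arg_name, value):
    # Instead of raising a bare TypeError, raise a PySpark error-framework
    # exception carrying an error class and message parameters.
    if not isinstance(value, str):
        raise PySparkTypeError(
            error_class="NOT_STR",
            message_parameters={
                "arg_name": arg_name,
                "arg_type": type(value).__name__,
            },
        )

require_str("queryName", "my_query")  # passes; a non-str value would raise
{code}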



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43262) Migrate Spark Connect Structured Streaming errors into error class

2023-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43262:
-

Assignee: Haejoon Lee

> Migrate Spark Connect Structured Streaming errors into error class
> --
>
> Key: SPARK-43262
> URL: https://issues.apache.org/jira/browse/SPARK-43262
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> from pyspark/sql/connect/streaming



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43271) Fix test DataFrameTests.test_reindex with specifying `index`.

2023-04-24 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-43271:
---

 Summary: Fix test DataFrameTests.test_reindex with specifying 
`index`.
 Key: SPARK-43271
 URL: https://issues.apache.org/jira/browse/SPARK-43271
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark
Affects Versions: 3.5.0
Reporter: Haejoon Lee


Re-enable the pandas 2.0.0 test in DataFrameTests.test_reindex in a proper way.
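
For context, a minimal repro sketch of the API under discussion (data and index labels are illustrative):

{code:python}
import pandas as pd
import pyspark.pandas as ps

pdf = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[10, 20, 30])
psdf = ps.from_pandas(pdf)

# pandas fills rows for absent labels (here 40) with NaN; the goal is for
# pandas-on-Spark to match pandas 2.0.0 behavior for this call.
print(pdf.reindex(index=[10, 20, 40]))
print(psdf.reindex(index=[10, 20, 40]).sort_index())
{code}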



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43271) Match behavior with DataFrame.reindex with specifying `index`.

2023-04-24 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43271:

Summary: Match behavior with DataFrame.reindex with specifying `index`.  
(was: Fix test DataFrameTests.test_reindex with specifying `index`.)

> Match behavior with DataFrame.reindex with specifying `index`.
> --
>
> Key: SPARK-43271
> URL: https://issues.apache.org/jira/browse/SPARK-43271
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Re-enable the pandas 2.0.0 test in DataFrameTests.test_reindex in a proper way.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42419) Migrate `TypeError` into error framework for Spark Connect column API.

2023-04-24 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716071#comment-17716071
 ] 

Snoot.io commented on SPARK-42419:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40927

> Migrate `TypeError` into error framework for Spark Connect column API.
> --
>
> Key: SPARK-42419
> URL: https://issues.apache.org/jira/browse/SPARK-42419
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> We should migrate all errors into PySpark error framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43270) Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns

2023-04-24 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716072#comment-17716072
 ] 

Snoot.io commented on SPARK-43270:
--

User 'alexanderwu-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40907

> Implement __dir__() in pyspark.sql.dataframe.DataFrame to include columns
> -
>
> Key: SPARK-43270
> URL: https://issues.apache.org/jira/browse/SPARK-43270
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Beishao Cao
>Priority: Major
> Attachments: Screenshot 2023-04-23 at 6.48.46 PM.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, given {{df.|}}, the Databricks notebook will only suggest the 
> methods of the DataFrame (see the attached screenshot of a Databricks notebook).
> However, {{df.column_name}} is also legal and runnable.
> Hence we should override the parent {{__dir__()}} method on the Python 
> {{DataFrame}} class to include column names. The benefit is that engines that 
> use {{dir()}} to generate autocomplete suggestions (e.g. the IPython kernel, 
> Databricks Notebooks) will suggest column names on the completion {{df.|}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43144) Scala: DataStreamReader table() API

2023-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43144.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40887
[https://github.com/apache/spark/pull/40887]

> Scala: DataStreamReader table() API
> ---
>
> Key: SPARK-43144
> URL: https://issues.apache.org/jira/browse/SPARK-43144
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
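
For context, the entry point this ticket adds to the Scala Spark Connect client, shown via its existing Python equivalent (assuming an active SparkSession {{spark}} and a hypothetical streaming-readable table "events"):

{code:python}
# Start a streaming read from a catalog table by name rather than a path.
sdf = spark.readStream.table("events")

# Write the stream somewhere to start the query, e.g. to the console sink.
query = sdf.writeStream.format("console").start()
{code}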




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43260) Migrate the Spark SQL pandas arrow type errors into error class.

2023-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43260:
-

Assignee: Haejoon Lee

> Migrate the Spark SQL pandas arrow type errors into error class.
> 
>
> Key: SPARK-43260
> URL: https://issues.apache.org/jira/browse/SPARK-43260
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> from pyspark/sql/pandas/types.py



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43260) Migrate the Spark SQL pandas arrow type errors into error class.

2023-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43260.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40924
[https://github.com/apache/spark/pull/40924]

> Migrate the Spark SQL pandas arrow type errors into error class.
> 
>
> Key: SPARK-43260
> URL: https://issues.apache.org/jira/browse/SPARK-43260
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> from pyspark/sql/pandas/types.py



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43272) Replace reflection w/ direct calling for `SparkHadoopUtil#createFile`

2023-04-24 Thread Yang Jie (Jira)
Yang Jie created SPARK-43272:


 Summary: Replace reflection w/ direct calling for  
`SparkHadoopUtil#createFile`
 Key: SPARK-43272
 URL: https://issues.apache.org/jira/browse/SPARK-43272
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43250) Assign a name to the error class _LEGACY_ERROR_TEMP_2014

2023-04-24 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716118#comment-17716118
 ] 

Max Gekk commented on SPARK-43250:
--

[~amousavigourabi] Sure, go ahead.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2014
> 
>
> Key: SPARK-43250
> URL: https://issues.apache.org/jira/browse/SPARK-43250
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2014* defined in 
> *core/src/main/resources/error/error-classes.json*. The name should be 
> short but complete (look at the examples in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using *checkError()*. That function 
> checks only the valuable error fields and avoids depending on the error text 
> message; this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see *SparkException.internalError()*.
> Improve the error message format in error-classes.json if the current one is 
> not clear, and propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org