[jira] [Updated] (SPARK-48128) BitwiseCount / bit_count generated code for boolean inputs fails to compile

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48128:
---
Labels: pull-request-available  (was: )

> BitwiseCount / bit_count generated code for boolean inputs fails to compile
> ---
>
> Key: SPARK-48128
> URL: https://issues.apache.org/jira/browse/SPARK-48128
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>  Labels: pull-request-available
>
> If the `BitwiseCount` / `bit_count` expression is applied to a boolean-typed 
> column, it triggers codegen fallback to interpreted mode because the 
> generated code contains invalid Java syntax, producing errors like
> {code}
>  java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 41, Column 11: Failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 41, Column 11: Unexpected token "if" in primary
> {code}
> This problem was masked because the QueryTest framework may not be fully 
> exercising codegen paths (e.g. if constant folding occurs).
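
A minimal repro sketch (assumptions: a local session, a boolean column named 
`flag`, and `spark.sql.codegen.fallback` disabled so the compile error surfaces 
instead of being masked by the interpreted fallback):

{code}
import org.apache.spark.sql.SparkSession

object BitCountBooleanRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      // Disable the silent fallback so the Janino compile error surfaces.
      .config("spark.sql.codegen.fallback", "false")
      .getOrCreate()
    import spark.implicits._

    // bit_count on a boolean column generates invalid Java without the fix.
    Seq(true, false, true).toDF("flag")
      .selectExpr("bit_count(flag)")
      .show()
  }
}
{code}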






[jira] [Created] (SPARK-48128) BitwiseCount / bit_count generated code for boolean inputs fails to compile

2024-05-03 Thread Josh Rosen (Jira)
Josh Rosen created SPARK-48128:
--

 Summary: BitwiseCount / bit_count generated code for boolean 
inputs fails to compile
 Key: SPARK-48128
 URL: https://issues.apache.org/jira/browse/SPARK-48128
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Josh Rosen
Assignee: Josh Rosen


If the `BitwiseCount` / `bit_count` expression is applied to a boolean-typed 
column, it triggers codegen fallback to interpreted mode because the 
generated code contains invalid Java syntax.

This problem was masked because the QueryTest framework may not be fully 
exercising codegen paths (e.g. if constant folding occurs).






[jira] [Updated] (SPARK-48128) BitwiseCount / bit_count generated code for boolean inputs fails to compile

2024-05-03 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-48128:
---
Description: 
If the `BitwiseCount` / `bit_count` expression is applied to a boolean-typed 
column, it triggers codegen fallback to interpreted mode because the 
generated code contains invalid Java syntax, producing errors like

{code}
 java.util.concurrent.ExecutionException: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, 
Column 11: Failed to compile: org.codehaus.commons.compiler.CompileException: 
File 'generated.java', Line 41, Column 11: Unexpected token "if" in primary
{code}

This problem was masked because the QueryTest framework may not be fully 
exercising codegen paths (e.g. if constant folding occurs).

  was:
If the `BitwiseCount` / `bit_count` expression is applied to a boolean-typed 
column, it triggers codegen fallback to interpreted mode because the 
generated code contains invalid Java syntax.

This problem was masked because the QueryTest framework may not be fully 
exercising codegen paths (e.g. if constant folding occurs).


> BitwiseCount / bit_count generated code for boolean inputs fails to compile
> ---
>
> Key: SPARK-48128
> URL: https://issues.apache.org/jira/browse/SPARK-48128
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>
> If the `BitwiseCount` / `bit_count` expression is applied to a boolean-typed 
> column, it triggers codegen fallback to interpreted mode because the 
> generated code contains invalid Java syntax, producing errors like
> {code}
>  java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 41, Column 11: Failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 41, Column 11: Unexpected token "if" in primary
> {code}
> This problem was masked because the QueryTest framework may not be fully 
> exercising codegen paths (e.g. if constant folding occurs).






[jira] [Resolved] (SPARK-48127) Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48127.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46376
[https://github.com/apache/spark/pull/46376]

> Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules
> ---
>
> Key: SPARK-48127
> URL: https://issues.apache.org/jira/browse/SPARK-48127
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48116) Run `pyspark-pandas*` in PR builder and Daily Python CIs

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48116.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46367
[https://github.com/apache/spark/pull/46367]

> Run `pyspark-pandas*` in PR builder and Daily Python CIs
> 
>
> Key: SPARK-48116
> URL: https://issues.apache.org/jira/browse/SPARK-48116
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48127) Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48127:
--
Summary: Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` 
modules  (was: Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profile` 
modules)

> Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules
> ---
>
> Key: SPARK-48127
> URL: https://issues.apache.org/jira/browse/SPARK-48127
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48127) Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profile` modules

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48127:
---
Labels: pull-request-available  (was: )

> Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profile` modules
> --
>
> Key: SPARK-48127
> URL: https://issues.apache.org/jira/browse/SPARK-48127
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48127) Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profile` modules

2024-05-03 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48127:
-

 Summary: Fix `dev/scalastyle` to check `hadoop-cloud` and 
`jvm-profile` modules
 Key: SPARK-48127
 URL: https://issues.apache.org/jira/browse/SPARK-48127
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-48124) Disable structured logging for Interpreter by default

2024-05-03 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-48124:
---
Description: 
Since interpreters (spark-shell/spark-sql/pyspark) produce plain-text output, it 
makes more sense to disable structured logging for interpreters by default.

spark-shell output with structured logging enabled:

```
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 17.0.9)
Type in expressions to have them evaluated.
Type :help for more information.
{"ts":"2024-05-04T01:11:03.797Z","level":"WARN","msg":"Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable","logger":"NativeCodeLoader"}
{"ts":"2024-05-04T01:11:04.104Z","level":"WARN","msg":"Service 'SparkUI' could 
not bind on port 4040. Attempting port 4041.","logger":"Utils"}
Spark context Web UI available at http://10.10.114.155:4041
Spark context available as 'sc' (master = local[*], app id = 
local-1714785064155).
Spark session available as 'spark'.
```

spark-shell output without structured logging enabled:

```
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 17.0.9)
Type in expressions to have them evaluated.
Type :help for more information.
24/05/03 18:11:35 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
24/05/03 18:11:35 WARN Utils: Service 'SparkUI' could not bind on port 4040. 
Attempting port 4041.
Spark context Web UI available at http://10.10.114.155:4041
Spark context available as 'sc' (master = local[*], app id = 
local-1714785095892).
Spark session available as 'spark'.
```

  was:
Since interpreters (spark-shell/spark-sql/pyspark) produce plain-text output, it 
makes more sense to disable structured logging for interpreters by default.

 


> Disable structured logging for Interpreter by default
> -
>
> Key: SPARK-48124
> URL: https://issues.apache.org/jira/browse/SPARK-48124
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Since interpreters (spark-shell/spark-sql/pyspark) produce plain-text output, 
> it makes more sense to disable structured logging for interpreters by default.
>  
> spark-shell output with structured logging enabled:
> ```
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
>       /_/
> Using Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 17.0.9)
> Type in expressions to have them evaluated.
> Type :help for more information.
> {"ts":"2024-05-04T01:11:03.797Z","level":"WARN","msg":"Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable","logger":"NativeCodeLoader"}
> {"ts":"2024-05-04T01:11:04.104Z","level":"WARN","msg":"Service 'SparkUI' 
> could not bind on port 4040. Attempting port 4041.","logger":"Utils"}
> Spark context Web UI available at http://10.10.114.155:4041
> Spark context available as 'sc' (master = local[*], app id = 
> local-1714785064155).
> Spark session available as 'spark'.
> ```
>  
> spark-shell output without structured logging enabled:
> ```
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
>       /_/
> Using Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 17.0.9)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 24/05/03 18:11:35 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 24/05/03 18:11:35 WARN Utils: 

[jira] [Updated] (SPARK-48124) Disable structured logging for Interpreter by default

2024-05-03 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-48124:
---
Description: 
Since interpreters (spark-shell/spark-sql/pyspark) produce plain-text output, it 
makes more sense to disable structured logging for interpreters by default.

 

> Disable structured logging for Interpreter by default
> -
>
> Key: SPARK-48124
> URL: https://issues.apache.org/jira/browse/SPARK-48124
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Since interpreters (spark-shell/spark-sql/pyspark) produce plain-text output, 
> it makes more sense to disable structured logging for interpreters by default.
>  






[jira] [Created] (SPARK-48126) Make spark.log.structuredLogging.enabled effective

2024-05-03 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-48126:
--

 Summary: Make spark.log.structuredLogging.enabled effective
 Key: SPARK-48126
 URL: https://issues.apache.org/jira/browse/SPARK-48126
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Currently, the Spark conf spark.log.structuredLogging.enabled is not taking 
effect. We need to fix it.






[jira] [Created] (SPARK-48124) Disable structured logging for Interpreter by default

2024-05-03 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-48124:
--

 Summary: Disable structured logging for Interpreter by default
 Key: SPARK-48124
 URL: https://issues.apache.org/jira/browse/SPARK-48124
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang









[jira] [Updated] (SPARK-47484) Allow trailing comma in column definition list

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47484:
---
Labels: pull-request-available  (was: )

> Allow trailing comma in column definition list
> --
>
> Key: SPARK-47484
> URL: https://issues.apache.org/jira/browse/SPARK-47484
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
>
> When experimenting with SQL queries, it's common to accidentally leave a 
> trailing comma after the last column in a table definition, typically because 
> a developer commented out the last column and forgot to remove the comma, 
> resulting in unexpected syntax errors when creating tables.
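
A hedged illustration of the failure mode (table and column names invented for 
the example):

{code}
// Hypothetical DDL showing the trailing comma left behind after commenting
// out the last column. Today this fails with a parse error; the proposal
// would make it legal.
spark.sql("""
  CREATE TABLE events (
    id BIGINT,
    name STRING,
    -- debug_payload STRING
  )
""")
{code}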






[jira] [Assigned] (SPARK-48059) Implement the structured log framework on the java side

2024-05-03 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-48059:
--

Assignee: BingKun Pan

> Implement the structured log framework on the java side
> ---
>
> Key: SPARK-48059
> URL: https://issues.apache.org/jira/browse/SPARK-48059
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Critical
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48059) Implement the structured log framework on the java side

2024-05-03 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-48059.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46301
[https://github.com/apache/spark/pull/46301]

> Implement the structured log framework on the java side
> ---
>
> Key: SPARK-48059
> URL: https://issues.apache.org/jira/browse/SPARK-48059
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48121) Promote `KubernetesDriverConf` to `DeveloperApi`

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48121.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46373
[https://github.com/apache/spark/pull/46373]

> Promote `KubernetesDriverConf` to `DeveloperApi`
> -
>
> Key: SPARK-48121
> URL: https://issues.apache.org/jira/browse/SPARK-48121
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48123) Provide a constant table schema for querying structured logs

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48123:
---
Labels: pull-request-available  (was: )

> Provide a constant table schema for querying structured logs
> 
>
> Key: SPARK-48123
> URL: https://issues.apache.org/jira/browse/SPARK-48123
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> Provide a table schema LOG_SCHEMA so that users can load structured logs 
> with the following:
> ```
> spark.read.schema(LOG_SCHEMA).json(logPath)
> ```






[jira] [Created] (SPARK-48123) Provide a constant table schema for querying structured logs

2024-05-03 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-48123:
--

 Summary: Provide a constant table schema for querying structured 
logs
 Key: SPARK-48123
 URL: https://issues.apache.org/jira/browse/SPARK-48123
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Provide a table schema LOG_SCHEMA so that users can load structured logs with 
the following:

```
spark.read.schema(LOG_SCHEMA).json(logPath)
```
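
A hedged sketch of the intended workflow (the import location of LOG_SCHEMA and 
the log path are not specified here; field names follow the structured-log 
format shown elsewhere in this digest):

{code}
// Load JSON logs with the constant schema, then query them with Spark SQL.
val logs = spark.read.schema(LOG_SCHEMA).json("/path/to/structured/logs")
logs.createOrReplaceTempView("logs")
spark.sql("SELECT ts, msg FROM logs WHERE level = 'ERROR'").show(truncate = false)
{code}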






[jira] [Updated] (SPARK-48121) Promote `KubernetesDriverConf` to `DeveloperApi`

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48121:
---
Labels: pull-request-available  (was: )

> Promote `KubernetesDriverConf` to `DeveloperApi`
> -
>
> Key: SPARK-48121
> URL: https://issues.apache.org/jira/browse/SPARK-48121
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48120) Enable autolink to SPARK jira issue

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48120.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 11
[https://github.com/apache/spark-kubernetes-operator/pull/11]

> Enable autolink to SPARK jira issue
> ---
>
> Key: SPARK-48120
> URL: https://issues.apache.org/jira/browse/SPARK-48120
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-48121) Promote `KubernetesDriverConf` to `DeveloperApi`

2024-05-03 Thread Zhou JIANG (Jira)
Zhou JIANG created SPARK-48121:
--

 Summary: Promote `KubernetesDriverConf` to `DeveloperApi`
 Key: SPARK-48121
 URL: https://issues.apache.org/jira/browse/SPARK-48121
 Project: Spark
  Issue Type: Sub-task
  Components: k8s
Affects Versions: kubernetes-operator-0.1.0
Reporter: Zhou JIANG









[jira] [Resolved] (SPARK-48114) ErrorClassesJsonReader compiles template regex on every template resolution

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48114.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46365
[https://github.com/apache/spark/pull/46365]

> ErrorClassesJsonReader compiles template regex on every template resolution
> ---
>
> Key: SPARK-48114
> URL: https://issues.apache.org/jira/browse/SPARK-48114
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> `SparkRuntimeException` uses `SparkThrowableHelper`, which uses 
> `ErrorClassesJsonReader` to create error message strings from templates in 
> `error-conditions.json`, but the template regex is compiled on every 
> `SparkRuntimeException` constructor invocation.






[jira] [Updated] (SPARK-48120) Enable autolink to SPARK jira issue

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48120:
---
Labels: pull-request-available  (was: )

> Enable autolink to SPARK jira issue
> ---
>
> Key: SPARK-48120
> URL: https://issues.apache.org/jira/browse/SPARK-48120
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45923) Spark Kubernetes Operator

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45923:
--
Affects Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Spark Kubernetes Operator
> -
>
> Key: SPARK-45923
> URL: https://issues.apache.org/jira/browse/SPARK-45923
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou Jiang
>Assignee: Zhou Jiang
>Priority: Major
>  Labels: SPIP
>
> We would like to develop a Java-based Kubernetes operator for Apache Spark. 
> Following the operator pattern 
> (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark 
> users may manage applications and related components seamlessly using native 
> tools like kubectl. The primary goal is to simplify the Spark user experience 
> on Kubernetes, minimizing the learning curve and operational complexities, and 
> thereby enabling users to focus on Spark application development.
> Ideally, it would reside in a separate repository (like Spark docker or Spark 
> connect golang) and be loosely connected to the Spark release cycle while 
> supporting multiple Spark versions.
> SPIP doc: 
> [https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE|https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE/edit#heading=h.hhham7siu2vi]
> Dev email discussion : 
> [https://lists.apache.org/thread/wdy7jfhf7m8jy74p6s0npjfd15ym5rxz]






[jira] [Created] (SPARK-48120) Enable autolink to SPARK jira issue

2024-05-03 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48120:
-

 Summary: Enable autolink to SPARK jira issue
 Key: SPARK-48120
 URL: https://issues.apache.org/jira/browse/SPARK-48120
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: kubernetes-operator-0.1.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-48119) Promote `KubernetesDriverSpec` to `DeveloperApi`

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48119.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46371
[https://github.com/apache/spark/pull/46371]

> Promote `KubernetesDriverSpec` to `DeveloperApi`
> -
>
> Key: SPARK-48119
> URL: https://issues.apache.org/jira/browse/SPARK-48119
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48118) Support SPARK_SQL_LEGACY_CREATE_HIVE_TABLE env variable

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48118.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46369
[https://github.com/apache/spark/pull/46369]

> Support SPARK_SQL_LEGACY_CREATE_HIVE_TABLE env variable
> ---
>
> Key: SPARK-48118
> URL: https://issues.apache.org/jira/browse/SPARK-48118
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This issue aims to support the `SPARK_SQL_LEGACY_CREATE_HIVE_TABLE` env 
> variable to make migration easier.






[jira] [Updated] (SPARK-48119) Promote `KubernetesDriverSpec` to `DeveloperApi`

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48119:
---
Labels: pull-request-available  (was: )

> Promote `KubernetesDriverSpec` to `DeveloperApi`
> -
>
> Key: SPARK-48119
> URL: https://issues.apache.org/jira/browse/SPARK-48119
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48119) Promote `KubernetesDriverSpec` to `DeveloperApi`

2024-05-03 Thread Zhou JIANG (Jira)
Zhou JIANG created SPARK-48119:
--

 Summary: Promote `KubernetesDriverSpec` to `DeveloperApi`
 Key: SPARK-48119
 URL: https://issues.apache.org/jira/browse/SPARK-48119
 Project: Spark
  Issue Type: Sub-task
  Components: k8s
Affects Versions: kubernetes-operator-0.1.0
Reporter: Zhou JIANG









[jira] [Created] (SPARK-48118) Support SPARK_SQL_LEGACY_CREATE_HIVE_TABLE env variable

2024-05-03 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48118:
-

 Summary: Support SPARK_SQL_LEGACY_CREATE_HIVE_TABLE env variable
 Key: SPARK-48118
 URL: https://issues.apache.org/jira/browse/SPARK-48118
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun


This issue aims to support the `SPARK_SQL_LEGACY_CREATE_HIVE_TABLE` env variable 
to make migration easier.
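
A hedged sketch of the wiring such an env variable usually implies; the target 
conf here is an assumption based on the variable's name, not taken from the 
actual patch:

{code}
// Assumption: when the env variable is set, it toggles the existing legacy
// conf spark.sql.legacy.createHiveTableByDefault.
sys.env.get("SPARK_SQL_LEGACY_CREATE_HIVE_TABLE").foreach { value =>
  spark.conf.set("spark.sql.legacy.createHiveTableByDefault", value)
}
{code}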






[jira] [Updated] (SPARK-47336) Provide to PySpark a functionality to get estimated size of DataFrame in bytes

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47336:
---
Labels: pull-request-available  (was: )

> Provide to PySpark a functionality to get estimated size of DataFrame in bytes
> --
>
> Key: SPARK-47336
> URL: https://issues.apache.org/jira/browse/SPARK-47336
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Semyon Sinchenko
>Priority: Minor
>  Labels: pull-request-available
>
> Something equivalent to 
> sessionState().executePlan(...).optimizedPlan().stats().sizeInBytes() in 
> JVM Spark. It may be done via a simple call through `_jsparkSession` in 
> regular PySpark and via a plugin for Spark Connect.
>  
> This functionality is useful when one needs to check the feasibility of a 
> broadcast join without modifying the global broadcast threshold.
>  
> The function in the PySpark API may look like 
> `DataFrame.estimate_size_in_bytes() -> float` or 
> `DataFrame.estimateSizeInBytes() -> float`.
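
On the JVM side, the quantity in question can be read from an existing 
DataFrame; a minimal sketch mirroring the call chain named above (`df` is 
assumed to be any DataFrame):

{code}
// Optimizer-estimated output size of the DataFrame's plan, in bytes.
val sizeInBytes: BigInt = df.queryExecution.optimizedPlan.stats.sizeInBytes
{code}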






[jira] [Updated] (SPARK-48116) Run `pyspark-pandas*` in PR builder and Daily Python CIs

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48116:
--
Summary: Run `pyspark-pandas*` in PR builder and Daily Python CIs  (was: 
Move `pyspark-pandas*` tests to Daily Python CIs)

> Run `pyspark-pandas*` in PR builder and Daily Python CIs
> 
>
> Key: SPARK-48116
> URL: https://issues.apache.org/jira/browse/SPARK-48116
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48115) Remove `Python 3.11` from `build_python.yml`

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48115:
---
Labels: pull-request-available  (was: )

> Remove `Python 3.11` from `build_python.yml`
> 
>
> Key: SPARK-48115
> URL: https://issues.apache.org/jira/browse/SPARK-48115
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48114) ErrorClassesJsonReader compiles template regex on every template resolution

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48114:
---
Labels: pull-request-available  (was: )

> ErrorClassesJsonReader compiles template regex on every template resolution
> ---
>
> Key: SPARK-48114
> URL: https://issues.apache.org/jira/browse/SPARK-48114
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
>
> `SparkRuntimeException` uses `SparkThrowableHelper`, which uses 
> `ErrorClassesJsonReader` to create error message strings from templates in 
> `error-conditions.json`, but the template regex is compiled on every 
> `SparkRuntimeException` constructor invocation.






[jira] [Created] (SPARK-48114) ErrorClassesJsonReader compiles template regex on every template resolution

2024-05-03 Thread Vladimir Golubev (Jira)
Vladimir Golubev created SPARK-48114:


 Summary: ErrorClassesJsonReader compiles template regex on every 
template resolution
 Key: SPARK-48114
 URL: https://issues.apache.org/jira/browse/SPARK-48114
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Vladimir Golubev


`SparkRuntimeException` uses `SparkThrowableHelper`, which uses 
`ErrorClassesJsonReader` to create error message strings from templates in 
`error-conditions.json`, but the template regex is compiled on every 
`SparkRuntimeException` constructor invocation.
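
The general fix pattern is to hoist the compiled regex into a shared constant so 
it is compiled once; a sketch (the regex and method shape are assumptions, not 
the actual patch):

{code}
import java.util.regex.{Matcher, Pattern}

object ErrorTemplateResolver {
  // Compiled once at class-initialization time, not per exception construction.
  private val TemplatePattern: Pattern = Pattern.compile("<([a-zA-Z0-9_-]+)>")

  // Substitute <param> placeholders in a message template.
  def resolve(template: String, params: Map[String, String]): String = {
    val m = TemplatePattern.matcher(template)
    val sb = new StringBuffer
    while (m.find()) {
      val replacement = params.getOrElse(m.group(1), m.group(0))
      m.appendReplacement(sb, Matcher.quoteReplacement(replacement))
    }
    m.appendTail(sb)
    sb.toString
  }
}
{code}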






[jira] [Assigned] (SPARK-47240) SPIP: Structured Logging Framework for Apache Spark

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47240:
--

Assignee: (was: Apache Spark)

> SPIP: Structured Logging Framework for Apache Spark
> ---
>
> Key: SPARK-47240
> URL: https://issues.apache.org/jira/browse/SPARK-47240
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> This proposal aims to enhance Apache Spark's logging system by implementing 
> structured logging. This transition will change the format of the default log 
> files from plain text to JSON, making them more accessible and analyzable. 
> The new logs will include crucial identifiers such as worker, executor, 
> query, job, stage, and task IDs, thereby making the logs more informative and 
> facilitating easier search and analysis.
> h2. Current Logging Format
> The current format of Spark logs is plain text, which can be challenging to 
> parse and analyze efficiently. An example of the current log format is as 
> follows:
> {code:java}
> 23/11/29 17:53:44 ERROR BlockManagerMasterEndpoint: Fail to know the executor 
> 289 is alive or not.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>     
> Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: ..
> {code}
> h2. Proposed Structured Logging Format
> The proposed change involves structuring the logs in JSON format, which 
> organizes the log information into easily identifiable fields. Here is how 
> the new structured log format would look:
> {code:java}
> {
>    "ts":"23/11/29 17:53:44",
>    "level":"ERROR",
>    "msg":"Fail to know the executor 289 is alive or not",
>    "context":{
>        "executor_id":"289"
>    },
>    "exception":{
>       "class":"org.apache.spark.SparkException",
>       "msg":"Exception thrown in awaitResult",
>       "stackTrace":"..."
>    },
>    "source":"BlockManagerMasterEndpoint"
> } {code}
> This format will enable users to upload and directly query 
> driver/executor/master/worker log files using Spark SQL for more effective 
> problem-solving and analysis, such as tracking executor losses or identifying 
> faulty tasks:
> {code:java}
> spark.read.json("hdfs://hdfs_host/logs").createOrReplaceTempView("logs")
> /* To get all the executor lost logs */
> SELECT * FROM logs WHERE contains(message, 'Lost executor');
> /* To get all the distributed logs about executor 289 */
> SELECT * FROM logs WHERE executor_id = 289;
> /* To get all the errors on host 100.116.29.4 */
> SELECT * FROM logs WHERE host = "100.116.29.4" and log_level="ERROR";
> {code}
>  
> SPIP doc: 
> [https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing]






[jira] [Assigned] (SPARK-48112) Expose session in SparkConnectPlanner to plugin developers

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48112:
--

Assignee: Apache Spark

> Expose session in SparkConnectPlanner to plugin developers
> --
>
> Key: SPARK-48112
> URL: https://issues.apache.org/jira/browse/SPARK-48112
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Tom van Bussel
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-47240) SPIP: Structured Logging Framework for Apache Spark

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47240:
--

Assignee: Apache Spark

> SPIP: Structured Logging Framework for Apache Spark
> ---
>
> Key: SPARK-47240
> URL: https://issues.apache.org/jira/browse/SPARK-47240
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> This proposal aims to enhance Apache Spark's logging system by implementing 
> structured logging. This transition will change the format of the default log 
> files from plain text to JSON, making them more accessible and analyzable. 
> The new logs will include crucial identifiers such as worker, executor, 
> query, job, stage, and task IDs, thereby making the logs more informative and 
> facilitating easier search and analysis.
> h2. Current Logging Format
> The current format of Spark logs is plain text, which can be challenging to 
> parse and analyze efficiently. An example of the current log format is as 
> follows:
> {code:java}
> 23/11/29 17:53:44 ERROR BlockManagerMasterEndpoint: Fail to know the executor 
> 289 is alive or not.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>     
> Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: ..
> {code}
> h2. Proposed Structured Logging Format
> The proposed change involves structuring the logs in JSON format, which 
> organizes the log information into easily identifiable fields. Here is how 
> the new structured log format would look:
> {code:java}
> {
>    "ts":"23/11/29 17:53:44",
>    "level":"ERROR",
>    "msg":"Fail to know the executor 289 is alive or not",
>    "context":{
>        "executor_id":"289"
>    },
>    "exception":{
>       "class":"org.apache.spark.SparkException",
>       "msg":"Exception thrown in awaitResult",
>       "stackTrace":"..."
>    },
>    "source":"BlockManagerMasterEndpoint"
> } {code}
> This format will enable users to upload and directly query 
> driver/executor/master/worker log files using Spark SQL for more effective 
> problem-solving and analysis, such as tracking executor losses or identifying 
> faulty tasks:
> {code:java}
> spark.read.json("hdfs://hdfs_host/logs").createOrReplaceTempView("logs")
> /* To get all the executor lost logs */
> SELECT * FROM logs WHERE contains(message, 'Lost executor');
> /* To get all the distributed logs about executor 289 */
> SELECT * FROM logs WHERE executor_id = 289;
> /* To get all the errors on host 100.116.29.4 */
> SELECT * FROM logs WHERE host = "100.116.29.4" and log_level="ERROR";
> {code}
>  
> SPIP doc: 
> [https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing]






[jira] [Updated] (SPARK-48112) Expose session in SparkConnectPlanner to plugin developers

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48112:
---
Labels: pull-request-available  (was: )

> Expose session in SparkConnectPlanner to plugin developers
> --
>
> Key: SPARK-48112
> URL: https://issues.apache.org/jira/browse/SPARK-48112
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Tom van Bussel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48112) Expose session in SparkConnectPlanner to plugin developers

2024-05-03 Thread Tom van Bussel (Jira)
Tom van Bussel created SPARK-48112:
--

 Summary: Expose session in SparkConnectPlanner to plugin developers
 Key: SPARK-48112
 URL: https://issues.apache.org/jira/browse/SPARK-48112
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Tom van Bussel









[jira] [Assigned] (SPARK-47240) SPIP: Structured Logging Framework for Apache Spark

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47240:
--

Assignee: Apache Spark

> SPIP: Structured Logging Framework for Apache Spark
> ---
>
> Key: SPARK-47240
> URL: https://issues.apache.org/jira/browse/SPARK-47240
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> This proposal aims to enhance Apache Spark's logging system by implementing 
> structured logging. This transition will change the format of the default log 
> files from plain text to JSON, making them more accessible and analyzable. 
> The new logs will include crucial identifiers such as worker, executor, 
> query, job, stage, and task IDs, thereby making the logs more informative and 
> facilitating easier search and analysis.
> h2. Current Logging Format
> The current format of Spark logs is plain text, which can be challenging to 
> parse and analyze efficiently. An example of the current log format is as 
> follows:
> {code:java}
> 23/11/29 17:53:44 ERROR BlockManagerMasterEndpoint: Fail to know the executor 
> 289 is alive or not.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>     
> Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: ..
> {code}
> h2. Proposed Structured Logging Format
> The proposed change involves structuring the logs in JSON format, which 
> organizes the log information into easily identifiable fields. Here is how 
> the new structured log format would look:
> {code:java}
> {
>    "ts":"23/11/29 17:53:44",
>    "level":"ERROR",
>    "msg":"Fail to know the executor 289 is alive or not",
>    "context":{
>        "executor_id":"289"
>    },
>    "exception":{
>       "class":"org.apache.spark.SparkException",
>       "msg":"Exception thrown in awaitResult",
>       "stackTrace":"..."
>    },
>    "source":"BlockManagerMasterEndpoint"
> } {code}
> This format will enable users to upload and directly query 
> driver/executor/master/worker log files using Spark SQL for more effective 
> problem-solving and analysis, such as tracking executor losses or identifying 
> faulty tasks:
> {code:java}
> spark.read.json("hdfs://hdfs_host/logs").createOrReplaceTempView("logs")
> /* To get all the executor lost logs */
> SELECT * FROM logs WHERE contains(message, 'Lost executor');
> /* To get all the distributed logs about executor 289 */
> SELECT * FROM logs WHERE executor_id = 289;
> /* To get all the errors on host 100.116.29.4 */
> SELECT * FROM logs WHERE host = "100.116.29.4" and log_level="ERROR";
> {code}
>  
> SPIP doc: 
> [https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing]






[jira] [Assigned] (SPARK-47240) SPIP: Structured Logging Framework for Apache Spark

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47240:
--

Assignee: (was: Apache Spark)

> SPIP: Structured Logging Framework for Apache Spark
> ---
>
> Key: SPARK-47240
> URL: https://issues.apache.org/jira/browse/SPARK-47240
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> This proposal aims to enhance Apache Spark's logging system by implementing 
> structured logging. This transition will change the format of the default log 
> files from plain text to JSON, making them more accessible and analyzable. 
> The new logs will include crucial identifiers such as worker, executor, 
> query, job, stage, and task IDs, thereby making the logs more informative and 
> facilitating easier search and analysis.
> h2. Current Logging Format
> The current format of Spark logs is plain text, which can be challenging to 
> parse and analyze efficiently. An example of the current log format is as 
> follows:
> {code:java}
> 23/11/29 17:53:44 ERROR BlockManagerMasterEndpoint: Fail to know the executor 
> 289 is alive or not.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>     
> Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: ..
> {code}
> h2. Proposed Structured Logging Format
> The proposed change involves structuring the logs in JSON format, which 
> organizes the log information into easily identifiable fields. Here is how 
> the new structured log format would look:
> {code:java}
> {
>    "ts":"23/11/29 17:53:44",
>    "level":"ERROR",
>    "msg":"Fail to know the executor 289 is alive or not",
>    "context":{
>        "executor_id":"289"
>    },
>    "exception":{
>       "class":"org.apache.spark.SparkException",
>       "msg":"Exception thrown in awaitResult",
>       "stackTrace":"..."
>    },
>    "source":"BlockManagerMasterEndpoint"
> } {code}
> This format will enable users to upload and directly query 
> driver/executor/master/worker log files using Spark SQL for more effective 
> problem-solving and analysis, such as tracking executor losses or identifying 
> faulty tasks:
> {code:java}
> spark.read.json("hdfs://hdfs_host/logs").createOrReplaceTempView("logs")
> /* To get all the executor lost logs */
> SELECT * FROM logs WHERE contains(message, 'Lost executor');
> /* To get all the distributed logs about executor 289 */
> SELECT * FROM logs WHERE executor_id = 289;
> /* To get all the errors on host 100.116.29.4 */
> SELECT * FROM logs WHERE host = "100.116.29.4" and log_level="ERROR";
> {code}
>  
> SPIP doc: 
> [https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing]






[jira] [Resolved] (SPARK-48111) Move tpcds-1g and docker-integration-tests to daily scheduled jobs

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48111.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46361
[https://github.com/apache/spark/pull/46361]

> Move tpcds-1g and docker-integration-tests to daily scheduled jobs
> --
>
> Key: SPARK-48111
> URL: https://issues.apache.org/jira/browse/SPARK-48111
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48111) Move tpcds-1g and docker-integration-tests to daily scheduled jobs

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48111:
---
Labels: pull-request-available  (was: )

> Move tpcds-1g and docker-integration-tests to daily scheduled jobs
> --
>
> Key: SPARK-48111
> URL: https://issues.apache.org/jira/browse/SPARK-48111
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48110) Remove all Maven compilation build

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48110.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46360
[https://github.com/apache/spark/pull/46360]

> Remove all Maven compilation build
> --
>
> Key: SPARK-48110
> URL: https://issues.apache.org/jira/browse/SPARK-48110
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> It is being tested in the scheduled build, so maybe we can just remove them all.






[jira] [Created] (SPARK-48111) Move tpcds-1g and docker-integration-tests to daily scheduled jobs

2024-05-03 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48111:


 Summary: Move tpcds-1g and docker-integration-tests to daily 
scheduled jobs
 Key: SPARK-48111
 URL: https://issues.apache.org/jira/browse/SPARK-48111
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48094:
--
Description: 
h2. ASF INFRA POLICY
- https://infra.apache.org/github-actions-policy.html

h2. MONITORING
[https://infra-reports.apache.org/#ghactions=spark=168]

 !Screenshot 2024-05-02 at 23.56.05.png|width=100%! 

h2. TARGET
* All workflows MUST have a job concurrency level less than or equal to 20. 
This means a workflow cannot have more than 20 jobs running at the same time 
across all matrices.
* All workflows SHOULD have a job concurrency level less than or equal to 15. 
Just because 20 is the max, doesn't mean you should strive for 20.
* The average number of minutes a project uses per calendar week MUST NOT 
exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 hours).
* The average number of minutes a project uses in any consecutive five-day 
period MUST NOT exceed the equivalent of 30 full-time runners (216,000 minutes, 
or 3,600 hours).

h2. DEADLINE
bq. 17th of May, 2024

Since the deadline is 17th of May, 2024, I set this as the highest priority, 
`Blocker`.
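
A minimal sketch of one way a workflow could stay under these limits, assuming 
a hypothetical test job (the job, module, and script names are illustrative, 
not the actual Spark workflow):

{code}
# Cancel superseded runs for the same ref so abandoned builds stop
# counting against the project's weekly minutes.
concurrency:
  group: build-and-test-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      # Stay at or below the SHOULD threshold of 15 concurrent jobs.
      max-parallel: 15
      matrix:
        module: [core, sql, streaming]
    steps:
      - uses: actions/checkout@v4
      - run: ./dev/run-tests --modules ${{ matrix.module }}  # illustrative
{code}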



  was:
h2. ASF INFRA POLICY
- https://infra.apache.org/github-actions-policy.html

h2. MONITORING
[https://infra-reports.apache.org/#ghactions=spark=168]

 !Screenshot 2024-05-02 at 20.59.18.png|width=100%! 

h2. TARGET
* All workflows MUST have a job concurrency level less than or equal to 20. 
This means a workflow cannot have more than 20 jobs running at the same time 
across all matrices.
* All workflows SHOULD have a job concurrency level less than or equal to 15. 
Just because 20 is the max, doesn't mean you should strive for 20.
* The average number of minutes a project uses per calendar week MUST NOT 
exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 hours).
* The average number of minutes a project uses in any consecutive five-day 
period MUST NOT exceed the equivalent of 30 full-time runners (216,000 minutes, 
or 3,600 hours).

h2. DEADLINE
bq. 17th of May, 2024

Since the deadline is 17th of May, 2024, I set this as the highest priority, 
`Blocker`.




> Reduce GitHub Action usage according to ASF project allowance
> -
>
> Key: SPARK-48094
> URL: https://issues.apache.org/jira/browse/SPARK-48094
> Project: Spark
>  Issue Type: Umbrella
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
> Attachments: Screenshot 2024-05-02 at 23.56.05.png
>
>
> h2. ASF INFRA POLICY
> - https://infra.apache.org/github-actions-policy.html
> h2. MONITORING
> [https://infra-reports.apache.org/#ghactions=spark=168]
>  !Screenshot 2024-05-02 at 23.56.05.png|width=100%! 
> h2. TARGET
> * All workflows MUST have a job concurrency level less than or equal to 20. 
> This means a workflow cannot have more than 20 jobs running at the same time 
> across all matrices.
> * All workflows SHOULD have a job concurrency level less than or equal to 15. 
> Just because 20 is the max, doesn't mean you should strive for 20.
> * The average number of minutes a project uses per calendar week MUST NOT 
> exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 
> hours).
> * The average number of minutes a project uses in any consecutive five-day 
> period MUST NOT exceed the equivalent of 30 full-time runners (216,000 
> minutes, or 3,600 hours).
> h2. DEADLINE
> bq. 17th of May, 2024
> Since the deadline is 17th of May, 2024, I set this as the highest priority, 
> `Blocker`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48094:
--
Attachment: Screenshot 2024-05-02 at 23.56.05.png

> Reduce GitHub Action usage according to ASF project allowance
> -
>
> Key: SPARK-48094
> URL: https://issues.apache.org/jira/browse/SPARK-48094
> Project: Spark
>  Issue Type: Umbrella
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
> Attachments: Screenshot 2024-05-02 at 23.56.05.png
>
>
> h2. ASF INFRA POLICY
> - https://infra.apache.org/github-actions-policy.html
> h2. MONITORING
> [https://infra-reports.apache.org/#ghactions=spark=168]
>  !Screenshot 2024-05-02 at 20.59.18.png|width=100%! 
> h2. TARGET
> * All workflows MUST have a job concurrency level less than or equal to 20. 
> This means a workflow cannot have more than 20 jobs running at the same time 
> across all matrices.
> * All workflows SHOULD have a job concurrency level less than or equal to 15. 
> Just because 20 is the max, doesn't mean you should strive for 20.
> * The average number of minutes a project uses per calendar week MUST NOT 
> exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 
> hours).
> * The average number of minutes a project uses in any consecutive five-day 
> period MUST NOT exceed the equivalent of 30 full-time runners (216,000 
> minutes, or 3,600 hours).
> h2. DEADLINE
> bq. 17th of May, 2024
> Since the deadline is 17th of May, 2024, I set this as the highest priority, 
> `Blocker`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48094:
--
Attachment: (was: Screenshot 2024-05-02 at 20.59.18.png)

> Reduce GitHub Action usage according to ASF project allowance
> -
>
> Key: SPARK-48094
> URL: https://issues.apache.org/jira/browse/SPARK-48094
> Project: Spark
>  Issue Type: Umbrella
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
> Attachments: Screenshot 2024-05-02 at 23.56.05.png
>
>
> h2. ASF INFRA POLICY
> - https://infra.apache.org/github-actions-policy.html
> h2. MONITORING
> [https://infra-reports.apache.org/#ghactions=spark=168]
>  !Screenshot 2024-05-02 at 20.59.18.png|width=100%! 
> h2. TARGET
> * All workflows MUST have a job concurrency level less than or equal to 20. 
> This means a workflow cannot have more than 20 jobs running at the same time 
> across all matrices.
> * All workflows SHOULD have a job concurrency level less than or equal to 15. 
> Just because 20 is the max, doesn't mean you should strive for 20.
> * The average number of minutes a project uses per calendar week MUST NOT 
> exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 
> hours).
> * The average number of minutes a project uses in any consecutive five-day 
> period MUST NOT exceed the equivalent of 30 full-time runners (216,000 
> minutes, or 3,600 hours).
> h2. DEADLINE
> bq. 17th of May, 2024
> Since the deadline is 17th of May, 2024, I set this as the highest priority, 
> `Blocker`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48107) Exclude tests from Python distribution

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48107.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46354
[https://github.com/apache/spark/pull/46354]

> Exclude tests from Python distribution
> --
>
> Key: SPARK-48107
> URL: https://issues.apache.org/jira/browse/SPARK-48107
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48107) Exclude tests from Python distribution

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48107:
-

Assignee: Nicholas Chammas

> Exclude tests from Python distribution
> --
>
> Key: SPARK-48107
> URL: https://issues.apache.org/jira/browse/SPARK-48107
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48106) Use `Python 3.11` in `pyspark` tests of `build_and_test.yml`

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48106.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46353
[https://github.com/apache/spark/pull/46353]

> Use `Python 3.11` in `pyspark` tests of `build_and_test.yml`
> 
>
> Key: SPARK-48106
> URL: https://issues.apache.org/jira/browse/SPARK-48106
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> - https://docs.python.org/3/whatsnew/3.11.html#summary-release-highlights
> bq. Python 3.11 is between 10-60% faster than Python 3.10. On average, we 
> measured a 1.25x speedup on the standard benchmark suite.
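> 
> A minimal sketch of pinning the interpreter in a workflow step, assuming the 
> standard actions/setup-python action (the step name is illustrative):
> {code}
> - name: Install Python 3.11
>   uses: actions/setup-python@v5
>   with:
>     python-version: '3.11'
> {code}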



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48103) Promote `KubernetesDriverBuilder` to `DeveloperApi`

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48103.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46357
[https://github.com/apache/spark/pull/46357]

> Promote `KubernetesDriverBuilder` to `DeveloperApi`
> 
>
> Key: SPARK-48103
> URL: https://issues.apache.org/jira/browse/SPARK-48103
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48109) Enable `k8s-integration-tests` only for `kubernetes` module change

2024-05-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48109.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46356
[https://github.com/apache/spark/pull/46356]

> Enable `k8s-integration-tests` only for `kubernetes` module change
> --
>
> Key: SPARK-48109
> URL: https://issues.apache.org/jira/browse/SPARK-48109
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Although there is a chance of missing a related core module change, the daily 
> CI test coverage will catch it.
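> 
> A minimal sketch of the kind of trigger-level path filter this implies, as a 
> hypothetical example (the actual workflow may gate the job through a 
> change-detection script instead):
> {code}
> on:
>   pull_request:
>     paths:
>       # Run k8s integration tests only when the kubernetes module changes.
>       - 'resource-managers/kubernetes/**'
> {code}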



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48088) Skip tests being failed in client 3.5 <> server 4.0

2024-05-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48088:


Assignee: Hyukjin Kwon

> Skip tests being failed in client 3.5 <> server 4.0
> ---
>
> Key: SPARK-48088
> URL: https://issues.apache.org/jira/browse/SPARK-48088
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.2
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We should skip these tests and set up the CI first.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48088) Skip tests being failed in client 3.5 <> server 4.0

2024-05-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48088.
--
Fix Version/s: 3.5.2
   Resolution: Fixed

Issue resolved by pull request 46334
[https://github.com/apache/spark/pull/46334]

> Skip tests being failed in client 3.5 <> server 4.0
> ---
>
> Key: SPARK-48088
> URL: https://issues.apache.org/jira/browse/SPARK-48088
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.2
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2
>
>
> We should skip these tests and set up the CI first.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48054) Backward compatibility test for Spark Connect

2024-05-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48054:


Assignee: Hyukjin Kwon

> Backward compatibility test for Spark Connect
> -
>
> Key: SPARK-48054
> URL: https://issues.apache.org/jira/browse/SPARK-48054
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Now that we can run the Spark Connect server separately in CI, we can run a 
> lower-version server against a higher-version client, and the opposite 
> combination as well.
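> 
> A minimal sketch of the cross-version matrix this enables, with hypothetical 
> job names and helper scripts:
> {code}
> jobs:
>   connect-compat:
>     runs-on: ubuntu-latest
>     strategy:
>       matrix:
>         # Pair a lower-version server with a higher-version client,
>         # and the opposite combination.
>         include:
>           - { server: '3.5', client: '4.0' }
>           - { server: '4.0', client: '3.5' }
>     steps:
>       - uses: actions/checkout@v4
>       - run: ./start-connect-server.sh ${{ matrix.server }}  # hypothetical
>       - run: ./run-client-tests.sh ${{ matrix.client }}      # hypothetical
> {code}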



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48054) Backward compatibility test for Spark Connect

2024-05-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48054.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46298
[https://github.com/apache/spark/pull/46298]

> Backward compatibility test for Spark Connect
> -
>
> Key: SPARK-48054
> URL: https://issues.apache.org/jira/browse/SPARK-48054
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Now that we can run the Spark Connect server separately in CI, we can run a 
> lower-version server against a higher-version client, and the opposite 
> combination as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org