[jira] [Updated] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46747:
---
Labels: pull-request-available  (was: )

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Priority: Critical
>  Labels: pull-request-available
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check for table 
> existence in the database, overriding the default 
> JdbcDialect.getTableExistsQuery, which uses WHERE 1 = 0.
> +*Issue:*+
> Due to the LIMIT 1 query pattern, we are seeing a high number of shared locks 
> in PostgreSQL installations where the table being written to has many 
> partitions. Falling back to the default JdbcDialect query, which uses 
> WHERE 1 = 0, is more efficient, as it doesn't scan any of the partitions and 
> still effectively checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or involve more planner and 
> execution time to determine the quickest way to get a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data due to the always false WHERE condition. This makes it a lighter 
> operation, as it typically only involves checking the table's metadata to 
> validate the table's existence without taking locks on the table's data or 
> partitions.
> So, considering performance and minimizing locks, SELECT 1 FROM table WHERE 
> 1=0 would be a better choice if we're strictly looking to check for a table's 
> existence and want to avoid potentially heavier operations like taking shared 
> locks on partitions.
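
For illustration, a minimal sketch of the two query shapes being compared (an 
approximation for this discussion, not the exact Spark source; table-name 
handling is omitted):

{code:scala}
// Default JdbcDialect-style check: the WHERE clause is always false, so no
// rows are read and the query is effectively a metadata-only existence check.
def defaultTableExistsQuery(table: String): String =
  s"SELECT * FROM $table WHERE 1=0"

// PostgresDialect-style check discussed above: fetching a single row can
// touch (and take shared locks on) every partition of a partitioned table.
def limitOneTableExistsQuery(table: String): String =
  s"SELECT 1 FROM $table LIMIT 1"
{code}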



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46905) Add dedicated class to keep column definition instead of StructField in Create/ReplaceTable command

2024-01-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46905.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44935
[https://github.com/apache/spark/pull/44935]

> Add dedicated class to keep column definition instead of StructField in 
> Create/ReplaceTable command
> ---
>
> Key: SPARK-46905
> URL: https://issues.apache.org/jira/browse/SPARK-46905
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46905) Add dedicated class to keep column definition instead of StructField in Create/ReplaceTable command

2024-01-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46905:
---

Assignee: Wenchen Fan

> Add dedicated class to keep column definition instead of StructField in 
> Create/ReplaceTable command
> ---
>
> Key: SPARK-46905
> URL: https://issues.apache.org/jira/browse/SPARK-46905
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46893) Remove inline scripts from UI descriptions

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46893.
---
Fix Version/s: 3.4.3
   3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44933
[https://github.com/apache/spark/pull/44933]

> Remove inline scripts from UI descriptions
> --
>
> Key: SPARK-46893
> URL: https://issues.apache.org/jira/browse/SPARK-46893
> Project: Spark
>  Issue Type: Bug
>  Components: UI, Web UI
>Affects Versions: 3.4.1
>Reporter: Willi Raschkowski
>Assignee: Willi Raschkowski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.3, 3.5.1, 4.0.0
>
> Attachments: Screen Recording 2024-01-28 at 17.51.47.mov, Screenshot 
> 2024-01-29 at 09.06.34.png
>
>
> Users can inject inline scripts (e.g. {{onclick}} or {{onmouseover}} 
> handlers) in the UI job and stage descriptions.
> The UI already has precautions to treat, e.g., 

[jira] [Assigned] (SPARK-46893) Remove inline scripts from UI descriptions

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46893:
-

Assignee: Willi Raschkowski

> Remove inline scripts from UI descriptions
> --
>
> Key: SPARK-46893
> URL: https://issues.apache.org/jira/browse/SPARK-46893
> Project: Spark
>  Issue Type: Bug
>  Components: UI, Web UI
>Affects Versions: 3.4.1
>Reporter: Willi Raschkowski
>Assignee: Willi Raschkowski
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Recording 2024-01-28 at 17.51.47.mov, Screenshot 
> 2024-01-29 at 09.06.34.png
>
>
> Users can inject inline scripts (e.g. {{onclick}} or {{onmouseover}} 
> handlers) in the UI job and stage descriptions.
> The UI already has precautions to treat, e.g., 

[jira] [Commented] (SPARK-46893) Remove inline scripts from UI descriptions

2024-01-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812165#comment-17812165
 ] 

Dongjoon Hyun commented on SPARK-46893:
---

Thank you for pinging me, [~rshkv].

> Remove inline scripts from UI descriptions
> --
>
> Key: SPARK-46893
> URL: https://issues.apache.org/jira/browse/SPARK-46893
> Project: Spark
>  Issue Type: Bug
>  Components: UI, Web UI
>Affects Versions: 3.4.1
>Reporter: Willi Raschkowski
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Recording 2024-01-28 at 17.51.47.mov, Screenshot 
> 2024-01-29 at 09.06.34.png
>
>
> Users can inject inline scripts (e.g. {{onclick}} or {{onmouseover}} 
> handlers) in the UI job and stage descriptions.
> The UI already has precautions to treat, e.g., 

[jira] [Updated] (SPARK-46876) Data is silently lost in Tab separated CSV with empty (whitespace) rows

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46876:
---
Labels: pull-request-available  (was: )

> Data is silently lost in Tab separated CSV with empty (whitespace) rows
> ---
>
> Key: SPARK-46876
> URL: https://issues.apache.org/jira/browse/SPARK-46876
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.4.1
>Reporter: Martin Rueckl
>Priority: Critical
>  Labels: pull-request-available
>
> When reading a tab separated file that contains lines that only contain tabs 
> (i.e. empty strings as values of the columns for that row), then these rows 
> will silently be skipped (as empty lines) and the resulting dataframe will 
> have fewer rows than expected.
> This behavior is inconsistent with the behavior for e.g. semicolon separated 
> files, where the resulting dataframe will have a row with only empty string 
> values.
> A minimal reproducible example: a file containing this
> {code:java}
> a\tb\tc\r\n
> \t\t\r\n
> 1\t2\t3{code}
> will create a dataframe with one row (a=1,b=2,c=3)
> whereas this
> {code:java}
> a;b;c\r\n
> ;;\r\n
> 1;2;3{code}
> will read as two rows (first row contains empty strings)
> I used the following pyspark command to read the dataframes
> {code:java}
> spark.read.option("header","true").option("sep","\t").csv("<path to file>").collect()
> spark.read.option("header","true").option("sep",";").csv("<path to file>").collect()
> {code}
> I ran into this particularly on Databricks (I assume they use the same 
> reader), but [this stack overflow 
> post|https://stackoverflow.com/questions/47823858/replacing-empty-lines-with-characters-when-reading-csv-using-spark#comment137288546_47823858]
> indicates that this is an old issue that may have been carried over from 
> Databricks when their CSV reader was adopted in SPARK-12420.
> I recommend at least adding a test case for this behaviour to the CSV reader.
>  
> Why is this behaviour a problem:
>  * It violates some of the core assumptions
>  ** a properly configured roundtrip via csv write/read should result in the 
> same set of rows
>  ** changing the csv separator (when everything is properly escaped) should 
> have no effect
> Potential resolutions:
>  * When the configured delimiter consists of only whitespace
>  ** deactivate the "skip empty line feature"
>  ** or skip only lines that are completely empty (only a (carriage return) 
> newline)
>  * Change the skip empty line feature to only skip if the line is completely 
> empty (only contains a newline)
>  ** this may break some user code that relies on the current behaviour
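
For context, a minimal sketch (an illustration of the skip-empty-lines 
behaviour described above, not the actual Spark source) of why a tabs-only row 
is treated as an empty line:

{code:scala}
// Rows of a tab-separated file; the middle line is a row of empty values.
val lines = Seq("a\tb\tc", "\t\t", "1\t2\t3")

// A trim-based "blank line" filter silently drops the tabs-only row, because
// String.trim removes every character <= ' ' (which includes tabs).
val kept = lines.filter(_.trim.nonEmpty)
// kept == Seq("a\tb\tc", "1\t2\t3")
{code}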



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46914) Shorten app name in the summary table on the History Page

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46914.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44944
[https://github.com/apache/spark/pull/44944]

> Shorten app name in the summary table on the History Page 
> --
>
> Key: SPARK-46914
> URL: https://issues.apache.org/jira/browse/SPARK-46914
> Project: Spark
>  Issue Type: Improvement
>  Components: UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46914) Shorten app name in the summary table on the History Page

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46914:
-

Assignee: Kent Yao

> Shorten app name in the summary table on the History Page 
> --
>
> Key: SPARK-46914
> URL: https://issues.apache.org/jira/browse/SPARK-46914
> Project: Spark
>  Issue Type: Improvement
>  Components: UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46916) Clean up the imports in pyspark.pandas.tests.indexes.*

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46916.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44945
[https://github.com/apache/spark/pull/44945]

> Clean up the imports in pyspark.pandas.tests.indexes.*
> --
>
> Key: SPARK-46916
> URL: https://issues.apache.org/jira/browse/SPARK-46916
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46916) Clean up the imports in pyspark.pandas.tests.indexes.*

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46916:
---
Labels: pull-request-available  (was: )

> Clean up the imports in pyspark.pandas.tests.indexes.*
> --
>
> Key: SPARK-46916
> URL: https://issues.apache.org/jira/browse/SPARK-46916
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46916) Clean up the imports in pyspark.pandas.tests.indexes.*

2024-01-29 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-46916:
--
Summary: Clean up the imports in pyspark.pandas.tests.indexes.*  (was: 
[SPARK-46896][PS][TESTS] Clean up the imports in pyspark.pandas.tests.indexes.*)

> Clean up the imports in pyspark.pandas.tests.indexes.*
> --
>
> Key: SPARK-46916
> URL: https://issues.apache.org/jira/browse/SPARK-46916
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46914) Shorten app name in the summary table on the History Page

2024-01-29 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-46914:
-
Priority: Minor  (was: Major)

> Shorten app name in the summary table on the History Page 
> --
>
> Key: SPARK-46914
> URL: https://issues.apache.org/jira/browse/SPARK-46914
> Project: Spark
>  Issue Type: Improvement
>  Components: UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

2024-01-29 Thread Kent Yao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812135#comment-17812135
 ] 

Kent Yao commented on SPARK-46747:
--

It would be better if you could provide the stats of # of shared locks before 
and after.

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>Reporter: Bala Bellam
>Priority: Critical
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery uses a LIMIT 1 query to check for table 
> existence in the database, overriding the default 
> JdbcDialect.getTableExistsQuery, which uses WHERE 1 = 0.
> +*Issue:*+
> Due to the LIMIT 1 query pattern, we are seeing a high number of shared locks 
> in PostgreSQL installations where the table being written to has many 
> partitions. Falling back to the default JdbcDialect query, which uses 
> WHERE 1 = 0, is more efficient, as it doesn't scan any of the partitions and 
> still effectively checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
> scenarios, especially with partitioned tables or tables with a lot of data, 
> as it may take shared locks on all partitions or involve more planner and 
> execution time to determine the quickest way to get a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
> any data due to the always false WHERE condition. This makes it a lighter 
> operation, as it typically only involves checking the table's metadata to 
> validate the table's existence without taking locks on the table's data or 
> partitions.
> So, considering performance and minimizing locks, SELECT 1 FROM table WHERE 
> 1=0 would be a better choice if we're strictly looking to check for a table's 
> existence and want to avoid potentially heavier operations like taking shared 
> locks on partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46914) Shorten app name in the summary table on the History Page

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46914:
---
Labels: pull-request-available  (was: )

> Shorten app name in the summary table on the History Page 
> --
>
> Key: SPARK-46914
> URL: https://issues.apache.org/jira/browse/SPARK-46914
> Project: Spark
>  Issue Type: Improvement
>  Components: UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46912) Spark-submit in cluster mode with standalone cluster uses wrong JAVA_HOME path

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46912:
---
Labels: pull-request-available  (was: )

> Spark-submit in cluster mode with standalone cluster uses wrong JAVA_HOME path
> --
>
> Key: SPARK-46912
> URL: https://issues.apache.org/jira/browse/SPARK-46912
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Submit
>Affects Versions: 3.5.0
>Reporter: Danh Pham
>Priority: Major
>  Labels: pull-request-available
>
> When running spark-submit against a standalone cluster in cluster mode, the 
> worker machine uses the JAVA_HOME value from the submitting machine instead 
> of the worker machine's own.
> To reproduce:
>  * Create a standalone cluster using docker compose, setting JAVA_HOME in 
> each worker to a value different from the local machine's.
>  * Run spark-submit with deploy-mode cluster
>  * Monitor the log from the worker; the driver will print out: DriverRunner: 
> Launch Command: "" "-cp" ...
> Reason:
> When the Master creates a new driver in the receiveAndReply method, it uses 
> the environment variables from the submitter to build the driver description 
> command. Later, when the driver is launched, a new local command (on the 
> worker) is built, but it still uses the environment variables from the driver 
> description (which came from the submitter). As a result, the java launch 
> command uses the submitter's JAVA_HOME path instead of the worker's.
> Suggestion:
> Replace JAVA_HOME and SPARK_HOME in the buildLocalCommand method of 
> org.apache.spark.deploy.worker.CommandUtils with the worker's values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46915) Simplify `UnaryMinus` and align error class

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46915:
---
Labels: pull-request-available  (was: )

> Simplify `UnaryMinus` and align error class
> ---
>
> Key: SPARK-46915
> URL: https://issues.apache.org/jira/browse/SPARK-46915
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46915) Simplify `UnaryMinus` and align error class

2024-01-29 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-46915:
---

 Summary: Simplify `UnaryMinus` and align error class
 Key: SPARK-46915
 URL: https://issues.apache.org/jira/browse/SPARK-46915
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: BingKun Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46914) Shorten app name in the summary table on the History Page

2024-01-29 Thread Kent Yao (Jira)
Kent Yao created SPARK-46914:


 Summary: Shorten app name in the summary table on the History Page 
 Key: SPARK-46914
 URL: https://issues.apache.org/jira/browse/SPARK-46914
 Project: Spark
  Issue Type: Improvement
  Components: UI
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-46876) Data is silently lost in Tab separated CSV with empty (whitespace) rows

2024-01-29 Thread Jie Han (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812098#comment-17812098
 ] 

Jie Han edited comment on SPARK-46876 at 1/30/24 3:01 AM:
--

{{The reason is that, before parsing the csv lines, Spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter out `empty` lines, i.e. lines 
that contain only characters <= ' '. I doubt it's necessary to do this, because 
those lines may actually be data. I've learned that apache/commons-csv trims 
every column instead of the whole line before parsing, and the trimming is an 
option.}}


was (Author: JIRAUSER285788):
{{The reason is that before parsing the csv lines spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter `empty` lines which only 
contains characters those <= ' '. I doubt that if it's neccessary to do this, 
because they may be exactly data itself. I've learnt that  apache/commons-csv 
does trim for every column instead of whole line before parsing and trim is an 
option.}}

> Data is silently lost in Tab separated CSV with empty (whitespace) rows
> ---
>
> Key: SPARK-46876
> URL: https://issues.apache.org/jira/browse/SPARK-46876
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.4.1
>Reporter: Martin Rueckl
>Priority: Critical
>
> When reading a tab separated file that contains lines that only contain tabs 
> (i.e. empty strings as values of the columns for that row), then these rows 
> will silently be skipped (as empty lines) and the resulting dataframe will 
> have fewer rows than expected.
> This behavior is inconsistent with the behavior for e.g. semicolon separated 
> files, where the resulting dataframe will have a row with only empty string 
> values.
> A minimal reproducible example: a file containing this
> {code:java}
> a\tb\tc\r\n
> \t\t\r\n
> 1\t2\t3{code}
> will create a dataframe with one row (a=1,b=2,c=3)
> whereas this
> {code:java}
> a;b;c\r\n
> ;;\r\n
> 1;2;3{code}
> will read as two rows (first row contains empty strings)
> I used the following pyspark command to read the dataframes
> {code:java}
> spark.read.option("header","true").option("sep","\t").csv("<path to file>").collect()
> spark.read.option("header","true").option("sep",";").csv("<path to file>").collect()
> {code}
> I ran into this particularly on Databricks (I assume they use the same 
> reader), but [this stack overflow 
> post|https://stackoverflow.com/questions/47823858/replacing-empty-lines-with-characters-when-reading-csv-using-spark#comment137288546_47823858]
> indicates that this is an old issue that may have been carried over from 
> Databricks when their CSV reader was adopted in SPARK-12420.
> I recommend at least adding a test case for this behaviour to the CSV reader.
>  
> Why is this behaviour a problem:
>  * It violates some of the core assumptions
>  ** a properly configured roundtrip via csv write/read should result in the 
> same set of rows
>  ** changing the csv separator (when everything is properly escaped) should 
> have no effect
> Potential resolutions:
>  * When the configured delimiter consists of only whitespace
>  ** deactivate the "skip empty line feature"
>  ** or skip only lines that are completely empty (only a (carriage return) 
> newline)
>  * Change the skip empty line feature to only skip if the line is completely 
> empty (only contains a newline)
>  ** this may break some user code that relies on the current behaviour



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-46876) Data is silently lost in Tab separated CSV with empty (whitespace) rows

2024-01-29 Thread Jie Han (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812098#comment-17812098
 ] 

Jie Han edited comment on SPARK-46876 at 1/30/24 3:00 AM:
--

{{The reason is that, before parsing the csv lines, Spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter out `empty` lines, i.e. lines 
that contain only characters <= ' '. I doubt it's necessary to do this, because 
those lines may actually be data. I've learned that apache/commons-csv trims 
every column instead of the whole line before parsing, and the trimming is an 
option.}}


was (Author: JIRAUSER285788):
{{The reason is that before parsing the csv lines spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter `empty` lines which only 
contains characters those <= ' '. I doubt that if it's neccessary to do this, 
because they may be exactly data itself. I've learnt that  apache/commons-csv 
does trim for every column instead of whole line before parsing.}}

> Data is silently lost in Tab separated CSV with empty (whitespace) rows
> ---
>
> Key: SPARK-46876
> URL: https://issues.apache.org/jira/browse/SPARK-46876
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.4.1
>Reporter: Martin Rueckl
>Priority: Critical
>
> When reading a tab separated file that contains lines that only contain tabs 
> (i.e. empty strings as values of the columns for that row), then these rows 
> will silently be skipped (as empty lines) and the resulting dataframe will 
> have fewer rows than expected.
> This behavior is inconsistent with the behavior for e.g. semicolon separated 
> files, where the resulting dataframe will have a row with only empty string 
> values.
> A minimal reproducible example: a file containing this
> {code:java}
> a\tb\tc\r\n
> \t\t\r\n
> 1\t2\t3{code}
> will create a dataframe with one row (a=1,b=2,c=3)
> whereas this
> {code:java}
> a;b;c\r\n
> ;;\r\n
> 1;2;3{code}
> will read as two rows (first row contains empty strings)
> I used the following pyspark command to read the dataframes
> {code:java}
> spark.read.option("header","true").option("sep","\t").csv("<path to file>").collect()
> spark.read.option("header","true").option("sep",";").csv("<path to file>").collect()
> {code}
> I ran into this particularly on Databricks (I assume they use the same 
> reader), but [this stack overflow 
> post|https://stackoverflow.com/questions/47823858/replacing-empty-lines-with-characters-when-reading-csv-using-spark#comment137288546_47823858]
> indicates that this is an old issue that may have been carried over from 
> Databricks when their CSV reader was adopted in SPARK-12420.
> I recommend at least adding a test case for this behaviour to the CSV reader.
>  
> Why is this behaviour a problem:
>  * It violates some of the core assumptions
>  ** a properly configured roundtrip via csv write/read should result in the 
> same set of rows
>  ** changing the csv separator (when everything is properly escaped) should 
> have no effect
> Potential resolutions:
>  * When the configured delimiter consists of only whitespace
>  ** deactivate the "skip empty line feature"
>  ** or skip only lines that are completely empty (only a (carriage return) 
> newline)
>  * Change the skip empty line feature to only skip if the line is completely 
> empty (only contains a newline)
>  ** this may break some user code that relies on the current behaviour



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-46876) Data is silently lost in Tab separated CSV with empty (whitespace) rows

2024-01-29 Thread Jie Han (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812098#comment-17812098
 ] 

Jie Han edited comment on SPARK-46876 at 1/30/24 3:00 AM:
--

{{The reason is that, before parsing the csv lines, Spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter out `empty` lines, i.e. lines 
that contain only characters <= ' '. I doubt it's necessary to do this, because 
those lines may actually be data. I've learned that apache/commons-csv trims 
every column instead of the whole line before parsing.}}


was (Author: JIRAUSER285788):
{{The reason is that before parsing the csv lines spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter `empty` lines which only 
contains characters those <= ' '. I doubt that if it's neccessary to do this, 
because they may be exactly data itself. }}

> Data is silently lost in Tab separated CSV with empty (whitespace) rows
> ---
>
> Key: SPARK-46876
> URL: https://issues.apache.org/jira/browse/SPARK-46876
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.4.1
>Reporter: Martin Rueckl
>Priority: Critical
>
> When reading a tab separated file that contains lines that only contain tabs 
> (i.e. empty strings as values of the columns for that row), then these rows 
> will silently be skipped (as empty lines) and the resulting dataframe will 
> have fewer rows than expected.
> This behavior is inconsistent with the behavior for e.g. semicolon separated 
> files, where the resulting dataframe will have a row with only empty string 
> values.
> A minimal reproducible example: a file containing this
> {code:java}
> a\tb\tc\r\n
> \t\t\r\n
> 1\t2\t3{code}
> will create a dataframe with one row (a=1,b=2,c=3)
> whereas this
> {code:java}
> a;b;c\r\n
> ;;\r\n
> 1;2;3{code}
> will read as two rows (first row contains empty strings)
> I used the following pyspark command to read the dataframes
> {code:java}
> spark.read.option("header","true").option("sep","\t").csv("<path to file>").collect()
> spark.read.option("header","true").option("sep",";").csv("<path to file>").collect()
> {code}
> I ran into this particularly on Databricks (I assume they use the same 
> reader), but [this stack overflow 
> post|https://stackoverflow.com/questions/47823858/replacing-empty-lines-with-characters-when-reading-csv-using-spark#comment137288546_47823858]
> indicates that this is an old issue that may have been carried over from 
> Databricks when their CSV reader was adopted in SPARK-12420.
> I recommend at least adding a test case for this behaviour to the CSV reader.
>  
> Why is this behaviour a problem:
>  * It violates some of the core assumptions
>  ** a properly configured roundtrip via csv write/read should result in the 
> same set of rows
>  ** changing the csv separator (when everything is properly escaped) should 
> have no effect
> Potential resolutions:
>  * When the configured delimiter consists of only whitespace
>  ** deactivate the "skip empty line feature"
>  ** or skip only lines that are completely empty (only a (carriage return) 
> newline)
>  * Change the skip empty line feature to only skip if the line is completely 
> empty (only contains a newline)
>  ** this may break some user code that relies on the current behaviour



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46912) Spark-submit in cluster mode with standalone cluster uses wrong JAVA_HOME path

2024-01-29 Thread Danh Pham (Jira)
Danh Pham created SPARK-46912:
-

 Summary: Spark-submit in cluster mode with standalone cluster uses 
wrong JAVA_HOME path
 Key: SPARK-46912
 URL: https://issues.apache.org/jira/browse/SPARK-46912
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Spark Submit
Affects Versions: 3.5.0
Reporter: Danh Pham


When running spark-submit against a standalone cluster in cluster mode, the 
worker machine uses the JAVA_HOME value from the submitting machine instead of 
the worker machine's own.

To reproduce:
 * Create a standalone cluster using docker compose, setting JAVA_HOME in each 
worker to a value different from the local machine's.
 * Run spark-submit with deploy-mode cluster
 * Monitor the log from the worker; the driver will print out: DriverRunner: 
Launch Command: "" "-cp" ...

Reason:

When the Master creates a new driver in the receiveAndReply method, it uses the 
environment variables from the submitter to build the driver description 
command. Later, when the driver is launched, a new local command (on the 
worker) is built, but it still uses the environment variables from the driver 
description (which came from the submitter). As a result, the java launch 
command uses the submitter's JAVA_HOME path instead of the worker's.

Suggestion:

Replace JAVA_HOME and SPARK_HOME in the buildLocalCommand method of 
org.apache.spark.deploy.worker.CommandUtils with the worker's values.
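
A rough sketch of the kind of substitution the suggestion describes (the method 
name and shape here are illustrative assumptions, not the actual CommandUtils 
code):

{code:scala}
// Illustrative only: before building the local launch command on the worker,
// override the submitter-provided paths with the worker's own values so the
// java binary is resolved from the worker's JAVA_HOME.
def localizeEnvironment(submitterEnv: Map[String, String]): Map[String, String] = {
  submitterEnv ++
    sys.env.get("JAVA_HOME").map("JAVA_HOME" -> _) ++
    sys.env.get("SPARK_HOME").map("SPARK_HOME" -> _)
}
{code}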



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-46876) Data is silently lost in Tab separated CSV with empty (whitespace) rows

2024-01-29 Thread Jie Han (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812098#comment-17812098
 ] 

Jie Han edited comment on SPARK-46876 at 1/30/24 2:26 AM:
--

{{The reason is that, before parsing the csv lines, Spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter out `empty` lines, i.e. lines 
that contain only characters <= ' '. I doubt it's necessary to do this, because 
those lines may actually be data.}}


was (Author: JIRAUSER285788):
{{The reason is that before parsing the csv lines spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter `empty` lines which contains 
characters those <= ' '. I doubt that if it's neccessary to do this, because 
they may be exactly data itself. }}

> Data is silently lost in Tab separated CSV with empty (whitespace) rows
> ---
>
> Key: SPARK-46876
> URL: https://issues.apache.org/jira/browse/SPARK-46876
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.4.1
>Reporter: Martin Rueckl
>Priority: Critical
>
> When reading a tab separated file that contains lines that only contain tabs 
> (i.e. empty strings as values of the columns for that row), then these rows 
> will silently be skipped (as empty lines) and the resulting dataframe will 
> have fewer rows than expected.
> This behavior is inconsistent with the behavior for e.g. semicolon separated 
> files, where the resulting dataframe will have a row with only empty string 
> values.
> A minimal reproducible example: a file containing this
> {code:java}
> a\tb\tc\r\n
> \t\t\r\n
> 1\t2\t3{code}
> will create a dataframe with one row (a=1,b=2,c=3)
> whereas this
> {code:java}
> a;b;c\r\n
> ;;\r\n
> 1;2;3{code}
> will read as two rows (first row contains empty strings)
> I used the following pyspark command to read the dataframes
> {code:java}
> spark.read.option("header","true").option("sep","\t").csv("<path to file>").collect()
> spark.read.option("header","true").option("sep",";").csv("<path to file>").collect()
> {code}
> I ran into this particularly on Databricks (I assume they use the same 
> reader), but [this stack overflow 
> post|https://stackoverflow.com/questions/47823858/replacing-empty-lines-with-characters-when-reading-csv-using-spark#comment137288546_47823858]
> indicates that this is an old issue that may have been carried over from 
> Databricks when their CSV reader was adopted in SPARK-12420.
> I recommend at least adding a test case for this behaviour to the CSV reader.
>  
> Why is this behaviour a problem:
>  * It violates some of the core assumptions
>  ** a properly configured roundtrip via csv write/read should result in the 
> same set of rows
>  ** changing the csv separator (when everything is properly escaped) should 
> have no effect
> Potential resolutions:
>  * When the configured delimiter consists of only whitespace
>  ** deactivate the "skip empty line feature"
>  ** or skip only lines that are completely empty (only a (carriage return) 
> newline)
>  * Change the skip empty line feature to only skip if the line is completely 
> empty (only contains a newline)
>  ** this may break some user code that relies on the current behaviour



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46736) Retain empty protobuf message in schema for protobuf connector

2024-01-29 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-46736.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44643
[https://github.com/apache/spark/pull/44643]

> Retain empty protobuf message in schema for protobuf connector
> --
>
> Key: SPARK-46736
> URL: https://issues.apache.org/jira/browse/SPARK-46736
> Project: Spark
>  Issue Type: Improvement
>  Components: Protobuf
>Affects Versions: 4.0.0
>Reporter: Chaoqin Li
>Assignee: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Since Spark doesn't allow an empty StructType, an empty proto message type 
> used as a field is dropped by default. Introduce an option that allows 
> retaining an empty message field by inserting a dummy column.
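
A conceptual sketch of the idea (the placeholder column name and types are 
assumptions for illustration, not the merged connector API):

{code:scala}
import org.apache.spark.sql.types._

// Without the option, a field whose converted type would be an empty struct
// is dropped from the resulting schema.
val emptyMessageType = StructType(Nil)

// With the option, the empty message is retained by giving it one placeholder
// column, so the field survives schema conversion.
val retainedMessageType = StructType(Seq(
  StructField("__dummy_field_in_empty_struct", StringType, nullable = true)))
{code}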



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46736) Retain empty protobuf message in schema for protobuf connector

2024-01-29 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-46736:


Assignee: Chaoqin Li

> Retain empty protobuf message in schema for protobuf connector
> --
>
> Key: SPARK-46736
> URL: https://issues.apache.org/jira/browse/SPARK-46736
> Project: Spark
>  Issue Type: Improvement
>  Components: Protobuf
>Affects Versions: 4.0.0
>Reporter: Chaoqin Li
>Assignee: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
>
> Since Spark doesn't allow an empty StructType, an empty proto message type 
> used as a field is dropped by default. Introduce an option that allows 
> retaining an empty message field by inserting a dummy column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46876) Data is silently lost in Tab separated CSV with empty (whitespace) rows

2024-01-29 Thread Jie Han (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812098#comment-17812098
 ] 

Jie Han commented on SPARK-46876:
-

{{The reason is that, before parsing the csv lines, Spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter out `empty` lines, which contain 
characters <= ' '. I doubt it's necessary to do this, because they may actually 
be data.}}

> Data is silently lost in Tab separated CSV with empty (whitespace) rows
> ---
>
> Key: SPARK-46876
> URL: https://issues.apache.org/jira/browse/SPARK-46876
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.4.1
>Reporter: Martin Rueckl
>Priority: Critical
>
> When reading a tab separated file that contains lines that only contain tabs 
> (i.e. empty strings as values of the columns for that row), then these rows 
> will silently be skipped (as empty lines) and the resulting dataframe will 
> have fewer rows than expected.
> This behavior is inconsistent with the behavior for e.g. semicolon separated 
> files, where the resulting dataframe will have a row with only empty string 
> values.
> A minimal reproducible example: a file containing this
> {code:java}
> a\tb\tc\r\n
> \t\t\r\n
> 1\t2\t3{code}
> will create a dataframe with one row (a=1,b=2,c=3)
> whereas this
> {code:java}
> a;b;c\r\n
> ;;\r\n
> 1;2;3{code}
> will read as two rows (first row contains empty strings)
> I used the following pyspark command to read the dataframes
> {code:java}
> spark.read.option("header","true").option("sep","\t").csv("<path to file>").collect()
> spark.read.option("header","true").option("sep",";").csv("<path to file>").collect()
> {code}
> I ran into this particularly on Databricks (I assume they use the same 
> reader), but [this stack overflow 
> post|https://stackoverflow.com/questions/47823858/replacing-empty-lines-with-characters-when-reading-csv-using-spark#comment137288546_47823858]
> indicates that this is an old issue that may have been carried over from 
> Databricks when their CSV reader was adopted in SPARK-12420.
> I recommend at least adding a test case for this behaviour to the CSV reader.
>  
> Why is this behaviour a problem:
>  * It violates some of the core assumptions
>  ** a properly configured roundtrip via csv write/read should result in the 
> same set of rows
>  ** changing the csv separator (when everything is properly escaped) should 
> have no effect
> Potential resolutions:
>  * When the configured delimiter consists of only whitespace
>  ** deactivate the "skip empty line feature"
>  ** or skip only lines that are completely empty (only a (carriage return) 
> newline)
>  * Change the skip empty line feature to only skip if the line is completely 
> empty (only contains a newline)
>  ** this may break some user code that relies on the current behaviour



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-46876) Data is silently lost in Tab separated CSV with empty (whitespace) rows

2024-01-29 Thread Jie Han (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812098#comment-17812098
 ] 

Jie Han edited comment on SPARK-46876 at 1/30/24 1:24 AM:
--

{{The reason is that, before parsing the csv lines, Spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter out `empty` lines, which contain 
characters <= ' '. I doubt it's necessary to do this, because they may actually 
be data.}}


was (Author: JIRAUSER285788):
{{The reason is that before parsing the csv lines spark calls 
`CSVExprUtils.filterCommentAndEmpty` to filter `empty` lines which contains 
characters those <= ' '. I doubt that if it's neccessary to do this, because 
they may be the exactly data itself. }}

> Data is silently lost in Tab separated CSV with empty (whitespace) rows
> ---
>
> Key: SPARK-46876
> URL: https://issues.apache.org/jira/browse/SPARK-46876
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.4.1
>Reporter: Martin Rueckl
>Priority: Critical
>
> When reading a tab separated file that contains lines that only contain tabs 
> (i.e. empty strings as values of the columns for that row), then these rows 
> will silently be skipped (as empty lines) and the resulting dataframe will 
> have fewer rows than expected.
> This behavior is inconsistent with the behavior for e.g. semicolon separated 
> files, where the resulting dataframe will have a row with only empty string 
> values.
> A minimal reproducible example: a file containing this
> {code:java}
> a\tb\tc\r\n
> \t\t\r\n
> 1\t2\t3{code}
> will create a dataframe with one row (a=1,b=2,c=3)
> whereas this
> {code:java}
> a;b;c\r\n
> ;;\r\n
> 1;2;3{code}
> will read as two rows (first row contains empty strings)
> I used the following pyspark command to read the dataframes
> {code:java}
> spark.read.option("header","true").option("sep","\t").csv("<path to file>").collect()
> spark.read.option("header","true").option("sep",";").csv("<path to file>").collect()
> {code}
> I ran into this particularly on Databricks (I assume they use the same 
> reader), but [this stack overflow 
> post|https://stackoverflow.com/questions/47823858/replacing-empty-lines-with-characters-when-reading-csv-using-spark#comment137288546_47823858]
> indicates that this is an old issue that may have been carried over from 
> Databricks when their CSV reader was adopted in SPARK-12420.
> I recommend at least adding a test case for this behaviour to the CSV reader.
>  
> Why is this behaviour a problem:
>  * It violates some of the core assumptions
>  ** a properly configured roundtrip via csv write/read should result in the 
> same set of rows
>  ** changing the csv separator (when everything is properly escaped) should 
> have no effect
> Potential resolutions:
>  * When the configured delimiter consists of only whitespace
>  ** deactivate the "skip empty line feature"
>  ** or skip only lines that are completely empty (only a (carriage return) 
> newline)
>  * Change the skip empty line feature to only skip if the line is completely 
> empty (only contains a newline)
>  ** this may break some user code that relies on the current behaviour



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46910) Eliminate JDK Requirement in PySpark Installation

2024-01-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46910.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44940
[https://github.com/apache/spark/pull/44940]

> Eliminate JDK Requirement in PySpark Installation
> -
>
> Key: SPARK-46910
> URL: https://issues.apache.org/jira/browse/SPARK-46910
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> PySpark requires users to have the correct JDK version (JDK 8+ for Spark<4; 
> JDK 17+ for Spark>=4) installed locally.
> We can make the Spark installation script install the JDK, so users don’t 
> need to do this step manually.
> h1. Details
>  # When the entry point for a Spark class is invoked, the spark-class script 
> checks if Java is installed in the user environment.
>  # If Java is not installed, the user is prompted to select whether they want 
> to install JDK 17.
>  # If the user selects yes, JDK 17 is installed (using the [install-jdk 
> library|https://pypi.org/project/install-jdk/]) and JAVA_HOME variable and 
> RUNNER are set appropriately. The Spark build will now work!
>  # If the user selects no, we provide them a brief description of how to 
> install JDK manually.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46910) Eliminate JDK Requirement in PySpark Installation

2024-01-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46910:


Assignee: Amanda Liu

> Eliminate JDK Requirement in PySpark Installation
> -
>
> Key: SPARK-46910
> URL: https://issues.apache.org/jira/browse/SPARK-46910
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Minor
>  Labels: pull-request-available
>
> PySpark requires users to have the correct JDK version (JDK 8+ for Spark<4; 
> JDK 17+ for Spark>=4) installed locally.
> We can make the Spark installation script install the JDK, so users don’t 
> need to do this step manually.
> h1. Details
>  # When the entry point for a Spark class is invoked, the spark-class script 
> checks if Java is installed in the user environment.
>  # If Java is not installed, the user is prompted to select whether they want 
> to install JDK 17.
>  # If the user selects yes, JDK 17 is installed (using the [install-jdk 
> library|https://pypi.org/project/install-jdk/]) and JAVA_HOME variable and 
> RUNNER are set appropriately. The Spark build will now work!
>  # If the user selects no, we provide them a brief description of how to 
> install JDK manually.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46911) Add deleteIfExists operator to StatefulProcessorHandle

2024-01-29 Thread Eric Marnadi (Jira)
Eric Marnadi created SPARK-46911:


 Summary: Add deleteIfExists operator to StatefulProcessorHandle
 Key: SPARK-46911
 URL: https://issues.apache.org/jira/browse/SPARK-46911
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Eric Marnadi


Adding the {{deleteIfExists}} method to the {{StatefulProcessorHandle}} in 
order to remove state variables from the State Store
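
A hypothetical usage sketch (the method name comes from this ticket, but the 
signature and surrounding API shown here are assumptions, not the final design 
from the linked change):

{code:scala}
// Hypothetical sketch: inside a stateful processor, drop a named state
// variable from the State Store once it is no longer needed.
def cleanUp(handle: StatefulProcessorHandle): Unit = {
  handle.deleteIfExists("countState")
}
{code}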



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46910) Eliminate JDK Requirement in PySpark Installation

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46910:
---
Labels: pull-request-available  (was: )

> Eliminate JDK Requirement in PySpark Installation
> -
>
> Key: SPARK-46910
> URL: https://issues.apache.org/jira/browse/SPARK-46910
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Priority: Minor
>  Labels: pull-request-available
>
> PySpark requires users to have the correct JDK version (JDK 8+ for Spark<4; 
> JDK 17+ for Spark>=4) installed locally.
> We can make the Spark installation script install the JDK, so users don’t 
> need to do this step manually.
> h1. Details
>  # When the entry point for a Spark class is invoked, the spark-class script 
> checks if Java is installed in the user environment.
>  # If Java is not installed, the user is prompted to select whether they want 
> to install JDK 17.
>  # If the user selects yes, JDK 17 is installed (using the [install-jdk 
> library|https://pypi.org/project/install-jdk/]) and JAVA_HOME variable and 
> RUNNER are set appropriately. The Spark build will now work!
>  # If the user selects no, we provide them a brief description of how to 
> install JDK manually.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46910) Eliminate JDK Requirement in PySpark Installation

2024-01-29 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-46910:
--

 Summary: Eliminate JDK Requirement in PySpark Installation
 Key: SPARK-46910
 URL: https://issues.apache.org/jira/browse/SPARK-46910
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu


PySpark requires users to have the correct JDK version (JDK 8+ for Spark<4; JDK 
17+ for Spark>=4) installed locally.

We can make the Spark installation script install the JDK, so users don’t need 
to do this step manually.
h1. Details
 # When the entry point for a Spark class is invoked, the spark-class script 
checks if Java is installed in the user environment.

 # If Java is not installed, the user is prompted to select whether they want 
to install JDK 17.

 # If the user selects yes, JDK 17 is installed (using the [install-jdk 
library|https://pypi.org/project/install-jdk/]) and JAVA_HOME variable and 
RUNNER are set appropriately. The Spark build will now work!

 # If the user selects no, we provide them a brief description of how to 
install JDK manually.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ https://issues.apache.org/jira/browse/SPARK-46890 ]


Daniel deleted comment on SPARK-46890:


was (Author: JIRAUSER285772):
I think this `tokenIndexArr` within Spark's `UnivocityParser` class has 
different values in the passing and failing cases:
{code:java}
// This index is used to reorder parsed tokens
private val tokenIndexArr =
  requiredSchema.map(f => 
java.lang.Integer.valueOf(dataSchema.indexOf(f))).toArray{code}
The presence of the default column metadata in the `requiredSchema` is causing 
the `dataSchema.indexOf` call to fail to match.

We can possibly fix this by just stripping the default value metadata from the 
`requiredSchema` before computing this mapping.

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-01-29-13-22-05-326.png
>
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46890:
---
Labels: pull-request-available  (was: )

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-01-29-13-22-05-326.png
>
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812072#comment-17812072
 ] 

Daniel commented on SPARK-46890:


[~maxgekk] I created a bug fix here: 
[https://github.com/apache/spark/pull/44939]

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-01-29-13-22-05-326.png
>
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46907) Show driver log location in Spark History Server

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46907.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44936
[https://github.com/apache/spark/pull/44936]

> Show driver log location in Spark History Server
> 
>
> Key: SPARK-46907
> URL: https://issues.apache.org/jira/browse/SPARK-46907
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812059#comment-17812059
 ] 

Daniel edited comment on SPARK-46890 at 1/29/24 9:27 PM:
-

I think this `tokenIndexArr` within Spark's `UnivocityParser` class has 
different values in the passing and failing cases:
{code:java}
// This index is used to reorder parsed tokens
private val tokenIndexArr =
  requiredSchema.map(f => 
java.lang.Integer.valueOf(dataSchema.indexOf(f))).toArray{code}
The presence of the default column metadata in the `requiredSchema` is causing 
the `dataSchema.indexOf` call to fail to match.

We can possibly fix this by just stripping the default value metadata from the 
`requiredSchema` before computing this mapping.


was (Author: JIRAUSER285772):
I think this `tokenIndexArr` within Spark's `UnivocityParser` class has 
different values in the passing and failing cases:
{code:java}
// This index is used to reorder parsed tokens
private val tokenIndexArr =
  requiredSchema.map(f => 
java.lang.Integer.valueOf(dataSchema.indexOf(f))).toArray
 {code}

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
> Attachments: image-2024-01-29-13-22-05-326.png
>
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812059#comment-17812059
 ] 

Daniel commented on SPARK-46890:


I think this `tokenIndexArr` within Spark's `UnivocityParser` class has 
different values in the passing and failing cases:
{code:java}
// This index is used to reorder parsed tokens
private val tokenIndexArr =
  requiredSchema.map(f => 
java.lang.Integer.valueOf(dataSchema.indexOf(f))).toArray
 {code}

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
> Attachments: image-2024-01-29-13-22-05-326.png
>
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812058#comment-17812058
 ] 

Daniel commented on SPARK-46890:


The bug happens when the Univocity parser is converting the parsed column names 
to a result array of strings. This `columnsReordered` boolean is true when no 
column defaults are specified, but erroneously false otherwise:

!image-2024-01-29-13-22-05-326.png!

 

[1] 
https://github.com/apache/spark/blob/528ac8b3e8548a53d931007c36db3427c610f4da/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVHeaderChecker.scala#L127

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
> Attachments: image-2024-01-29-13-22-05-326.png
>
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel updated SPARK-46890:
---
Attachment: image-2024-01-29-13-22-05-326.png

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
> Attachments: image-2024-01-29-13-22-05-326.png
>
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46687) Implement memory-profiler

2024-01-29 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-46687.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44775
[https://github.com/apache/spark/pull/44775]

> Implement memory-profiler
> -
>
> Key: SPARK-46687
> URL: https://issues.apache.org/jira/browse/SPARK-46687
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812051#comment-17812051
 ] 

Daniel commented on SPARK-46890:


The exception comes from here: 
[https://github.com/apache/spark/blob/c468c3d5c685c5a5ecd7caf01f3004addce1f3b6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVHeaderChecker.scala#L91]
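
A simplified paraphrase of that check (names and structure are assumptions for illustration, not the exact Spark source): with enforceSchema disabled, the header is validated against the required (possibly pruned) schema, so a four-column header versus a one-column required schema trips the length check and raises the exception quoted in the description.
{code:scala}
import org.apache.spark.sql.types.StructType

// Sketch of the length check that produces the error in this report.
def checkHeaderLength(headerColumns: Seq[String], requiredSchema: StructType, csvFile: String): Unit = {
  if (headerColumns.length != requiredSchema.length) {
    throw new IllegalArgumentException(
      s"Number of column in CSV header is not equal to number of fields in the schema:\n" +
      s" Header length: ${headerColumns.length}, schema size: ${requiredSchema.length}\n" +
      s"CSV file: $csvFile")
  }
}
{code}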

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812047#comment-17812047
 ] 

Daniel commented on SPARK-46890:


Thanks [~maxgekk], both of the above tests reproduce the bug now. I will debug
it.

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46908) Extend SELECT * support outside of select list

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46908:
---
Labels: SQL pull-request-available  (was: SQL)

> Extend SELECT * support outside of select list
> --
>
> Key: SPARK-46908
> URL: https://issues.apache.org/jira/browse/SPARK-46908
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Priority: Major
>  Labels: SQL, pull-request-available
>
> Traditionally, * is confined to the select list, and there to the top level of
> expressions.
> Spark does, in an undocumented fashion, support * in the SELECT list within a
> function argument list.
> Here we want to expand upon this capability by adding the WHERE clause
> (Filter), as well as a couple more scenarios such as row value constructors
> and the IN operator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46909) Could not initialize class org.apache.spark.unsafe.array.ByteArrayMethods error in JDK 21

2024-01-29 Thread Johnny Sohn (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Sohn updated SPARK-46909:

Affects Version/s: 3.5.0

> Could not initialize class org.apache.spark.unsafe.array.ByteArrayMethods 
> error in JDK 21
> -
>
> Key: SPARK-46909
> URL: https://issues.apache.org/jira/browse/SPARK-46909
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0
>Reporter: Johnny Sohn
>Priority: Major
>
> Trying to run Spark on JDK 21, we're getting this exception:
> {code:java}
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.spark.unsafe.array.ByteArrayMethods
>   at 
> org.apache.spark.memory.MemoryManager.defaultPageSizeBytes$lzycompute(MemoryManager.scala:264)
>   at 
> org.apache.spark.memory.MemoryManager.defaultPageSizeBytes(MemoryManager.scala:254)
>   at 
> org.apache.spark.memory.MemoryManager.$anonfun$pageSizeBytes$1(MemoryManager.scala:273)
>   at 
> scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.memory.MemoryManager.(MemoryManager.scala:273)
>   at 
> org.apache.spark.memory.UnifiedMemoryManager.(UnifiedMemoryManager.scala:58)
>   at 
> org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
>   at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
>   at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
>   at org.apache.spark.SparkContext.(SparkContext.scala:464)
>   at 
> org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
>   at 
> Caused by: java.lang.ExceptionInInitializerError: Exception 
> java.lang.ExceptionInInitializerError [in thread "skir-0"]
>   at 
> org.apache.spark.unsafe.array.ByteArrayMethods.(ByteArrayMethods.java:56)
>   at 
> org.apache.spark.memory.MemoryManager.defaultPageSizeBytes$lzycompute(MemoryManager.scala:264)
>   at 
> org.apache.spark.memory.MemoryManager.defaultPageSizeBytes(MemoryManager.scala:254)
>   at 
> org.apache.spark.memory.MemoryManager.$anonfun$pageSizeBytes$1(MemoryManager.scala:273)
>   at 
> scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.memory.MemoryManager.(MemoryManager.scala:273)
>   at 
> org.apache.spark.memory.UnifiedMemoryManager.(UnifiedMemoryManager.scala:58)
>   at 
> org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
>   at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
>   at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
>   at org.apache.spark.SparkContext.(SparkContext.scala:464)
>   at 
> org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
>   at  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46909) Could not initialize class org.apache.spark.unsafe.array.ByteArrayMethods error in JDK 21

2024-01-29 Thread Johnny Sohn (Jira)
Johnny Sohn created SPARK-46909:
---

 Summary: Could not initialize class 
org.apache.spark.unsafe.array.ByteArrayMethods error in JDK 21
 Key: SPARK-46909
 URL: https://issues.apache.org/jira/browse/SPARK-46909
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.1
Reporter: Johnny Sohn


Trying to run Spark on JDK 21, we're getting this exception:
{code:java}
Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.spark.unsafe.array.ByteArrayMethods
at 
org.apache.spark.memory.MemoryManager.defaultPageSizeBytes$lzycompute(MemoryManager.scala:264)
at 
org.apache.spark.memory.MemoryManager.defaultPageSizeBytes(MemoryManager.scala:254)
at 
org.apache.spark.memory.MemoryManager.$anonfun$pageSizeBytes$1(MemoryManager.scala:273)
at 
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.memory.MemoryManager.(MemoryManager.scala:273)
at 
org.apache.spark.memory.UnifiedMemoryManager.(UnifiedMemoryManager.scala:58)
at 
org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
at org.apache.spark.SparkContext.(SparkContext.scala:464)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
at 
Caused by: java.lang.ExceptionInInitializerError: Exception 
java.lang.ExceptionInInitializerError [in thread "skir-0"]
at 
org.apache.spark.unsafe.array.ByteArrayMethods.(ByteArrayMethods.java:56)
at 
org.apache.spark.memory.MemoryManager.defaultPageSizeBytes$lzycompute(MemoryManager.scala:264)
at 
org.apache.spark.memory.MemoryManager.defaultPageSizeBytes(MemoryManager.scala:254)
at 
org.apache.spark.memory.MemoryManager.$anonfun$pageSizeBytes$1(MemoryManager.scala:273)
at 
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.memory.MemoryManager.(MemoryManager.scala:273)
at 
org.apache.spark.memory.UnifiedMemoryManager.(UnifiedMemoryManager.scala:58)
at 
org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
at org.apache.spark.SparkContext.(SparkContext.scala:464)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
at  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812044#comment-17812044
 ] 

Max Gekk edited comment on SPARK-46890 at 1/29/24 8:29 PM:
---

[~dtenedor] You need to trigger the column pruning feature, but your query
{code:scala}
spark.table("Products"),{code}
doesn't do that.

See my example:

{code:sql}
spark-sql (default)> SELECT price FROM products;

{code}
It requests only one column.


was (Author: maxgekk):
[~dtenedor] Need to trigger the column pruning feature but your query 
spark.table("Products"),
doesn't do that.

See my example:
spark-sql (default)> SELECT price FROM products;
It requests only one column.

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46908) Extend SELECT * support outside of select list

2024-01-29 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-46908:


 Summary: Extend SELECT * support outside of select list
 Key: SPARK-46908
 URL: https://issues.apache.org/jira/browse/SPARK-46908
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


Traditionally, * is confined to the select list, and there to the top level of
expressions.
Spark does, in an undocumented fashion, support * in the SELECT list within a
function argument list.
Here we want to expand upon this capability by adding the WHERE clause (Filter),
as well as a couple more scenarios such as row value constructors and the IN
operator.
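
For illustration, these are the kinds of statements the proposal would permit (assuming a SparkSession named `spark`; the syntax below is a guess at the eventual design, not the final grammar):
{code:scala}
// Already supported today: * inside a function argument list.
spark.sql("SELECT count(*) FROM t")
// Proposed: * in a row value constructor outside the select list.
spark.sql("SELECT 1 FROM t WHERE (t.*) = (1, 'a')")
// Proposed: * feeding the IN operator in the WHERE clause.
spark.sql("SELECT 1 FROM t WHERE (t.*) IN (SELECT * FROM s)")
{code}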



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812044#comment-17812044
 ] 

Max Gekk commented on SPARK-46890:
--

[~dtenedor] You need to trigger the column pruning feature, but your query
spark.table("Products"),
doesn't do that.

See my example:
spark-sql (default)> SELECT price FROM products;
It requests only one column.
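
For illustration, a projection that prunes columns in the test harness used earlier in this thread (assuming the same `spark`, `checkAnswer`, and `Row` helpers are in scope) should exercise the same path as the SQL query above:
{code:scala}
// Selecting a single column gives the CSV scan a one-field required schema,
// which is what exposes the header-length mismatch when enforceSchema is false.
checkAnswer(
  spark.table("Products").select("price"),
  Seq(Row(0.50f), Row(0.25f), Row(0.75f)))
{code}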

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812040#comment-17812040
 ] 

Daniel commented on SPARK-46890:


I tried the exact command in the bug description, and it doesn't cause any 
errors on the master branch:

 
{code:java}
withTable("Products") {
  spark.sql(
s"""
   |CREATE TABLE IF NOT EXISTS Products (
   |  product_id INT,
   |  name STRING,
   |  price FLOAT default 0.0,
   |  quantity INT default 0
   |)
   |USING CSV
   |OPTIONS (
   |  header 'true',
   |  inferSchema 'false',
   |  enforceSchema 'false',
   |  path "${testFile(productsFile)}"
   |)
   """.stripMargin)
  checkAnswer(
spark.table("Products"),
Seq(
  Row(1, "Apple", 0.50, 100),
  Row(2, "Banana", 0.25, 200),
  Row(3, "Orange", 0.75, 50)))
} {code}
With the "products.csv" file containing:
{code:java}
product_id,name,price,quantity
1,Apple,0.50,100
2,Banana,0.25,200
3,Orange,0.75,50 {code}
This unit test passes.

 

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812034#comment-17812034
 ] 

Daniel commented on SPARK-46890:


This unit test does not seem to reproduce the problem:

 
{code:java}
test("SPARK-46890: CSV fails on a column with default and without enforcing 
schema") {
  withTable("CarsTable") {
spark.sql(
  s"""
 |CREATE TABLE CarsTable(
 |  year INT,
 |  make STRING,
 |  model STRING,
 |  comment STRING DEFAULT '',
 |  blank STRING DEFAULT '')
 |USING csv
 |OPTIONS (
 |  header "true",
 |  inferSchema "false",
 |  enforceSchema "false",
 |  path "${testFile(carsFile)}"
 |)
 """.stripMargin)
checkAnswer(
  spark.table("CarsTable"),
  Seq(
Row(2012, "Tesla", "S", "No comment", null),
Row(1997, "Ford", "E350", "Go get one now they are going fast", null),
Row(2015, "Chevy", "Volt", "", "")
  ))
  }
} {code}
With the "cars.csv" file containing:

 
{code:java}

year,make,model,comment,blank
"2012","Tesla","S","No comment",

1997,Ford,E350,"Go get one now they are going fast",
2015,Chevy,Volt

 {code}
Will look further.

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812034#comment-17812034
 ] 

Daniel edited comment on SPARK-46890 at 1/29/24 7:31 PM:
-

This unit test does not seem to reproduce the problem:
{code:java}
test("SPARK-46890: CSV fails on a column with default and without enforcing 
schema") {
  withTable("CarsTable") {
spark.sql(
  s"""
 |CREATE TABLE CarsTable(
 |  year INT,
 |  make STRING,
 |  model STRING,
 |  comment STRING DEFAULT '',
 |  blank STRING DEFAULT '')
 |USING csv
 |OPTIONS (
 |  header "true",
 |  inferSchema "false",
 |  enforceSchema "false",
 |  path "${testFile(carsFile)}"
 |)
 """.stripMargin)
checkAnswer(
  spark.table("CarsTable"),
  Seq(
Row(2012, "Tesla", "S", "No comment", null),
Row(1997, "Ford", "E350", "Go get one now they are going fast", null),
Row(2015, "Chevy", "Volt", "", "")
  ))
  }
} {code}
With the "cars.csv" file containing:
{code:java}

year,make,model,comment,blank
"2012","Tesla","S","No comment",

1997,Ford,E350,"Go get one now they are going fast",
2015,Chevy,Volt

 {code}
Will look further.


was (Author: JIRAUSER285772):
This unit test does not seem to reproduce the problem:

 
{code:java}
test("SPARK-46890: CSV fails on a column with default and without enforcing 
schema") {
  withTable("CarsTable") {
spark.sql(
  s"""
 |CREATE TABLE CarsTable(
 |  year INT,
 |  make STRING,
 |  model STRING,
 |  comment STRING DEFAULT '',
 |  blank STRING DEFAULT '')
 |USING csv
 |OPTIONS (
 |  header "true",
 |  inferSchema "false",
 |  enforceSchema "false",
 |  path "${testFile(carsFile)}"
 |)
 """.stripMargin)
checkAnswer(
  spark.table("CarsTable"),
  Seq(
Row(2012, "Tesla", "S", "No comment", null),
Row(1997, "Ford", "E350", "Go get one now they are going fast", null),
Row(2015, "Chevy", "Volt", "", "")
  ))
  }
} {code}
With the "cars.csv" file containing:

 
{code:java}

year,make,model,comment,blank
"2012","Tesla","S","No comment",

1997,Ford,E350,"Go get one now they are going fast",
2015,Chevy,Volt

 {code}
Will look further.

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46905) Add dedicated class to keep column definition instead of StructField in Create/ReplaceTable command

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46905:
---
Labels: pull-request-available  (was: )

> Add dedicated class to keep column definition instead of StructField in 
> Create/ReplaceTable command
> ---
>
> Key: SPARK-46905
> URL: https://issues.apache.org/jira/browse/SPARK-46905
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46907) Show driver log location in Spark History Server

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46907:
-

Assignee: Dongjoon Hyun

> Show driver log location in Spark History Server
> 
>
> Key: SPARK-46907
> URL: https://issues.apache.org/jira/browse/SPARK-46907
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46907) Show driver log location in Spark History Server

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46907:
---
Labels: pull-request-available  (was: )

> Show driver log location in Spark History Server
> 
>
> Key: SPARK-46907
> URL: https://issues.apache.org/jira/browse/SPARK-46907
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46907) Show driver log location in Spark History Server

2024-01-29 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46907:
-

 Summary: Show driver log location in Spark History Server
 Key: SPARK-46907
 URL: https://issues.apache.org/jira/browse/SPARK-46907
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46890) CSV fails on a column with default and without enforcing schema

2024-01-29 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812002#comment-17812002
 ] 

Daniel commented on SPARK-46890:


Thanks [~maxgekk] for writing down the details here. It looks like this feature 
did not take into account the `enforceSchema` option properly. I can take a 
look.

> CSV fails on a column with default and without enforcing schema
> ---
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Major
>
> When we create a table using CSV on an existing file with a header and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> then querying a column with a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not 
> equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46905) Add dedicated class to keep column definition instead of StructField in Create/ReplaceTable command

2024-01-29 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-46905:
---

 Summary: Add dedicated class to keep column definition instead of 
StructField in Create/ReplaceTable command
 Key: SPARK-46905
 URL: https://issues.apache.org/jira/browse/SPARK-46905
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46904) Fix wrong display of History UI summary

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46904.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44934
[https://github.com/apache/spark/pull/44934]

> Fix wrong display of  History UI  summary
> -
>
> Key: SPARK-46904
> URL: https://issues.apache.org/jira/browse/SPARK-46904
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46904) Fix wrong display of History UI summary

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46904:
-

Assignee: Kent Yao

> Fix wrong display of  History UI  summary
> -
>
> Key: SPARK-46904
> URL: https://issues.apache.org/jira/browse/SPARK-46904
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46904) Fix wrong display of History UI summary

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46904:
--
Parent: SPARK-46001
Issue Type: Sub-task  (was: Bug)

> Fix wrong display of  History UI  summary
> -
>
> Key: SPARK-46904
> URL: https://issues.apache.org/jira/browse/SPARK-46904
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46810) Clarify error class terminology

2024-01-29 Thread Nicholas Chammas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811923#comment-17811923
 ] 

Nicholas Chammas commented on SPARK-46810:
--

I think Option 3 is a good compromise that lets us continue calling 
{{INCOMPLETE_TYPE_DEFINITION}} an "error class", which perhaps would be the 
least disruptive to Spark developers.

However, for the record, the SQL standard only seems to use the term "class" in 
the context of the 5-character SQLSTATE. Otherwise, the standard uses the term 
"condition" or "exception condition".

I don't have a copy of the SQL 2016 standard handy, and it isn't available for
sale on ISO's website. The only option appears to be to purchase [the SQL
2023 standard for ~$220|https://www.iso.org/standard/76583.html].

However, there is a copy of the [SQL 1992 standard available 
publicly|https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt]. 

Table 23 on page 619 is relevant:

{code}
 Table 23 - SQLSTATE class and subclass values

 Condition                Class   Subcondition                     Subclass

| ambiguous cursor name  | 3C  | (no subclass)                    | 000  |
|                        |     |                                  |      |
| cardinality violation  | 21  | (no subclass)                    | 000  |
|                        |     |                                  |      |
| connection exception   | 08  | (no subclass)                    | 000  |
|                        |     | connection does not exist        | 003  |
|                        |     | connection failure               | 006  |
|                        |     | connection name in use           | 002  |
|                        |     | SQL-client unable to establish   |      |
|                        |     | SQL-connection                   | 001  |
...
{code}

I think this maps closest to Option 1, but again if we want to go with Option 3 
I think that's reasonable too. But in the case of Option 3 we should then 
retire [our use of the term "error 
condition"|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html] so 
that we don't use multiple terms to refer to the same thing.

> Clarify error class terminology
> ---
>
> Key: SPARK-46810
> URL: https://issues.apache.org/jira/browse/SPARK-46810
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>
> We use inconsistent terminology when talking about error classes. I'd like to 
> get some clarity on that before contributing any potential improvements to 
> this part of the documentation.
> Consider 
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
>  It has several key pieces of hierarchical information that have inconsistent 
> names throughout our documentation and codebase:
>  * 42
>  ** K01
>  *** INCOMPLETE_TYPE_DEFINITION
> **** ARRAY
> **** MAP
> **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
>  * [Over 
> here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
>  we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION 
> we call that an "error class". So what exactly is a class, the 42 or the 
> INCOMPLETE_TYPE_DEFINITION?
>  * [Over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
>  we call K01 the "subclass". But [over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
>  we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for 
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass?
>  * [On this 
> page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
>  we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other 
> places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways 
> to fix this.
> h1. Option 1: 

[jira] [Assigned] (SPARK-46831) Extend StringType and PhysicalStringType with collation id

2024-01-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-46831:


Assignee: Aleksandar Tomic

> Extend StringType and PhysicalStringType with collation id
> --
>
> Key: SPARK-46831
> URL: https://issues.apache.org/jira/browse/SPARK-46831
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Assignee: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46831) Extend StringType and PhysicalStringType with collation id

2024-01-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-46831.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44901
[https://github.com/apache/spark/pull/44901]

> Extend StringType and PhysicalStringType with collation id
> --
>
> Key: SPARK-46831
> URL: https://issues.apache.org/jira/browse/SPARK-46831
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Assignee: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46810) Clarify error class terminology

2024-01-29 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811858#comment-17811858
 ] 

Max Gekk commented on SPARK-46810:
--

[~cloud_fan] [~LuciferYang] [~beliefer] [~dongjoon] WDYT?

> Clarify error class terminology
> ---
>
> Key: SPARK-46810
> URL: https://issues.apache.org/jira/browse/SPARK-46810
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>
> We use inconsistent terminology when talking about error classes. I'd like to 
> get some clarity on that before contributing any potential improvements to 
> this part of the documentation.
> Consider 
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
>  It has several key pieces of hierarchical information that have inconsistent 
> names throughout our documentation and codebase:
>  * 42
>  ** K01
>  *** INCOMPLETE_TYPE_DEFINITION
>  **** ARRAY
>  **** MAP
>  **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
>  * [Over 
> here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
>  we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION 
> we call that an "error class". So what exactly is a class, the 42 or the 
> INCOMPLETE_TYPE_DEFINITION?
>  * [Over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
>  we call K01 the "subclass". But [over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
>  we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for 
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass?
>  * [On this 
> page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
>  we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other 
> places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways 
> to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
>  * Error class: 42
>  * Error sub-class: K01
>  * Error state: 42K01
>  * Error condition: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-condition: ARRAY, MAP, STRUCT
> Pros: 
>  * This terminology seems (to me at least) the most natural and intuitive.
>  * It may also match the SQL standard.
> Cons:
>  * We use {{errorClass}} [all over our 
> codebase|https://github.com/apache/spark/blob/15c9ec7ca3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30]
>  – literally in thousands of places – to refer to strings like 
> INCOMPLETE_TYPE_DEFINITION.
>  ** It's probably not practical to update all these usages to say 
> {{errorCondition}} instead, so if we go with this approach there will be a 
> divide between the terminology we use in user-facing documentation vs. what 
> the code base uses.
>  ** We can perhaps rename the existing {{error-classes.json}} to 
> {{error-conditions.json}} but clarify the reason for this divide between code 
> and user docs in the documentation for {{ErrorClassesJsonReader}} .
> h1. Option 2: 42 becomes an "Error Category"
> Another approach is to use the following terminology:
>  * Error category: 42
>  * Error sub-category: K01
>  * Error state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a class to a category is low impact and may 
> not show up in user-facing documentation at all. (See my side note below.)
> Cons:
>  * These terms may not align with the SQL standard.
>  * We will have to retire the term "error condition", which we have [already 
> used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md]
>  in user-facing documentation.
> —
> Side note: In either case, I believe talking about "42" and "K01" – 
> regardless of what we end up calling them – in front of users is not helpful. 
> I don't think anybody cares what "42" by itself means, or what "K01" by 
> itself means. Accordingly, we should limit how much we talk about these 
> concepts in the user-facing documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (SPARK-46810) Clarify error class terminology

2024-01-29 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811856#comment-17811856
 ] 

Max Gekk commented on SPARK-46810:
--

Correct me if I am wrong, but the SQL standard talks about classes and 
sub-classes of SQLSTATE, not about error classes, which I think are different 
things. What about option 3 (a short sketch illustrating the distinction 
follows the list):
 * SQL state class: 42
 * SQL state sub-class: K01
 * SQL state: 42K01
 * Error class: INCOMPLETE_TYPE_DEFINITION
 * Error sub-classes: ARRAY, MAP, STRUCT
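
A minimal Scala sketch of where each of these names surfaces at runtime, 
assuming a type definition that omits its element type raises this particular 
error (the query below is illustrative; the accessor names come from the 
public SparkThrowable interface):

{code:scala}
import org.apache.spark.SparkThrowable
import org.apache.spark.sql.SparkSession

object ErrorTerminologySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    try {
      // Element type deliberately omitted to provoke the error under discussion.
      spark.sql("SELECT CAST(NULL AS ARRAY)")
    } catch {
      case e: SparkThrowable =>
        // Expected to print "INCOMPLETE_TYPE_DEFINITION.ARRAY":
        // the error class plus its sub-class, joined by a dot.
        println(e.getErrorClass)
        // Expected to print "42K01": the SQLSTATE, whose "42" and "K01" parts
        // are the (sub-)class levels named above.
        println(e.getSqlState)
    } finally {
      spark.stop()
    }
  }
}
{code}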

> Clarify error class terminology
> ---
>
> Key: SPARK-46810
> URL: https://issues.apache.org/jira/browse/SPARK-46810
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>
> We use inconsistent terminology when talking about error classes. I'd like to 
> get some clarity on that before contributing any potential improvements to 
> this part of the documentation.
> Consider 
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
>  It has several key pieces of hierarchical information that have inconsistent 
> names throughout our documentation and codebase:
>  * 42
>  ** K01
>  *** INCOMPLETE_TYPE_DEFINITION
>  **** ARRAY
>  **** MAP
>  **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
>  * [Over 
> here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
>  we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION 
> we call that an "error class". So what exactly is a class, the 42 or the 
> INCOMPLETE_TYPE_DEFINITION?
>  * [Over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
>  we call K01 the "subclass". But [over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
>  we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for 
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass?
>  * [On this 
> page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
>  we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other 
> places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways 
> to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
>  * Error class: 42
>  * Error sub-class: K01
>  * Error state: 42K01
>  * Error condition: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-condition: ARRAY, MAP, STRUCT
> Pros: 
>  * This terminology seems (to me at least) the most natural and intuitive.
>  * It may also match the SQL standard.
> Cons:
>  * We use {{errorClass}} [all over our 
> codebase|https://github.com/apache/spark/blob/15c9ec7ca3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30]
>  – literally in thousands of places – to refer to strings like 
> INCOMPLETE_TYPE_DEFINITION.
>  ** It's probably not practical to update all these usages to say 
> {{errorCondition}} instead, so if we go with this approach there will be a 
> divide between the terminology we use in user-facing documentation vs. what 
> the code base uses.
>  ** We can perhaps rename the existing {{error-classes.json}} to 
> {{error-conditions.json}} but clarify the reason for this divide between code 
> and user docs in the documentation for {{ErrorClassesJsonReader}} .
> h1. Option 2: 42 becomes an "Error Category"
> Another approach is to use the following terminology:
>  * Error category: 42
>  * Error sub-category: K01
>  * Error state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a class to a category is low impact and may 
> not show up in user-facing documentation at all. (See my side note below.)
> Cons:
>  * These terms may not align with the SQL standard.
>  * We will have to retire the term "error condition", which we have [already 
> used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md]
>  in user-facing documentation.
> —
> Side note: In either case, I believe talking about "42" and "K01" – 
> regardless of what we end up calling them – in front of users is not helpful. 
> 

[jira] [Comment Edited] (SPARK-46893) Remove inline scripts from UI descriptions

2024-01-29 Thread Willi Raschkowski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811850#comment-17811850
 ] 

Willi Raschkowski edited comment on SPARK-46893 at 1/29/24 12:15 PM:
-

cc [~dongjoon], for your awareness as PMC who's recently touched the UI.

I'm wondering if we should file a CVE for this.


was (Author: raschkowski):
[~dongjoon], for your awareness as PMC who's recently touched the UI.

I'm wondering if we should file a CVE for this.

> Remove inline scripts from UI descriptions
> --
>
> Key: SPARK-46893
> URL: https://issues.apache.org/jira/browse/SPARK-46893
> Project: Spark
>  Issue Type: Bug
>  Components: UI, Web UI
>Affects Versions: 3.4.1
>Reporter: Willi Raschkowski
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Recording 2024-01-28 at 17.51.47.mov, Screenshot 
> 2024-01-29 at 09.06.34.png
>
>
> Users can inject inline scripts (e.g. {{onclick}} or {{onmouseover}} 
> handlers) in the UI job and stage descriptions.
> The UI already has precaution to treat, e.g., 

[jira] [Commented] (SPARK-46893) Remove inline scripts from UI descriptions

2024-01-29 Thread Willi Raschkowski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811850#comment-17811850
 ] 

Willi Raschkowski commented on SPARK-46893:
---

[~dongjoon], for your awareness as PMC who's recently touched the UI.

I'm wondering if we should file a CVE for this.
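
For anyone trying to reproduce this, here is a minimal sketch of how a 
user-controlled description reaches the UI (the HTML payload is illustrative 
and not taken from the ticket; setJobDescription is the standard SparkContext 
API):

{code:scala}
import org.apache.spark.sql.SparkSession

object DescriptionInjectionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical payload: an inline onmouseover handler embedded in the
    // job description string supplied by the user.
    sc.setJobDescription("""<b onmouseover="alert('xss')">nightly load</b>""")

    // Run any job so the description shows up in the jobs/stages tables.
    sc.parallelize(1 to 10).count()

    spark.stop()
  }
}
{code}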

> Remove inline scripts from UI descriptions
> --
>
> Key: SPARK-46893
> URL: https://issues.apache.org/jira/browse/SPARK-46893
> Project: Spark
>  Issue Type: Bug
>  Components: UI, Web UI
>Affects Versions: 3.4.1
>Reporter: Willi Raschkowski
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Recording 2024-01-28 at 17.51.47.mov, Screenshot 
> 2024-01-29 at 09.06.34.png
>
>
> Users can inject inline scripts (e.g. {{onclick}} or {{onmouseover}} 
> handlers) in the UI job and stage descriptions.
> The UI already has precaution to treat, e.g., 

[jira] [Updated] (SPARK-46904) Fix wrong display of History UI summary

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46904:
---
Labels: pull-request-available  (was: )

> Fix wrong display of  History UI  summary
> -
>
> Key: SPARK-46904
> URL: https://issues.apache.org/jira/browse/SPARK-46904
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46904) Fix wrong display of History UI summary

2024-01-29 Thread Kent Yao (Jira)
Kent Yao created SPARK-46904:


 Summary: Fix wrong display of  History UI  summary
 Key: SPARK-46904
 URL: https://issues.apache.org/jira/browse/SPARK-46904
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46903) Support Spark History Server Log UI

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46903.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44932
[https://github.com/apache/spark/pull/44932]

> Support Spark History Server Log UI
> ---
>
> Key: SPARK-46903
> URL: https://issues.apache.org/jira/browse/SPARK-46903
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46902) Fix Spark History Server UI for using un-exported setAppLimit

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46902:
--
Summary: Fix Spark History Server UI for using un-exported setAppLimit  
(was: Fix Spark History Server UI )

> Fix Spark History Server UI for using un-exported setAppLimit
> -
>
> Key: SPARK-46902
> URL: https://issues.apache.org/jira/browse/SPARK-46902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46902) Fix Spark History Server UI

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46902.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44931
[https://github.com/apache/spark/pull/44931]

> Fix Spark History Server UI 
> 
>
> Key: SPARK-46902
> URL: https://issues.apache.org/jira/browse/SPARK-46902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46902) Fix Spark History Server UI

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46902:
-

Assignee: Kent Yao

> Fix Spark History Server UI 
> 
>
> Key: SPARK-46902
> URL: https://issues.apache.org/jira/browse/SPARK-46902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46893) Remove inline scripts from UI descriptions

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46893:
---
Labels: pull-request-available  (was: )

> Remove inline scripts from UI descriptions
> --
>
> Key: SPARK-46893
> URL: https://issues.apache.org/jira/browse/SPARK-46893
> Project: Spark
>  Issue Type: Bug
>  Components: UI, Web UI
>Affects Versions: 3.4.1
>Reporter: Willi Raschkowski
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Recording 2024-01-28 at 17.51.47.mov, Screenshot 
> 2024-01-29 at 09.06.34.png
>
>
> Users can inject inline scripts (e.g. {{onclick}} or {{onmouseover}} 
> handlers) in the UI job and stage descriptions.
> The UI already has precaution to treat, e.g., 

[jira] [Updated] (SPARK-46893) Remove inline scripts from UI descriptions

2024-01-29 Thread Willi Raschkowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated SPARK-46893:
--
Description: 
Users can inject inline scripts (e.g. {{onclick}} or {{onmouseover}} handlers) 
in the UI job and stage descriptions.

The UI already has precaution to treat, e.g., 

[jira] [Updated] (SPARK-46893) Remove inline scripts from UI descriptions

2024-01-29 Thread Willi Raschkowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated SPARK-46893:
--
Attachment: Screenshot 2024-01-29 at 09.06.34.png

> Remove inline scripts from UI descriptions
> --
>
> Key: SPARK-46893
> URL: https://issues.apache.org/jira/browse/SPARK-46893
> Project: Spark
>  Issue Type: Bug
>  Components: UI, Web UI
>Affects Versions: 3.4.1
>Reporter: Willi Raschkowski
>Priority: Major
> Attachments: Screen Recording 2024-01-28 at 17.51.47.mov, Screenshot 
> 2024-01-29 at 09.06.34.png
>
>
> Users can inject inline scripts (e.g. {{onclick}} or {{onmouseover}} 
> handlers) in the UI job and stage descriptions.
> The UI already has precaution to treat, e.g., 

[jira] [Updated] (SPARK-46893) Remove inline scripts from UI descriptions

2024-01-29 Thread Willi Raschkowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated SPARK-46893:
--
Summary: Remove inline scripts from UI descriptions  (was: Sanitize UI 
descriptions from inline scripts)

> Remove inline scripts from UI descriptions
> --
>
> Key: SPARK-46893
> URL: https://issues.apache.org/jira/browse/SPARK-46893
> Project: Spark
>  Issue Type: Bug
>  Components: UI, Web UI
>Affects Versions: 3.4.1
>Reporter: Willi Raschkowski
>Priority: Major
> Attachments: Screen Recording 2024-01-28 at 17.51.47.mov
>
>
> Users can inject inline scripts (e.g. {{onclick}} or {{onmouseover}} 
> handlers) in the UI job and stage descriptions.
> The UI already has precaution to treat, e.g., 

[jira] [Assigned] (SPARK-46902) Fix Spark History Server UI

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46902:
--

Assignee: (was: Apache Spark)

> Fix Spark History Server UI 
> 
>
> Key: SPARK-46902
> URL: https://issues.apache.org/jira/browse/SPARK-46902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46902) Fix Spark History Server UI

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46902:
--

Assignee: Apache Spark

> Fix Spark History Server UI 
> 
>
> Key: SPARK-46902
> URL: https://issues.apache.org/jira/browse/SPARK-46902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46903) Support Spark History Server Log UI

2024-01-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46903:
-

Assignee: Dongjoon Hyun

> Support Spark History Server Log UI
> ---
>
> Key: SPARK-46903
> URL: https://issues.apache.org/jira/browse/SPARK-46903
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46903) Support Spark History Server Log UI

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46903:
---
Labels: pull-request-available  (was: )

> Support Spark History Server Log UI
> ---
>
> Key: SPARK-46903
> URL: https://issues.apache.org/jira/browse/SPARK-46903
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46903) Support Spark History Server Log UI

2024-01-29 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46903:
-

 Summary: Support Spark History Server Log UI
 Key: SPARK-46903
 URL: https://issues.apache.org/jira/browse/SPARK-46903
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46902) Fix Spark History Server UI

2024-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46902:
---
Labels: pull-request-available  (was: )

> Fix Spark History Server UI 
> 
>
> Key: SPARK-46902
> URL: https://issues.apache.org/jira/browse/SPARK-46902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46902) Fix Spark History Server UI

2024-01-29 Thread Kent Yao (Jira)
Kent Yao created SPARK-46902:


 Summary: Fix Spark History Server UI 
 Key: SPARK-46902
 URL: https://issues.apache.org/jira/browse/SPARK-46902
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org