[jira] [Created] (SPARK-42345) Rename TimestampNTZ inference conf as spark.sql.sources.timestampNTZTypeInference.enabled

2023-02-04 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-42345:
--

 Summary: Rename TimestampNTZ inference conf as 
spark.sql.sources.timestampNTZTypeInference.enabled
 Key: SPARK-42345
 URL: https://issues.apache.org/jira/browse/SPARK-42345
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42345) Rename TimestampNTZ inference conf as spark.sql.sources.timestampNTZTypeInference.enabled

2023-02-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42345:


Assignee: Gengliang Wang  (was: Apache Spark)

> Rename TimestampNTZ inference conf as 
> spark.sql.sources.timestampNTZTypeInference.enabled
> -
>
> Key: SPARK-42345
> URL: https://issues.apache.org/jira/browse/SPARK-42345
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Commented] (SPARK-42345) Rename TimestampNTZ inference conf as spark.sql.sources.timestampNTZTypeInference.enabled

2023-02-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684129#comment-17684129
 ] 

Apache Spark commented on SPARK-42345:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39885

> Rename TimestampNTZ inference conf as 
> spark.sql.sources.timestampNTZTypeInference.enabled
> -
>
> Key: SPARK-42345
> URL: https://issues.apache.org/jira/browse/SPARK-42345
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42345) Rename TimestampNTZ inference conf as spark.sql.sources.timestampNTZTypeInference.enabled

2023-02-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42345:


Assignee: Apache Spark  (was: Gengliang Wang)

> Rename TimestampNTZ inference conf as 
> spark.sql.sources.timestampNTZTypeInference.enabled
> -
>
> Key: SPARK-42345
> URL: https://issues.apache.org/jira/browse/SPARK-42345
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-42346) distinct(count colname) with UNION ALL causes query analyzer bug

2023-02-04 Thread Robin (Jira)
Robin created SPARK-42346:
-

 Summary: distinct(count colname) with UNION ALL causes query 
analyzer bug
 Key: SPARK-42346
 URL: https://issues.apache.org/jira/browse/SPARK-42346
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.1
Reporter: Robin


If you combine a UNION ALL with a count(distinct colname) you get a query 
analyzer bug.

 

This behaviour was introduced in 3.3.0. The bug was not present in 3.2.1.

 

Here is a reprex in PySpark:

    import pandas as pd

    # Assumes an active SparkSession is already bound to `spark`.
    df_pd = pd.DataFrame([
        {'surname': 'a', 'first_name': 'b'}
    ])
    df_spark = spark.createDataFrame(df_pd)
    df_spark.createOrReplaceTempView("input_table")

    sql = """
    SELECT
        (SELECT Count(DISTINCT first_name) FROM input_table)
            AS distinct_value_count
    FROM input_table
    UNION ALL
    SELECT
        (SELECT Count(DISTINCT surname) FROM input_table)
            AS distinct_value_count
    FROM input_table
    """

    spark.sql(sql).toPandas()

 






[jira] [Updated] (SPARK-42346) distinct(count colname) with UNION ALL causes query analyzer bug

2023-02-04 Thread Robin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin updated SPARK-42346:
--
Priority: Minor  (was: Major)

> distinct(count colname) with UNION ALL causes query analyzer bug
> 
>
> Key: SPARK-42346
> URL: https://issues.apache.org/jira/browse/SPARK-42346
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Robin
>Priority: Minor
>






[jira] [Updated] (SPARK-42346) distinct(count colname) with UNION ALL causes query analyzer bug

2023-02-04 Thread Robin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin updated SPARK-42346:
--
Priority: Major  (was: Minor)

> distinct(count colname) with UNION ALL causes query analyzer bug
> 
>
> Key: SPARK-42346
> URL: https://issues.apache.org/jira/browse/SPARK-42346
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Robin
>Priority: Major
>






[jira] [Resolved] (SPARK-42297) Assign name to _LEGACY_ERROR_TEMP_2412

2023-02-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-42297.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39869
[https://github.com/apache/spark/pull/39869]

> Assign name to _LEGACY_ERROR_TEMP_2412
> --
>
> Key: SPARK-42297
> URL: https://issues.apache.org/jira/browse/SPARK-42297
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42297) Assign name to _LEGACY_ERROR_TEMP_2412

2023-02-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-42297:


Assignee: Haejoon Lee

> Assign name to _LEGACY_ERROR_TEMP_2412
> --
>
> Key: SPARK-42297
> URL: https://issues.apache.org/jira/browse/SPARK-42297
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>







[jira] [Resolved] (SPARK-42238) Introduce `INCOMPATIBLE_JOIN_TYPES`

2023-02-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-42238.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39805
[https://github.com/apache/spark/pull/39805]

> Introduce `INCOMPATIBLE_JOIN_TYPES`
> ---
>
> Key: SPARK-42238
> URL: https://issues.apache.org/jira/browse/SPARK-42238
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42238) Introduce `INCOMPATIBLE_JOIN_TYPES`

2023-02-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-42238:


Assignee: Haejoon Lee

> Introduce `INCOMPATIBLE_JOIN_TYPES`
> ---
>
> Key: SPARK-42238
> URL: https://issues.apache.org/jira/browse/SPARK-42238
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>







[jira] [Assigned] (SPARK-41302) Assign a name to the error class _LEGACY_ERROR_TEMP_1185

2023-02-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41302:


Assignee: Max Gekk

> Assign a name to the error class _LEGACY_ERROR_TEMP_1185
> 
>
> Key: SPARK-41302
> URL: https://issues.apache.org/jira/browse/SPARK-41302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Max Gekk
>Priority: Minor
>
> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1185, improve 
> error message and tests.






[jira] [Resolved] (SPARK-41302) Assign a name to the error class _LEGACY_ERROR_TEMP_1185

2023-02-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41302.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39723
[https://github.com/apache/spark/pull/39723]

> Assign a name to the error class _LEGACY_ERROR_TEMP_1185
> 
>
> Key: SPARK-41302
> URL: https://issues.apache.org/jira/browse/SPARK-41302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Max Gekk
>Priority: Minor
> Fix For: 3.4.0
>
>
> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1185, improve 
> error message and tests.






[jira] [Commented] (SPARK-42346) distinct(count colname) with UNION ALL causes query analyzer bug

2023-02-04 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684177#comment-17684177
 ] 

Yuming Wang commented on SPARK-42346:
-

cc [~petertoth]

> distinct(count colname) with UNION ALL causes query analyzer bug
> 
>
> Key: SPARK-42346
> URL: https://issues.apache.org/jira/browse/SPARK-42346
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Robin
>Priority: Major
>






[jira] [Commented] (SPARK-42346) distinct(count colname) with UNION ALL causes query analyzer bug

2023-02-04 Thread Peter Toth (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684184#comment-17684184
 ] 

Peter Toth commented on SPARK-42346:


Thanks for pinging me, [~yumwang]. This might be related to subquery merging; 
I will look into it.

> distinct(count colname) with UNION ALL causes query analyzer bug
> 
>
> Key: SPARK-42346
> URL: https://issues.apache.org/jira/browse/SPARK-42346
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Robin
>Priority: Major
>






[jira] [Assigned] (SPARK-41302) Assign a name to the error class _LEGACY_ERROR_TEMP_1185

2023-02-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41302:


Assignee: Narek Karapetian  (was: Max Gekk)

> Assign a name to the error class _LEGACY_ERROR_TEMP_1185
> 
>
> Key: SPARK-41302
> URL: https://issues.apache.org/jira/browse/SPARK-41302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Narek Karapetian
>Priority: Minor
> Fix For: 3.4.0
>
>
> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1185, improve 
> error message and tests.






[jira] [Resolved] (SPARK-42341) Fix JoinSelectionHelperSuite and PlanStabilitySuite to use explicit broadcast threshold

2023-02-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42341.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39881
[https://github.com/apache/spark/pull/39881]

> Fix JoinSelectionHelperSuite and PlanStabilitySuite to use explicit broadcast 
> threshold
> ---
>
> Key: SPARK-42341
> URL: https://issues.apache.org/jira/browse/SPARK-42341
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42341) Fix JoinSelectionHelperSuite and PlanStabilitySuite to use explicit broadcast threshold

2023-02-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42341:
-

Assignee: Dongjoon Hyun

> Fix JoinSelectionHelperSuite and PlanStabilitySuite to use explicit broadcast 
> threshold
> ---
>
> Key: SPARK-42341
> URL: https://issues.apache.org/jira/browse/SPARK-42341
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-42347) Arrow string and binary vectors only support 1 GiB

2023-02-04 Thread Adam Binford (Jira)
Adam Binford created SPARK-42347:


 Summary: Arrow string and binary vectors only support 1 GiB
 Key: SPARK-42347
 URL: https://issues.apache.org/jira/browse/SPARK-42347
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Adam Binford


Since Arrow 10.0.0, BaseVariableWidthVector (the parent class of the string 
and binary vectors) only supports expanding up to 1 GiB through the safe 
interfaces, which Spark uses, instead of the previous 2 GiB. This is due to 
[https://github.com/apache/arrow/pull/13815]. I added a comment there but 
haven't received a response yet; I will open an issue in Arrow as well.

Basically, whenever you try to add data beyond 1 GiB, the vector tries to 
double itself to the next power of two, which would be {{2147483648}}. That is 
greater than {{Integer.MAX_VALUE}}, which is {{2147483647}}, so an 
{{OversizedAllocationException}} is thrown.

See [https://github.com/apache/spark/pull/39572#issuecomment-1383195213] and 
the comment above for how I reproduced this.
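
The arithmetic behind the limit can be checked with a small Python sketch. 
This is only an illustration under stated assumptions, not Arrow's actual 
allocation code: {{next_power_of_two}} and {{can_grow_to}} are hypothetical 
helper names, and the round-up-to-a-power-of-two rule is taken from the 
description in this issue rather than from the Arrow source.

```python
# Java's Integer.MAX_VALUE: the largest size a signed 32-bit length can hold.
INT_MAX = 2**31 - 1
GIB = 2**30


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def can_grow_to(requested_bytes: int) -> bool:
    """Hypothetical check: does the rounded-up capacity still fit in a Java int?"""
    return next_power_of_two(requested_bytes) <= INT_MAX


if __name__ == "__main__":
    # Exactly 1 GiB is already a power of two, so no rounding is needed.
    print(next_power_of_two(GIB), can_grow_to(GIB))
    # One byte more rounds up to 2**31, which exceeds Integer.MAX_VALUE.
    print(next_power_of_two(GIB + 1), can_grow_to(GIB + 1))
```

Under these assumptions, any request above 1 GiB rounds up to 2147483648, 
one more than {{Integer.MAX_VALUE}}, which matches the point at which the 
{{OversizedAllocationException}} is reported.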






[jira] [Commented] (SPARK-42347) Arrow string and binary vectors only support 1 GiB

2023-02-04 Thread Adam Binford (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684212#comment-17684212
 ] 

Adam Binford commented on SPARK-42347:
--

[https://github.com/apache/spark/pull/39572] is a potential workaround to allow 
enabling the large variable width vectors when users hit this limit.

> Arrow string and binary vectors only support 1 GiB
> --
>
> Key: SPARK-42347
> URL: https://issues.apache.org/jira/browse/SPARK-42347
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Adam Binford
>Priority: Major
>






[jira] [Resolved] (SPARK-42334) Make sure connect client assembly and sql package is built before running client tests - SBT

2023-02-04 Thread Herman van Hövell (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42334.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> Make sure connect client assembly and sql package is built before running 
> client tests - SBT
> 
>
> Key: SPARK-42334
> URL: https://issues.apache.org/jira/browse/SPARK-42334
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42334) Make sure connect client assembly and sql package is built before running client tests - SBT

2023-02-04 Thread Herman van Hövell (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-42334:
-

Assignee: Yang Jie

> Make sure connect client assembly and sql package is built before running 
> client tests - SBT
> 
>
> Key: SPARK-42334
> URL: https://issues.apache.org/jira/browse/SPARK-42334
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>







[jira] [Resolved] (SPARK-42343) Ignore `IOException` in `handleBlockRemovalFailure` if SparkContext is stopped

2023-02-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42343.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39883
[https://github.com/apache/spark/pull/39883]

> Ignore `IOException` in `handleBlockRemovalFailure` if SparkContext is stopped
> --
>
> Key: SPARK-42343
> URL: https://issues.apache.org/jira/browse/SPARK-42343
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.2.3, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42343) Ignore `IOException` in `handleBlockRemovalFailure` if SparkContext is stopped

2023-02-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42343:
-

Assignee: Dongjoon Hyun

> Ignore `IOException` in `handleBlockRemovalFailure` if SparkContext is stopped
> --
>
> Key: SPARK-42343
> URL: https://issues.apache.org/jira/browse/SPARK-42343
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.2.3, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>







[jira] [Commented] (SPARK-42309) Assign name to _LEGACY_ERROR_TEMP_1204

2023-02-04 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684245#comment-17684245
 ] 

Haejoon Lee commented on SPARK-42309:
-

I'm working on it.

> Assign name to _LEGACY_ERROR_TEMP_1204
> --
>
> Key: SPARK-42309
> URL: https://issues.apache.org/jira/browse/SPARK-42309
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289

2023-02-04 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684248#comment-17684248
 ] 

Haejoon Lee commented on SPARK-42310:
-

I'm working on it.

> Assign name to _LEGACY_ERROR_TEMP_1289
> --
>
> Key: SPARK-42310
> URL: https://issues.apache.org/jira/browse/SPARK-42310
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42311) Assign name to _LEGACY_ERROR_TEMP_2432

2023-02-04 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684249#comment-17684249
 ] 

Haejoon Lee commented on SPARK-42311:
-

I'm working on it

> Assign name to _LEGACY_ERROR_TEMP_2432
> --
>
> Key: SPARK-42311
> URL: https://issues.apache.org/jira/browse/SPARK-42311
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42312) Assign name to _LEGACY_ERROR_TEMP_0042

2023-02-04 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684250#comment-17684250
 ] 

Haejoon Lee commented on SPARK-42312:
-

I'm working on it

> Assign name to _LEGACY_ERROR_TEMP_0042
> --
>
> Key: SPARK-42312
> URL: https://issues.apache.org/jira/browse/SPARK-42312
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42313) Assign name to _LEGACY_ERROR_TEMP_1152

2023-02-04 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684251#comment-17684251
 ] 

Haejoon Lee commented on SPARK-42313:
-

I'm working on it

> Assign name to _LEGACY_ERROR_TEMP_1152
> --
>
> Key: SPARK-42313
> URL: https://issues.apache.org/jira/browse/SPARK-42313
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42315) Assign name to _LEGACY_ERROR_TEMP_2092

2023-02-04 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684253#comment-17684253
 ] 

Haejoon Lee commented on SPARK-42315:
-

I'm working on it

> Assign name to _LEGACY_ERROR_TEMP_2092
> --
>
> Key: SPARK-42315
> URL: https://issues.apache.org/jira/browse/SPARK-42315
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42314) Assign name to _LEGACY_ERROR_TEMP_2127

2023-02-04 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684252#comment-17684252
 ] 

Haejoon Lee commented on SPARK-42314:
-

I'm working on it

> Assign name to _LEGACY_ERROR_TEMP_2127
> --
>
> Key: SPARK-42314
> URL: https://issues.apache.org/jira/browse/SPARK-42314
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42319) Assign name to _LEGACY_ERROR_TEMP_2123

2023-02-04 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684255#comment-17684255
 ] 

Haejoon Lee commented on SPARK-42319:
-

I'm working on it

> Assign name to _LEGACY_ERROR_TEMP_2123
> --
>
> Key: SPARK-42319
> URL: https://issues.apache.org/jira/browse/SPARK-42319
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42318) Assign name to _LEGACY_ERROR_TEMP_2125

2023-02-04 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684254#comment-17684254
 ] 

Haejoon Lee commented on SPARK-42318:
-

I'm working on it

> Assign name to _LEGACY_ERROR_TEMP_2125
> --
>
> Key: SPARK-42318
> URL: https://issues.apache.org/jira/browse/SPARK-42318
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42320) Assign name to _LEGACY_ERROR_TEMP_2188

2023-02-04 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684256#comment-17684256
 ] 

Haejoon Lee commented on SPARK-42320:
-

I'm working on it

> Assign name to _LEGACY_ERROR_TEMP_2188
> --
>
> Key: SPARK-42320
> URL: https://issues.apache.org/jira/browse/SPARK-42320
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Resolved] (SPARK-42345) Rename TimestampNTZ inference conf as spark.sql.sources.timestampNTZTypeInference.enabled

2023-02-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-42345.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39885
[https://github.com/apache/spark/pull/39885]

> Rename TimestampNTZ inference conf as 
> spark.sql.sources.timestampNTZTypeInference.enabled
> -
>
> Key: SPARK-42345
> URL: https://issues.apache.org/jira/browse/SPARK-42345
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>



