[jira] [Resolved] (SPARK-47408) Fix mathExpressions that use StringType

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47408.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46227
[https://github.com/apache/spark/pull/46227]

> Fix mathExpressions that use StringType
> ---
>
> Key: SPARK-47408
> URL: https://issues.apache.org/jira/browse/SPARK-47408
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly

2024-04-26 Thread Gene Pang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gene Pang updated SPARK-48019:
--
Description: 
{{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those 
return a primitive array with the contents of the vector. When the ColumnVector 
has a dictionary, the values are decoded with the dictionary before filling in 
the primitive array.

However, {{ColumnVectors}} can have nulls, and for those {{null}} entries the 
dictionary id is irrelevant and may even be invalid. The dictionary should not 
be used for the {{null}} entries of the vector. Sometimes this can cause an 
{{ArrayIndexOutOfBoundsException}}.

In addition to the possible Exception, copying a {{ColumnarArray}} is not 
correct. A {{ColumnarArray}} contains a {{ColumnVector}} so it can contain 
{{null}} values. However, the {{copy()}} for primitive types does not take into 
account the null-ness of the entries, and blindly copies all the primitive 
values. That means the null entries get lost.

  was:
`ColumnVectors` have APIs like `getInts`, `getFloats` and so on. Those return a 
primitive array with the contents of the vector. When the ColumnVector has a 
dictionary, the values are decoded with the dictionary before filling in the 
primitive array.

However, `ColumnVectors` can have `null`s, and for those `null` entries, the 
dictionary id is irrelevant, and can also be invalid. The dictionary should not 
be used for the `null` entries of the vector. Sometimes, this can cause an 
`ArrayIndexOutOfBoundsException` .

In addition to the possible Exception, copying a `ColumnarArray` is not 
correct. A `ColumnarArray` contains a `ColumnVector` so it can contain `null` 
values. However, the `copy()` for primitive types does not take into account 
the null-ness of the entries, and blindly copies all the primitive values. That 
means the null entries get lost.


> ColumnVectors with dictionaries and nulls are not read/copied correctly
> ---
>
> Key: SPARK-48019
> URL: https://issues.apache.org/jira/browse/SPARK-48019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Gene Pang
>Priority: Major
>
> {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those 
> return a primitive array with the contents of the vector. When the 
> ColumnVector has a dictionary, the values are decoded with the dictionary 
> before filling in the primitive array.
> However, {{ColumnVectors}} can have nulls, and for those {{null}} entries 
> the dictionary id is irrelevant and may even be invalid. The dictionary 
> should not be used for the {{null}} entries of the vector. Sometimes this 
> can cause an {{ArrayIndexOutOfBoundsException}}.
> In addition to the possible Exception, copying a {{ColumnarArray}} is not 
> correct. A {{ColumnarArray}} contains a {{ColumnVector}} so it can contain 
> {{null}} values. However, the {{copy()}} for primitive types does not take 
> into account the null-ness of the entries, and blindly copies all the 
> primitive values. That means the null entries get lost.






[jira] [Created] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly

2024-04-26 Thread Gene Pang (Jira)
Gene Pang created SPARK-48019:
-

 Summary: ColumnVectors with dictionaries and nulls are not 
read/copied correctly
 Key: SPARK-48019
 URL: https://issues.apache.org/jira/browse/SPARK-48019
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.3
Reporter: Gene Pang


`ColumnVectors` have APIs like `getInts`, `getFloats` and so on. Those return a 
primitive array with the contents of the vector. When the ColumnVector has a 
dictionary, the values are decoded with the dictionary before filling in the 
primitive array.

However, `ColumnVectors` can have `null`s, and for those `null` entries the 
dictionary id is irrelevant and may even be invalid. The dictionary should not 
be used for the `null` entries of the vector. Sometimes this can cause an 
`ArrayIndexOutOfBoundsException`.

In addition to the possible Exception, copying a `ColumnarArray` is not 
correct. A `ColumnarArray` contains a `ColumnVector` so it can contain `null` 
values. However, the `copy()` for primitive types does not take into account 
the null-ness of the entries, and blindly copies all the primitive values. That 
means the null entries get lost.
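
To make the failure mode concrete, here is a minimal sketch of a null-aware read
loop over a generic `org.apache.spark.sql.vectorized.ColumnVector`; it
illustrates the guard the fix needs, not Spark's actual patch:

{code:scala}
import org.apache.spark.sql.vectorized.ColumnVector

// Minimal sketch: copy a primitive column out of a vector without consulting
// the dictionary for null entries (their dictionary ids may be garbage).
def copyInts(v: ColumnVector, numRows: Int): Array[Int] = {
  val out = new Array[Int](numRows)
  var i = 0
  while (i < numRows) {
    // Only decode non-null slots; a null slot keeps the default 0 and must be
    // reported through isNullAt, not through the copied value.
    if (!v.isNullAt(i)) {
      out(i) = v.getInt(i)
    }
    i += 1
  }
  out
}
{code}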






[jira] [Assigned] (SPARK-48018) Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange

2024-04-26 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-48018:


Assignee: B. Micheal Okutubo

> Null groupId causing missing param error when throwing 
> KafkaException.couldNotReadOffsetRange
> -
>
> Key: SPARK-48018
> URL: https://issues.apache.org/jira/browse/SPARK-48018
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: B. Micheal Okutubo
>Assignee: B. Micheal Okutubo
>Priority: Major
>  Labels: pull-request-available
>
> [INTERNAL_ERROR] Undefined error message parameter for error class: 
> 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE'
> when groupId is null when we are about to throw 
> KafkaException.couldNotReadOffsetRange error.
> The error framework requires all params to be non-null.
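
A minimal sketch of the guard, using a simplified stand-in for the error
framework (the real KafkaExceptions helper and its parameter names may differ):

{code:scala}
// Hypothetical, simplified error builder: the point is only that every message
// parameter must be rendered non-null before the error is constructed.
def couldNotReadOffsetRange(groupId: String, topicPartition: String): IllegalStateException = {
  val params = Map(
    // Render a missing groupId explicitly instead of passing null through.
    "groupId" -> Option(groupId).getOrElse("null"),
    "topicPartition" -> topicPartition
  )
  new IllegalStateException(
    s"[KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE] $params")
}
{code}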






[jira] [Resolved] (SPARK-48018) Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange

2024-04-26 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-48018.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46253
[https://github.com/apache/spark/pull/46253]

> Null groupId causing missing param error when throwing 
> KafkaException.couldNotReadOffsetRange
> -
>
> Key: SPARK-48018
> URL: https://issues.apache.org/jira/browse/SPARK-48018
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: B. Micheal Okutubo
>Assignee: B. Micheal Okutubo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> [INTERNAL_ERROR] Undefined error message parameter for error class: 
> 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE'
> when groupId is null when we are about to throw 
> KafkaException.couldNotReadOffsetRange error.
> The error framework requires all params to be non-null.






[jira] [Updated] (SPARK-48017) Add Spark application submission worker for operator

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48017:
---
Labels: pull-request-available  (was: )

> Add Spark application submission worker for operator
> 
>
> Key: SPARK-48017
> URL: https://issues.apache.org/jira/browse/SPARK-48017
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>
> Spark Operator needs a submission worker that converts its application 
> abstraction (Operator API) to k8s resources.






[jira] [Updated] (SPARK-48018) Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange

2024-04-26 Thread B. Micheal Okutubo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

B. Micheal Okutubo updated SPARK-48018:
---
Description: 
[INTERNAL_ERROR] Undefined error message parameter for error class: 
'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE'

when groupId is null when we are about to throw 
KafkaException.couldNotReadOffsetRange error.

The error framework requires all params to be non-null.

  was:
[INTERNAL_ERROR] Undefined error message parameter for error class: 
'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE'

when groupId is null when we are about to throw 
KafkaException.couldNotReadOffsetRange error


> Null groupId causing missing param error when throwing 
> KafkaException.couldNotReadOffsetRange
> -
>
> Key: SPARK-48018
> URL: https://issues.apache.org/jira/browse/SPARK-48018
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: B. Micheal Okutubo
>Priority: Major
>  Labels: pull-request-available
>
> [INTERNAL_ERROR] Undefined error message parameter for error class: 
> 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE'
> when groupId is null when we are about to throw 
> KafkaException.couldNotReadOffsetRange error.
> The error framework requires all params to be non-null.






[jira] [Updated] (SPARK-48018) Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48018:
---
Labels: pull-request-available  (was: )

> Null groupId causing missing param error when throwing 
> KafkaException.couldNotReadOffsetRange
> -
>
> Key: SPARK-48018
> URL: https://issues.apache.org/jira/browse/SPARK-48018
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: B. Micheal Okutubo
>Priority: Major
>  Labels: pull-request-available
>
> [INTERNAL_ERROR] Undefined error message parameter for error class: 
> 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE'
> when groupId is null when we are about to throw 
> KafkaException.couldNotReadOffsetRange error






[jira] [Created] (SPARK-48018) Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange

2024-04-26 Thread B. Micheal Okutubo (Jira)
B. Micheal Okutubo created SPARK-48018:
--

 Summary: Null groupId causing missing param error when throwing 
KafkaException.couldNotReadOffsetRange
 Key: SPARK-48018
 URL: https://issues.apache.org/jira/browse/SPARK-48018
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: B. Micheal Okutubo


[INTERNAL_ERROR] Undefined error message parameter for error class: 
'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE'

when groupId is null when we are about to throw 
KafkaException.couldNotReadOffsetRange error






[jira] [Created] (SPARK-48017) Add Spark application submission worker for operator

2024-04-26 Thread Zhou JIANG (Jira)
Zhou JIANG created SPARK-48017:
--

 Summary: Add Spark application submission worker for operator
 Key: SPARK-48017
 URL: https://issues.apache.org/jira/browse/SPARK-48017
 Project: Spark
  Issue Type: Sub-task
  Components: k8s
Affects Versions: kubernetes-operator-0.1.0
Reporter: Zhou JIANG


Spark Operator needs a submission worker that converts its application 
abstraction (Operator API) to k8s resources.






[jira] [Updated] (SPARK-48016) Binary Arithmetic operators should include the evalMode when makeCopy

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48016:
---
Labels: pull-request-available  (was: )

> Binary Arithmetic operators should include the evalMode when makeCopy
> -
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> Binary Arithmetic operators should include the evalMode during makeCopy. 
> Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of 
> returning null:
>  
> {code:java}
> SELECT try_divide(1, decimal(0)); {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>  ...
> case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
> l.dataType.isInstanceOf[IntegralType] &&
> literalPickMinimumPrecision =>
>   b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code}
>  






[jira] [Updated] (SPARK-48016) Binary Arithmetic operators should include the evalMode when makeCopy

2024-04-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-48016:
---
Summary: Binary Arithmetic operators should include the evalMode when 
makeCopy  (was: Binary Arithmetic operators should include the evalMode during 
makeCopy)

> Binary Arithmetic operators should include the evalMode when makeCopy
> -
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Binary Arithmetic operators should include the evalMode during makeCopy. 
> Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of 
> returning null:
>  
> {code:java}
> SELECT try_divide(1, decimal(0)); {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>  ...
> case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
> l.dataType.isInstanceOf[IntegralType] &&
> literalPickMinimumPrecision =>
>   b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code}
>  






[jira] [Created] (SPARK-48016) Binary Arithmetic operators should include the evalMode during makeCopy

2024-04-26 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-48016:
--

 Summary: Binary Arithmetic operators should include the evalMode 
during makeCopy
 Key: SPARK-48016
 URL: https://issues.apache.org/jira/browse/SPARK-48016
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0, 3.5.2
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Binary Arithmetic operators should include the evalMode during makeCopy. 
Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of 
returning null:

 
{code:java}
SELECT try_divide(1, decimal(0)); {code}
This is caused by the rule DecimalPrecision:
{code:java}
case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
  (left, right) match {
 ...
case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
l.dataType.isInstanceOf[IntegralType] &&
literalPickMinimumPrecision =>
  b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code}
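
A minimal sketch of why the copy loses the mode, using a plain case class as a
stand-in for how makeCopy rebuilds an operator from its child expressions (the
real fix touches the arithmetic expressions themselves):

{code:scala}
// Stand-in for an operator whose third constructor argument carries the eval mode.
case class Divide(left: String, right: String, evalMode: String = "ANSI")

val tryDiv = Divide("1", "decimal(0)", evalMode = "TRY")
// Rebuilding from only the two children silently reverts evalMode to its
// default, the same shape of bug as makeCopy(Array(newLeft, newRight)).
val copied = Divide(tryDiv.left, tryDiv.right)
assert(copied.evalMode == "ANSI") // "TRY" was lost in the copy
{code}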
 






[jira] [Resolved] (SPARK-47696) try_to_timestamp should handle SparkUpgradeException

2024-04-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47696.

Resolution: Won't Fix

> try_to_timestamp should handle SparkUpgradeException
> 
>
> Key: SPARK-47696
> URL: https://issues.apache.org/jira/browse/SPARK-47696
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2, 3.4.3
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, try_to_timestamp will throw an exception on legacy timestamp input.
> {code:java}
> > SELECT try_to_timestamp('2016-12-1', 'yyyy-MM-dd')
> org.apache.spark.SparkUpgradeException: 
> [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may 
> get a different result due to the upgrading to Spark >= 3.0:
> Fail to parse '2016-12-1' in the new parser.
> You can set "spark.sql.legacy.timeParserPolicy" to "LEGACY" to restore the 
> behavior before Spark 3.0, or set to "CORRECTED" and treat it as an invalid 
> datetime string. SQLSTATE: 42K0B {code}
> It should return null instead of an error.
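
For reference, a minimal sketch of the workaround named in the error message
itself, assuming `spark` is an active SparkSession:

{code:scala}
// Opt back into the pre-3.0 parser so the lenient pattern succeeds.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
spark.sql("SELECT try_to_timestamp('2016-12-1', 'yyyy-MM-dd')").show()
{code}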






[jira] [Updated] (SPARK-47943) Add Operator CI Task for Java Build and Test

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47943:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Add Operator CI Task for Java Build and Test
> 
>
> Key: SPARK-47943
> URL: https://issues.apache.org/jira/browse/SPARK-47943
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>
> We need to add a CI task to build and test Java code for upcoming operator pull 
> requests. 






[jira] [Assigned] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48015:
-

Assignee: Dongjoon Hyun

> Update `build.gradle` to fix deprecation warnings
> -
>
> Key: SPARK-48015
> URL: https://issues.apache.org/jira/browse/SPARK-48015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48015.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9
[https://github.com/apache/spark-kubernetes-operator/pull/9]

> Update `build.gradle` to fix deprecation warnings
> -
>
> Key: SPARK-48015
> URL: https://issues.apache.org/jira/browse/SPARK-48015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47929) Setup Static Analysis for Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47929:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Setup Static Analysis for Operator
> --
>
> Key: SPARK-47929
> URL: https://issues.apache.org/jira/browse/SPARK-47929
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>
> Add common analysis tasks, including checkstyle, spotbugs, and jacoco. Also 
> include spotless for style fixes.






[jira] [Updated] (SPARK-47950) Add Java API Module for Spark Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47950:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Add Java API Module for Spark Operator
> --
>
> Key: SPARK-47950
> URL: https://issues.apache.org/jira/browse/SPARK-47950
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>
> Spark Operator API refers to the 
> [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/]
>  that represents the spec for a Spark application in k8s.
> This aims to add a Java API library for Spark Operator, with the ability to 
> generate YAML specs.






[jira] [Updated] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48015:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Update `build.gradle` to fix deprecation warnings
> -
>
> Key: SPARK-48015
> URL: https://issues.apache.org/jira/browse/SPARK-48015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>







[jira] [Updated] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48015:
---
Labels: pull-request-available  (was: )

> Update `build.gradle` to fix deprecation warnings
> -
>
> Key: SPARK-48015
> URL: https://issues.apache.org/jira/browse/SPARK-48015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48015:
-

 Summary: Update `build.gradle` to fix deprecation warnings
 Key: SPARK-48015
 URL: https://issues.apache.org/jira/browse/SPARK-48015
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Kubernetes
Affects Versions: kubernetes-operator-0.1.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-47950) Add Java API Module for Spark Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47950:
-

Assignee: Zhou JIANG

> Add Java API Module for Spark Operator
> --
>
> Key: SPARK-47950
> URL: https://issues.apache.org/jira/browse/SPARK-47950
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>
> Spark Operator API refers to the 
> [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/]
>  that represents the spec for a Spark application in k8s.
> This aims to add a Java API library for Spark Operator, with the ability to 
> generate YAML specs.






[jira] [Resolved] (SPARK-47950) Add Java API Module for Spark Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47950.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 8
[https://github.com/apache/spark-kubernetes-operator/pull/8]

> Add Java API Module for Spark Operator
> --
>
> Key: SPARK-47950
> URL: https://issues.apache.org/jira/browse/SPARK-47950
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Spark Operator API refers to the 
> [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/]
>  that represents the spec for a Spark application in k8s.
> This aims to add a Java API library for Spark Operator, with the ability to 
> generate YAML specs.






[jira] [Updated] (SPARK-48014) Change the makeFromJava error in EvaluatePython to a user-facing error

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48014:
---
Labels: pull-request-available  (was: )

> Change the makeFromJava error in EvaluatePython to a user-facing error
> --
>
> Key: SPARK-48014
> URL: https://issues.apache.org/jira/browse/SPARK-48014
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48014) Change the makeFromJava error in EvaluatePython to a user-facing error

2024-04-26 Thread Allison Wang (Jira)
Allison Wang created SPARK-48014:


 Summary: Change the makeFromJava error in EvaluatePython to a 
user-facing error
 Key: SPARK-48014
 URL: https://issues.apache.org/jira/browse/SPARK-48014
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Allison Wang









[jira] [Commented] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks

2024-04-26 Thread Tatu Saloranta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841355#comment-17841355
 ] 

Tatu Saloranta commented on SPARK-47959:


Aside from the question of reducing contention in InternCache, my experience has 
been that when this blocking is hit there is always some other problem involved: 
either an unbounded number of keys (like UUID keys) or lack of `JsonFactory` 
reuse. In the latter case the best solution is to reuse the JsonFactory (whether 
directly or by reusing the ObjectMapper that owns it); in the former case (or as 
a second alternative for the latter), there are 2 `JsonFactory.Feature` settings 
that may be disabled (see the sketch below):
 * JsonFactory.Feature.INTERN_FIELD_NAMES: if names are not reused across 
reads, there is little value in String.intern()
 * JsonFactory.Feature.CANONICALIZE_FIELD_NAMES: ... or, if there is no reuse 
and no repeating symbols, the whole canonicalization can be disabled.

So it may be worth experimenting with these settings (disabling one or the 
other: if CANONICALIZE_FIELD_NAMES is disabled, INTERN_FIELD_NAMES does not 
matter).

Put another way: while there is some value in improving the locking of 
`InternCache`, it is unlikely to be the most effective solution to whatever 
problem there is.
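
As a sketch of the two settings on a standalone factory (Spark wires up its
parsers internally, so this illustrates the Jackson API rather than a drop-in
Spark fix):

{code:scala}
import com.fasterxml.jackson.core.{JsonFactory, JsonFactoryBuilder}

// Disabling CANONICALIZE_FIELD_NAMES alone already bypasses InternCache;
// INTERN_FIELD_NAMES then no longer matters.
val factory: JsonFactory = new JsonFactoryBuilder()
  .disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
  .disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES)
  .build()
{code}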

> Improve GET_JSON_OBJECT performance on executors running multiple tasks
> ---
>
> Key: SPARK-47959
> URL: https://issues.apache.org/jira/browse/SPARK-47959
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.1
>Reporter: Zheng Shao
>Priority: Major
>
> We have a Spark executor that is running 32 workers in parallel.  The query 
> is a simple SELECT with several `GET_JSON_OBJECT` UDF calls.
> We noticed that 80+% of the worker threads are blocked on the following 
> stacktrace:
>  
> {code:java}
> com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - 
> blocked on java.lang.Object@7529fde1 
> com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947)
>  
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482)
>  
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339)
>  
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870)
>  
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825)
>  
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798)
>  
> com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240)
>  
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383)
>  
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287)
>  
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198)
>  
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196)
>  
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown
>  Source)
> ...
> {code}
>  
> Apparently jackson-core has had this performance bug from versions 2.3 through 
> 2.15, and it is not fixed until version 2.18 (unreleased): 
> [https://github.com/FasterXML/jackson-core/blob/fc51d1e13f4ba62a25a739f26be9e05aaad88c3e/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L50]
>  
> {code:java}
>             synchronized (lock) {
>                 if (size() >= MAX_ENTRIES) {
>                     clear();
>                 }
>             }
> {code}
>  
> instead of 
> [https://github.com/FasterXML/jackson-core/blob/8b87cc1a96f649a7e7872c5baa8cf97909cabf6b/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L59]
>  
> {code:java}
>             /* As of 2.18, the limit is not strictly enforced, but we do try 
> to
>              * clear entries if we have reached the limit. We do not expect to
>              * go too much over the limit, and if we do, it's not a huge 
> problem.
>              * If some other thread has the lock, we will not clear but the 
> lock should
>              * not be held for long, so another thread should be able to 
> clear in the near future.
>              */
>             if (lock.tryLock()) {
>                 try {
>                     if (size() >= DEFAULT_MAX_ENTRIES) {
>                         clear();
>                     }
>                 } finally {
>                     lock.unlock();
>                 }
>             }   {code}
>  
> Potential fixes:
>  # Upgrade to Jackson-core 2.18 when it's released;
>  # Follow [https://github.com/FasterXML/jackson-core/issues/998] - I don't 
> totally understand the options suggested by this thread yet.
>  # Introduce a new UDF that doesn't depend on jackson-core

[jira] [Commented] (SPARK-47219) XML: Ignore commented row tags in XML tokenizer

2024-04-26 Thread Sandip Agarwala (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841353#comment-17841353
 ] 

Sandip Agarwala commented on SPARK-47219:
-

Correct. Thanks for pointing it out. I closed it as duplicate.

> XML: Ignore commented row tags in XML tokenizer
> ---
>
> Key: SPARK-47219
> URL: https://issues.apache.org/jira/browse/SPARK-47219
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Priority: Major
>







[jira] [Resolved] (SPARK-47219) XML: Ignore commented row tags in XML tokenizer

2024-04-26 Thread Sandip Agarwala (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandip Agarwala resolved SPARK-47219.
-
Resolution: Duplicate

> XML: Ignore commented row tags in XML tokenizer
> ---
>
> Key: SPARK-47219
> URL: https://issues.apache.org/jira/browse/SPARK-47219
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Priority: Major
>







[jira] [Commented] (SPARK-47219) XML: Ignore commented row tags in XML tokenizer

2024-04-26 Thread HiuFung Kwok (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841351#comment-17841351
 ] 

HiuFung Kwok commented on SPARK-47219:
--

[~sandip.agarwala] Do we need this task? It seems to be a duplicate of 
SPARK-47218.

> XML: Ignore commented row tags in XML tokenizer
> ---
>
> Key: SPARK-47219
> URL: https://issues.apache.org/jira/browse/SPARK-47219
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Priority: Major
>







[jira] [Commented] (SPARK-42846) Assign a name to the error class _LEGACY_ERROR_TEMP_2011

2024-04-26 Thread HiuFung Kwok (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841347#comment-17841347
 ] 

HiuFung Kwok commented on SPARK-42846:
--

Hi [~maxgekk], I have submitted an MR for this; would you mind having a look?

Thx.

 

> Assign a name to the error class _LEGACY_ERROR_TEMP_2011
> 
>
> Key: SPARK-42846
> URL: https://issues.apache.org/jira/browse/SPARK-42846
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2011* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check exception fields by using {*}checkError(){*} (see the sketch 
> after this issue description). The function checks the valuable error fields 
> only and avoids depending on the error text message. In this way, tech editors 
> can modify the error format in error-classes.json without worrying about 
> Spark's internal tests. Migrate other tests that might trigger the error onto 
> checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error by an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users a way to avoid and fix such errors.
> Please, look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
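
As a shape for the requested test, a minimal sketch assuming a Spark test suite
with checkError() available; the error class name and parameters here are
hypothetical placeholders for whatever the renamed class ends up being:

{code:scala}
// Hypothetical renamed error class; the real name is what this ticket decides.
checkError(
  exception = intercept[org.apache.spark.SparkRuntimeException] {
    sql("SELECT some_failing_expression()").collect()
  },
  errorClass = "NEW_ERROR_CLASS_NAME",
  parameters = Map("objectName" -> "`v`"))
{code}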






[jira] [Updated] (SPARK-48012) SPJ: Support Transform Expressions for One Side Shuffle

2024-04-26 Thread Szehon Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated SPARK-48012:
--
Parent: SPARK-37375
Issue Type: Sub-task  (was: New Feature)

> SPJ: Support Transform Expressions for One Side Shuffle
> ---
>
> Key: SPARK-48012
> URL: https://issues.apache.org/jira/browse/SPARK-48012
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Szehon Ho
>Priority: Major
>
> SPARK-41471 allowed Spark to shuffle just one side and still conduct SPJ, if 
> the other side is KeyGroupedPartitioning.  However, the support was just for 
> a KeyGroupedPartition without any partition transform (day, year, bucket).  
> It will be useful to add support for partition transform as well, as there 
> are many tables partitioned by those transforms.
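
For context, a minimal sketch of a transform-partitioned table that this would
cover, assuming an Iceberg (or any DSv2) catalog named `cat` that supports
partition transforms:

{code:scala}
// A table like this reports KeyGroupedPartitioning over days(ts) and bucket(16, id).
spark.sql("""
  CREATE TABLE cat.db.events (id BIGINT, ts TIMESTAMP)
  USING iceberg
  PARTITIONED BY (days(ts), bucket(16, id))
""")
{code}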






[jira] [Created] (SPARK-48012) SPJ: Support Transform Expressions for One Side Shuffle

2024-04-26 Thread Szehon Ho (Jira)
Szehon Ho created SPARK-48012:
-

 Summary: SPJ: Support Transform Expressions for One Side Shuffle
 Key: SPARK-48012
 URL: https://issues.apache.org/jira/browse/SPARK-48012
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.4.3
Reporter: Szehon Ho


SPARK-41471 allowed Spark to shuffle just one side and still conduct SPJ, if 
the other side is KeyGroupedPartitioning.  However, the support was just for a 
KeyGroupedPartition without any partition transform (day, year, bucket).  It 
will be useful to add support for partition transform as well, as there are 
many tables partitioned by those transforms.






[jira] [Commented] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks

2024-04-26 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841339#comment-17841339
 ] 

PJ Fanning commented on SPARK-47959:


[~zshao] if you have a test environment, could you try it with the 
2.18.0-SNAPSHOT Jackson jars to see if they help?
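
For anyone trying that, a minimal sketch of pulling the snapshot into an sbt
build; the snapshot repository URL is an assumption about where FasterXML
publishes snapshots:

{code:scala}
// build.sbt: resolve Jackson 2.18.0-SNAPSHOT ahead of the pinned release.
resolvers += "Sonatype snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
libraryDependencies += "com.fasterxml.jackson.core" % "jackson-core" % "2.18.0-SNAPSHOT"
{code}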

> Improve GET_JSON_OBJECT performance on executors running multiple tasks
> ---
>
> Key: SPARK-47959
> URL: https://issues.apache.org/jira/browse/SPARK-47959
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.1
>Reporter: Zheng Shao
>Priority: Major
>
> We have a Spark executor that is running 32 workers in parallel.  The query 
> is a simple SELECT with several `GET_JSON_OBJECT` UDF calls.
> We noticed that 80+% of the worker threads are blocked on the following 
> stacktrace:
>  
> {code:java}
> com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - 
> blocked on java.lang.Object@7529fde1 
> com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947)
>  
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482)
>  
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339)
>  
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870)
>  
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825)
>  
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798)
>  
> com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240)
>  
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383)
>  
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287)
>  
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198)
>  
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196)
>  
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown
>  Source)
> ...
> {code}
>  
> Apparently jackson-core has had this performance bug from versions 2.3 through 
> 2.15, and it is not fixed until version 2.18 (unreleased): 
> [https://github.com/FasterXML/jackson-core/blob/fc51d1e13f4ba62a25a739f26be9e05aaad88c3e/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L50]
>  
> {code:java}
>             synchronized (lock) {
>                 if (size() >= MAX_ENTRIES) {
>                     clear();
>                 }
>             }
> {code}
>  
> instead of 
> [https://github.com/FasterXML/jackson-core/blob/8b87cc1a96f649a7e7872c5baa8cf97909cabf6b/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L59]
>  
> {code:java}
>             /* As of 2.18, the limit is not strictly enforced, but we do try 
> to
>              * clear entries if we have reached the limit. We do not expect to
>              * go too much over the limit, and if we do, it's not a huge 
> problem.
>              * If some other thread has the lock, we will not clear but the 
> lock should
>              * not be held for long, so another thread should be able to 
> clear in the near future.
>              */
>             if (lock.tryLock()) {
>                 try {
>                     if (size() >= DEFAULT_MAX_ENTRIES) {
>                         clear();
>                     }
>                 } finally {
>                     lock.unlock();
>                 }
>             }   {code}
>  
> Potential fixes:
>  # Upgrade to Jackson-core 2.18 when it's released;
>  # Follow [https://github.com/FasterXML/jackson-core/issues/998] - I don't 
> totally understand the options suggested by this thread yet.
>  # Introduce a new UDF that doesn't depend on jackson-core






[jira] [Resolved] (SPARK-48010) Avoid repeated calls to conf.resolver in resolveExpression

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48010.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46248
[https://github.com/apache/spark/pull/46248]

> Avoid repeated calls to conf.resolver in resolveExpression
> --
>
> Key: SPARK-48010
> URL: https://issues.apache.org/jira/browse/SPARK-48010
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Nikhil Sheoran
>Assignee: Nikhil Sheoran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Consider a view with a large number of columns (~1000s). When resolving this 
> view, looking at the flamegraph, we observed repeated initializations of `conf` 
> to obtain the `resolver` for each column of the view.
> This can be easily optimized to reuse the same resolver (obtained once) for 
> the various calls to `innerResolve` in `resolveExpression`.
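
A minimal sketch of the optimization, using a stand-in for Spark's Resolver type
((String, String) => Boolean); the real change hoists conf.resolver out of the
per-column recursion:

{code:scala}
type Resolver = (String, String) => Boolean

// Stand-in for conf.resolver: imagine each call re-reads SQLConf, which is what
// made the per-column lookups expensive on views with ~1000s of columns.
def confResolver: Resolver = (a, b) => a.equalsIgnoreCase(b)

def resolveColumns(columns: Seq[String], target: String): Seq[String] = {
  val resolver = confResolver // obtained once, reused for every column
  columns.filter(resolver(_, target))
}
{code}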






[jira] [Created] (SPARK-48011) Store LogKey name as a value to avoid generating new string instances

2024-04-26 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-48011:
--

 Summary: Store LogKey name as a value to avoid generating new 
string instances
 Key: SPARK-48011
 URL: https://issues.apache.org/jira/browse/SPARK-48011
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang
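
Presumably along these lines: a minimal sketch (a simplified stand-in, not
Spark's actual LogKey definition) of storing the name as a val so it is computed
once per key object instead of on every logging call:

{code:scala}
import java.util.Locale

sealed trait LogKey {
  // A val computes the lowercase name once per key object; a def would build
  // a fresh String instance on every structured-logging call.
  val name: String = this.getClass.getSimpleName.stripSuffix("$").toLowerCase(Locale.ROOT)
}
case object EXECUTOR_ID extends LogKey
{code}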









[jira] [Resolved] (SPARK-47963) Enable the external Spark ecosystem to use structured logging mechanisms

2024-04-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47963.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46193
[https://github.com/apache/spark/pull/46193]

> Enable the external Spark ecosystem to use structured logging mechanisms 
> 
>
> Key: SPARK-47963
> URL: https://issues.apache.org/jira/browse/SPARK-47963
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46122:
-

Assignee: Dongjoon Hyun

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
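
For illustration, a minimal sketch of the behavior change, assuming `spark` is
an active SparkSession:

{code:scala}
// With the legacy flag off, CREATE TABLE without a USING clause produces a
// native data-source table (spark.sql.sources.default, i.e. parquet) instead
// of a Hive serde table.
spark.conf.set("spark.sql.legacy.createHiveTableByDefault", "false")
spark.sql("CREATE TABLE t(id INT)")
{code}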







[jira] [Resolved] (SPARK-48005) Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup`

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48005.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46242
[https://github.com/apache/spark/pull/46242]

> Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup`
> -
>
> Key: SPARK-48005
> URL: https://issues.apache.org/jira/browse/SPARK-48005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48010) Avoid repeated calls to conf.resolver in resolveExpression

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48010:
---
Labels: pull-request-available  (was: )

> Avoid repeated calls to conf.resolver in resolveExpression
> --
>
> Key: SPARK-48010
> URL: https://issues.apache.org/jira/browse/SPARK-48010
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Nikhil Sheoran
>Priority: Major
>  Labels: pull-request-available
>
> Consider a view with a large number of columns (~1000s). When resolving this 
> view, looking at the flamegraph, we observed repeated initializations of `conf` 
> to obtain the `resolver` for each column of the view.
> This can be easily optimized to reuse the same resolver (obtained once) for 
> the various calls to `innerResolve` in `resolveExpression`.






[jira] [Created] (SPARK-48010) Avoid repeated calls to conf.resolver in resolveExpression

2024-04-26 Thread Nikhil Sheoran (Jira)
Nikhil Sheoran created SPARK-48010:
--

 Summary: Avoid repeated calls to conf.resolver in resolveExpression
 Key: SPARK-48010
 URL: https://issues.apache.org/jira/browse/SPARK-48010
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.3
Reporter: Nikhil Sheoran


Consider a view with a large number of columns (~1000s). When resolving this 
view, looking at the flamegraph, we observed repeated initializations of `conf` to 
obtain the `resolver` for each column of the view.

This can be easily optimized to reuse the same resolver (obtained once) for the 
various calls to `innerResolve` in `resolveExpression`.






[jira] [Updated] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks

2024-04-26 Thread Zheng Shao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated SPARK-47959:
---
Description: 
We have a Spark executor that is running 32 workers in parallel.  The query is 
a simple SELECT with several `GET_JSON_OBJECT` UDF calls.

We noticed that 80+% of the worker threads are blocked on the following 
stacktrace:

 
{code:java}
com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - 
blocked on java.lang.Object@7529fde1 
com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947)
 
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482)
 
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339)
 
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870)
 
com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825)
 
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798)
 
com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240)
 
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383)
 
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287)
 
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198)
 
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196)
 
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown
 Source)
...
{code}
 

Apparently jackson-core has had this performance bug from versions 2.3 through 
2.15, and it is not fixed until version 2.18 (unreleased): 
[https://github.com/FasterXML/jackson-core/blob/fc51d1e13f4ba62a25a739f26be9e05aaad88c3e/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L50]
 
{code:java}
            synchronized (lock) {
                if (size() >= MAX_ENTRIES) {
                    clear();
                }
            }
{code}
 
instead of 
[https://github.com/FasterXML/jackson-core/blob/8b87cc1a96f649a7e7872c5baa8cf97909cabf6b/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L59]
 
{code:java}
            /* As of 2.18, the limit is not strictly enforced, but we do try to
             * clear entries if we have reached the limit. We do not expect to
             * go too much over the limit, and if we do, it's not a huge 
problem.
             * If some other thread has the lock, we will not clear but the 
lock should
             * not be held for long, so another thread should be able to clear 
in the near future.
             */
            if (lock.tryLock()) {
                try {
                    if (size() >= DEFAULT_MAX_ENTRIES) {
                        clear();
                    }
                } finally {
                    lock.unlock();
                }
            }   {code}
 

Potential fixes:
 # Upgrade to Jackson-core 2.18 when it's released;
 # Follow [https://github.com/FasterXML/jackson-core/issues/998] - I don't 
totally understand the options suggested by this thread yet (one candidate is 
sketched below);
 # Introduce a new UDF that doesn't depend on jackson-core.
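One candidate mitigation (a hedged sketch related to option 2, not something 
Spark does today): disable jackson's field-name interning so the shared 
InternCache is never consulted:

{code:scala}
import com.fasterxml.jackson.core.{JsonFactory, JsonFactoryBuilder}

// Hedged sketch: jackson-core only calls InternCache.intern when field-name
// interning is enabled (the default). Disabling the feature sidesteps the
// contended lock, at the cost of slightly more per-name allocation.
val factory: JsonFactory = new JsonFactoryBuilder()
  .disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
  .build()
{code}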

  was:
We have a Spark executor that is running 32 workers in parallel.  The query is 
a simple SELECT with several `GET_JSON_OBJECT` UDF calls.

We noticed that 80+% of the stacktrace of the worker threads are blocked on the 
following stacktrace:

 
{code:java}
com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - 
blocked on java.lang.Object@7529fde1 
com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947)
 
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482)
 
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339)
 
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870)
 
com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825)
 
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798)
 
com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240)
 
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383)
 
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287)
 
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198)
 
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196)
 
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown
 Source)
...
{code}

[jira] [Resolved] (SPARK-47968) MsSQLServer: Map datatimeoffset to TimestampType

2024-04-26 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-47968.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46239
[https://github.com/apache/spark/pull/46239]

> MsSQLServer: Map datatimeoffset to TimestampType
> 
>
> Key: SPARK-47968
> URL: https://issues.apache.org/jira/browse/SPARK-47968
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47476) StringReplace (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47476:
---

Assignee: Uroš Bojanić

> StringReplace (all collations)
> --
>
> Key: SPARK-47476
> URL: https://issues.apache.org/jira/browse/SPARK-47476
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringReplace* built-in string function in 
> Spark. First confirm what the expected behaviour is for this function when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMSs, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringReplace* function so 
> it supports all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
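A rough illustration of the _StringSearch_ approach (a sketch only; the locale 
and strength chosen here are assumptions for the example, not what Spark must 
use):

{code:scala}
import java.text.StringCharacterIterator
import com.ibm.icu.text.{Collator, RuleBasedCollator, StringSearch}
import com.ibm.icu.util.ULocale

// Hedged sketch: find the first collation-aware occurrence of a pattern.
// PRIMARY strength makes the match case- and accent-insensitive.
val collator = Collator.getInstance(ULocale.ROOT).asInstanceOf[RuleBasedCollator]
collator.setStrength(Collator.PRIMARY)
val search = new StringSearch("abc", new StringCharacterIterator("xx ABC yy"), collator)
val start = search.first()        // match index, or StringSearch.DONE if none
val len = search.getMatchLength() // matched length in the target string
{code}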



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47476) StringReplace (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47476.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45704
[https://github.com/apache/spark/pull/45704]

> StringReplace (all collations)
> --
>
> Key: SPARK-47476
> URL: https://issues.apache.org/jira/browse/SPARK-47476
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *StringReplace* built-in string function in 
> Spark. First confirm what the expected behaviour is for this function when 
> given collated strings, and then move on to implementation and testing. One 
> way to go about this is to consider using {_}StringSearch{_}, an efficient 
> ICU service for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMSs, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringReplace* function so 
> it supports all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48007) MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48007.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46244
[https://github.com/apache/spark/pull/46244]

> MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11
> ---
>
> Key: SPARK-48007
> URL: https://issues.apache.org/jira/browse/SPARK-48007
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-30709) Spark 2.3 to Spark 2.4 Upgrade. Problems reading HIVE partitioned tables.

2024-04-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen deleted SPARK-30709:
-


> Spark 2.3 to Spark 2.4 Upgrade. Problems reading HIVE partitioned tables.
> -
>
> Key: SPARK-30709
> URL: https://issues.apache.org/jira/browse/SPARK-30709
> Project: Spark
>  Issue Type: Question
> Environment: PRE- Production
>Reporter: Carlos Mario
>Priority: Major
>  Labels: SQL, Spark
>
> Hello
> We recently updated our preproduction environment from Spark 2.3 to Spark 
> 2.4.0.
> Over time we have created a large number of tables in the Hive Metastore, 
> partitioned by 2 fields, one of them String and the other BigInt.
> We were reading these tables with Spark 2.3 with no problem, but after 
> upgrading to Spark 2.4 we get the following log every time we run our SW:
> 
> log_filterBIGINT.out:
>  Caused by: MetaException(message:Filtering is supported only on partition 
> keys of type string) Caused by: MetaException(message:Filtering is supported 
> only on partition keys of type string) Caused by: 
> MetaException(message:Filtering is supported only on partition keys of type 
> string)
>  
> hadoop-cmf-hive-HIVEMETASTORE-isblcsmsttc0001.scisb.isban.corp.log.out.1:
>  
> 2020-01-10 09:36:05,781 ERROR 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-138]: 
> MetaException(message:Filtering is supported only on partition keys of type 
> string)
> 2020-01-10 11:19:19,208 ERROR 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-187]: 
> MetaException(message:Filtering is supported only on partition keys of type 
> string)
> 2020-01-10 11:19:54,780 ERROR 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-167]: 
> MetaException(message:Filtering is supported only on partition keys of type 
> string)
>  
>  
> We know the best practice from the Spark point of view is to use 'STRING' 
> type for partition columns, but we need to explore a solution we will be able 
> to deploy with ease, given the large number of tables created with a bigint 
> partition column.
>  
> As a first solution we tried to set the 
> spark.sql.hive.manageFilesourcePartitions parameter to false in the Spark 
> submit, but after rerunning the SW the error stood still.
>  
> Is there anyone in the community who experienced the same problem? What was 
> the solution for it? 
>  
> Kind Regards and thanks in advance.
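One knob that may be worth testing here (an assumption, not a verified fix for 
this environment): spark.sql.hive.metastorePartitionPruning controls whether 
Spark pushes partition predicates down to the Hive metastore, which is the code 
path that raises this MetaException for non-string partition keys:

{code:scala}
import org.apache.spark.sql.SparkSession

// Hedged sketch: with pruning disabled, Spark lists all partitions and filters
// them client-side, avoiding the metastore-side string-only filter -- at the
// cost of fetching every partition of the table.
val spark = SparkSession.builder()
  .config("spark.sql.hive.metastorePartitionPruning", "false")
  .enableHiveSupport()
  .getOrCreate()
{code}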



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841216#comment-17841216
 ] 

Rushikesh Kavar commented on SPARK-48009:
-

I am calling OverrideAvro first and then AppendAvro.

> Specifications for Apache Spark hadoop Avro append operation
> 
>
> Key: SPARK-48009
> URL: https://issues.apache.org/jira/browse/SPARK-48009
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.3
>Reporter: Rushikesh Kavar
>Priority: Minor
>
> Consider a path /a/b/c.
> Assume I write Avro data to this folder using Apache Spark.
> After it is written, assume I try to append a dataset to this folder.
> I want to see a specification of what happens in the append case.
> After doing a PoC, I found that when the appended dataset has the same schema 
> as the existing data, the data simply gets appended. But I want to see clear 
> docs on exactly what happens on append.
> I am attaching my testing Java code.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841214#comment-17841214
 ] 

Rushikesh Kavar commented on SPARK-48009:
-

import org.apache.spark.sql.*;

import java.util.List;

public class Writer {

    // Convenience overload: defaults to overwriting any existing data.
    public static <T> void writeAvro(List<T> list, String path) {
        writeAvro(list, path, SaveMode.Overwrite);
    }

    public static <T> void writeAvro(List<T> list, String path, SaveMode saveMode) {

        Dataset<Row> dataset = getDatasetFromList(list);

        dataset.write().format("avro")
                .mode(saveMode)
                .save(path);
    }

    public static void writeAvro(Dataset<Row> ds, String path, SaveMode saveMode) {

        ds.write().format("avro")
                .mode(saveMode)
                .save(path);
    }

    // Builds a Dataset from a list of beans; the element class drives the schema.
    @SuppressWarnings("unchecked")
    public static <T> Dataset<Row> getDatasetFromList(List<T> list) {
        Class<T> clazz = (Class<T>) list.get(0).getClass();

        SparkSession spark = SparkSession.builder()
                .config("spark.master", "local")
                .getOrCreate();
        SQLContext context = spark.sqlContext();
        return context.createDataset(list, Encoders.bean(clazz)).toDF();
    }

}

> Specifications for Apache Spark hadoop Avro append operation
> 
>
> Key: SPARK-48009
> URL: https://issues.apache.org/jira/browse/SPARK-48009
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.3
>Reporter: Rushikesh Kavar
>Priority: Minor
>
> Consider a path /a/b/c.
> Assume I write Avro data to this folder using Apache Spark.
> After it is written, assume I try to append a dataset to this folder.
> I want to see a specification of what happens in the append case.
> After doing a PoC, I found that when the appended dataset has the same schema 
> as the existing data, the data simply gets appended. But I want to see clear 
> docs on exactly what happens on append.
> I am attaching my testing Java code.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841213#comment-17841213
 ] 

Rushikesh Kavar commented on SPARK-48009:
-

import org.apache.spark.sql.SaveMode;
import org.example.avro.Writer;

import java.util.ArrayList;
import java.util.List;

public class OverrideAvro {

    public static void main(String[] args) {
        // Overwrite C:\Users\kavarus\testing\spark-testing\data with two rows.
        Writer.writeAvro(getMockData(),
                "C:\\Users\\kavarus\\testing\\spark-testing\\data", SaveMode.Overwrite);
    }

    public static List<Modal> getMockData() {
        List<Modal> lst = new ArrayList<>();
        lst.add(new Modal("1", "Test1", 26));
        lst.add(new Modal("2", "Test2", 28));
        return lst;
    }

}

> Specifications for Apache Spark hadoop Avro append operation
> 
>
> Key: SPARK-48009
> URL: https://issues.apache.org/jira/browse/SPARK-48009
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.3
>Reporter: Rushikesh Kavar
>Priority: Minor
>
> Consider a path /a/b/c.
> Assume I write Avro data to this folder using Apache Spark.
> After it is written, assume I try to append a dataset to this folder.
> I want to see a specification of what happens in the append case.
> After doing a PoC, I found that when the appended dataset has the same schema 
> as the existing data, the data simply gets appended. But I want to see clear 
> docs on exactly what happens on append.
> I am attaching my testing Java code.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841211#comment-17841211
 ] 

Rushikesh Kavar commented on SPARK-48009:
-

 

import org.apache.spark.sql.SaveMode;
import org.example.avro.Writer;

import java.util.ArrayList;
import java.util.List;

public class AppendAvro {

    public static void main(String[] args) {
        // Append two more rows to the folder written by OverrideAvro.
        Writer.writeAvro(getMockData(),
                "C:\\Users\\kavarus\\testing\\spark-testing\\data", SaveMode.Append);
    }

    public static List<Modal> getMockData() {
        List<Modal> lst = new ArrayList<>();
        lst.add(new Modal("3", "Test3", 27));
        lst.add(new Modal("4", "Test4", 27));
        return lst;
    }
}

> Specifications for Apache Spark hadoop Avro append operation
> 
>
> Key: SPARK-48009
> URL: https://issues.apache.org/jira/browse/SPARK-48009
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.3
>Reporter: Rushikesh Kavar
>Priority: Minor
>
> Consider a path /a/b/c.
> Assume I write Avro data to this folder using Apache Spark.
> After it is written, assume I try to append a dataset to this folder.
> I want to see a specification of what happens in the append case.
> After doing a PoC, I found that when the appended dataset has the same schema 
> as the existing data, the data simply gets appended. But I want to see clear 
> docs on exactly what happens on append.
> I am attaching my testing Java code.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841212#comment-17841212
 ] 

Rushikesh Kavar commented on SPARK-48009:
-

 

public class Modal {
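    // Simple JavaBean (public getters/setters) so Encoders.bean(Modal.class)
    // can derive the schema; used by the OverrideAvro/AppendAvro snippets in
    // the other comments.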
    public String id;
    public String name;
    public int age;

    public Modal(String id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}

> Specifications for Apache Spark hadoop Avro append operation
> 
>
> Key: SPARK-48009
> URL: https://issues.apache.org/jira/browse/SPARK-48009
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.3
>Reporter: Rushikesh Kavar
>Priority: Minor
>
> Consider a path /a/b/c.
> Assume I write Avro data to this folder using Apache Spark.
> After it is written, assume I try to append a dataset to this folder.
> I want to see a specification of what happens in the append case.
> After doing a PoC, I found that when the appended dataset has the same schema 
> as the existing data, the data simply gets appended. But I want to see clear 
> docs on exactly what happens on append.
> I am attaching my testing Java code.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841210#comment-17841210
 ] 

Rushikesh Kavar commented on SPARK-48009:
-

I will attach the code within a few hours.

> Specifications for Apache Spark hadoop Avro append operation
> 
>
> Key: SPARK-48009
> URL: https://issues.apache.org/jira/browse/SPARK-48009
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.3
>Reporter: Rushikesh Kavar
>Priority: Minor
>
> Consider a path /a/b/c.
> Assume I write Avro data to this folder using Apache Spark.
> After it is written, assume I try to append a dataset to this folder.
> I want to see a specification of what happens in the append case.
> After doing a PoC, I found that when the appended dataset has the same schema 
> as the existing data, the data simply gets appended. But I want to see clear 
> docs on exactly what happens on append.
> I am attaching my testing Java code.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation

2024-04-26 Thread Rushikesh Kavar (Jira)
Rushikesh Kavar created SPARK-48009:
---

 Summary: Specifications for Apache Spark hadoop Avro append 
operation
 Key: SPARK-48009
 URL: https://issues.apache.org/jira/browse/SPARK-48009
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.3
Reporter: Rushikesh Kavar


Consider a path /a/b/c.

Assume I write Avro data to this folder using Apache Spark.

After it is written, assume I try to append a dataset to this folder.

I want to see a specification of what happens in the append case.

After doing a PoC, I found that when the appended dataset has the same schema 
as the existing data, the data simply gets appended. But I want to see clear 
docs on exactly what happens on append.

I am attaching my testing Java code.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47351) StringToMap & Mask (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47351.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46165
[https://github.com/apache/spark/pull/46165]

> StringToMap & Mask (all collations)
> ---
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47351) StringToMap & Mask (all collations)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47351:
---

Assignee: Uroš Bojanić

> StringToMap & Mask (all collations)
> ---
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47350) SplitPart (binary & lowercase collation only)

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47350.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46158
[https://github.com/apache/spark/pull/46158]

> SplitPart (binary & lowercase collation only)
> -
>
> Key: SPARK-47350
> URL: https://issues.apache.org/jira/browse/SPARK-47350
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47408) Fix mathExpressions that use StringType

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47408:
---
Labels: pull-request-available  (was: )

> Fix mathExpressions that use StringType
> ---
>
> Key: SPARK-47408
> URL: https://issues.apache.org/jira/browse/SPARK-47408
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48008) Support UDAF in Spark Connect

2024-04-26 Thread Pengfei Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengfei Xu updated SPARK-48008:
---
Description: 
Currently Spark Connect supports only UDFs. We need to add support for UDAFs, 
specifically `Aggregator[IN, BUF, OUT]`.

The user-facing API should not change, which includes Aggregator methods and 
the `spark.udf.register("agg", udaf(agg))` API.

  was:
Currently Spark Connect supports only UDFs. We need to add support for UDAFs, 
specifically `Aggregator[INT, BUF, OUT]`.

The user-facing API should not change, which includes Aggregator methods and 
the `
spark.udf.register("agg", udaf(agg))` API.


> Support UDAF in Spark Connect
> -
>
> Key: SPARK-48008
> URL: https://issues.apache.org/jira/browse/SPARK-48008
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Pengfei Xu
>Priority: Major
>
> Currently Spark Connect supports only UDFs. We need to add support for UDAFs, 
> specifically `Aggregator[IN, BUF, OUT]`.
> The user-facing API should not change, which includes Aggregator methods and 
> the `spark.udf.register("agg", udaf(agg))` API.
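For reference, a minimal sketch of the user-facing shape that should keep 
working over Connect (the Aggregator here, MyAvg, is illustrative):

{code:scala}
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hedged sketch of a typed UDAF: a streaming average over Long inputs.
object MyAvg extends Aggregator[Long, (Long, Long), Double] {
  def zero: (Long, Long) = (0L, 0L)
  def reduce(b: (Long, Long), a: Long): (Long, Long) = (b._1 + a, b._2 + 1)
  def merge(b1: (Long, Long), b2: (Long, Long)): (Long, Long) =
    (b1._1 + b2._1, b1._2 + b2._2)
  def finish(r: (Long, Long)): Double =
    if (r._2 == 0) 0.0 else r._1.toDouble / r._2
  def bufferEncoder: Encoder[(Long, Long)] =
    Encoders.tuple(Encoders.scalaLong, Encoders.scalaLong)
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Registration shape from the description (requires an active SparkSession):
//   spark.udf.register("agg", org.apache.spark.sql.functions.udaf(MyAvg))
{code}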



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48004) Add WriteFilesExecBase trait for v1 write

2024-04-26 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-48004:


Assignee: XiDuo You

> Add WriteFilesExecBase trait for v1 write
> -
>
> Key: SPARK-48004
> URL: https://issues.apache.org/jira/browse/SPARK-48004
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48004) Add WriteFilesExecBase trait for v1 write

2024-04-26 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-48004.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46240
[https://github.com/apache/spark/pull/46240]

> Add WriteFilesExecBase trait for v1 write
> -
>
> Key: SPARK-48004
> URL: https://issues.apache.org/jira/browse/SPARK-48004
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48007) MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48007:
---
Labels: pull-request-available  (was: )

> MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11
> ---
>
> Key: SPARK-48007
> URL: https://issues.apache.org/jira/browse/SPARK-48007
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48006) add SortOrder for window function which has no orderSpec

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48006:
---
Labels: pull-request-available  (was: )

> add SortOrder for window function which has no orderSpec
> 
>
> Key: SPARK-48006
> URL: https://issues.apache.org/jira/browse/SPARK-48006
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: guihuawen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> I am doing Hive SQL to switch to Spark SQL.
>  
> In Hive SQL
>  
> hive> explain select *,row_number() over (partition by day) rn from 
> testdb.zeropart_db;
> OK
> Explain
>  
> In Spark SQL
> spark-sql> explain select *,row_number() over (partition by age ) rn  from 
> testdb.zeropart_db;
> plan
> == Physical Plan ==
> org.apache.spark.sql.AnalysisException: Window function row_number() requires 
> window to be ordered, please add ORDER BY clause. For example SELECT 
> row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY 
> window_ordering) from table
> Time taken: 0.172 seconds, Fetched 1 row(s)
>  
> For better compatibility with this migration, new parameters are added to 
> ensure the same behavior as Hive SQL.
>  
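A hedged workaround sketch in the meantime (not the proposed change itself): 
Hive treats a missing ORDER BY as an arbitrary order, so supplying an explicit 
arbitrary sort key reproduces the Hive behavior:

{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Hedged sketch: number rows per partition without a meaningful ordering by
// sorting on an arbitrary key; row assignment is then nondeterministic, which
// matches Hive's semantics for an unordered row_number().
def addRowNumber(df: DataFrame): DataFrame = {
  val w = Window.partitionBy(col("age")).orderBy(monotonically_increasing_id())
  df.withColumn("rn", row_number().over(w))
}
{code}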



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48007) MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11

2024-04-26 Thread Kent Yao (Jira)
Kent Yao created SPARK-48007:


 Summary: MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11
 Key: SPARK-48007
 URL: https://issues.apache.org/jira/browse/SPARK-48007
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Tests
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48006) add SortOrder for window function which has no orderSpec

2024-04-26 Thread guihuawen (Jira)
guihuawen created SPARK-48006:
-

 Summary: add SortOrder for window function which has no orderSpec
 Key: SPARK-48006
 URL: https://issues.apache.org/jira/browse/SPARK-48006
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: guihuawen
 Fix For: 4.0.0


I am doing Hive SQL to switch to Spark SQL.

 

In Hive SQL

 

hive> explain select *,row_number() over (partition by day) rn from 
testdb.zeropart_db;

OK
Explain

 

In Spark SQL

spark-sql> explain select *,row_number() over (partition by age ) rn  from 
testdb.zeropart_db;

plan

== Physical Plan ==

org.apache.spark.sql.AnalysisException: Window function row_number() requires 
window to be ordered, please add ORDER BY clause. For example SELECT 
row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY 
window_ordering) from table

Time taken: 0.172 seconds, Fetched 1 row(s)

 

For better compatibility with this migration, new parameters are added to 
ensure the same behavior as Hive SQL.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46122:
--

Assignee: (was: Apache Spark)

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46122:
--

Assignee: Apache Spark

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48003) Hll sketch aggregate support for strings with collation

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48003:
--

Assignee: Apache Spark

> Hll sketch aggregate support for strings with collation
> ---
>
> Key: SPARK-48003
> URL: https://issues.apache.org/jira/browse/SPARK-48003
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48005) Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup`

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48005:
---
Labels: pull-request-available  (was: )

> Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup`
> -
>
> Key: SPARK-48005
> URL: https://issues.apache.org/jira/browse/SPARK-48005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48005) Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup`

2024-04-26 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48005:
-

 Summary: Enable `DefaultIndexParityTests. 
test_index_distributed_sequence_cleanup`
 Key: SPARK-48005
 URL: https://issues.apache.org/jira/browse/SPARK-48005
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48003) Hll sketch aggregate support for strings with collation

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48003:
---
Labels: pull-request-available  (was: )

> Hll sketch aggregate support for strings with collation
> ---
>
> Key: SPARK-48003
> URL: https://issues.apache.org/jira/browse/SPARK-48003
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47993) Drop Python 3.8 support

2024-04-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47993.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46228
[https://github.com/apache/spark/pull/46228]

> Drop Python 3.8 support
> ---
>
> Key: SPARK-47993
> URL: https://issues.apache.org/jira/browse/SPARK-47993
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available, release-notes
> Fix For: 4.0.0
>
>
> Python 3.8 reaches EOL this October. Considering the release schedule, we 
> had better drop it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48004) Add WriteFilesExecBase trait for v1 write

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48004:
---
Labels: pull-request-available  (was: )

> Add WriteFilesExecBase trait for v1 write
> -
>
> Key: SPARK-48004
> URL: https://issues.apache.org/jira/browse/SPARK-48004
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48004) Add WriteFilesExecBase trait for v1 write

2024-04-26 Thread XiDuo You (Jira)
XiDuo You created SPARK-48004:
-

 Summary: Add WriteFilesExecBase trait for v1 write
 Key: SPARK-48004
 URL: https://issues.apache.org/jira/browse/SPARK-48004
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 4.0.0
Reporter: XiDuo You






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48003) Hll sketch aggregate support for strings with collation

2024-04-26 Thread Jira
Uroš Bojanić created SPARK-48003:


 Summary: Hll sketch aggregate support for strings with collation
 Key: SPARK-48003
 URL: https://issues.apache.org/jira/browse/SPARK-48003
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Uroš Bojanić






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48001) Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`

2024-04-26 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-48001.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46238
[https://github.com/apache/spark/pull/46238]

> Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`
> -
>
> Key: SPARK-48001
> URL: https://issues.apache.org/jira/browse/SPARK-48001
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48001) Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`

2024-04-26 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-48001:


Assignee: Yang Jie

> Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`
> -
>
> Key: SPARK-48001
> URL: https://issues.apache.org/jira/browse/SPARK-48001
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47968) MsSQLServer: Map datatimeoffset to TimestampType

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47968:
---
Labels: pull-request-available  (was: )

> MsSQLServer: Map datatimeoffset to TimestampType
> 
>
> Key: SPARK-47968
> URL: https://issues.apache.org/jira/browse/SPARK-47968
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47986) [CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server

2024-04-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-47986.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46221
[https://github.com/apache/spark/pull/46221]

> [CONNECT][PYTHON] Unable to create a new session when the default session is 
> closed by the server
> -
>
> Key: SPARK-47986
> URL: https://issues.apache.org/jira/browse/SPARK-47986
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 3.5.1
>Reporter: Niranjan Jayakar
>Assignee: Niranjan Jayakar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When the server closes a session, usually after a cluster restart, the client 
> is unaware of this until it receives an error.
> Once it does so, there is no way for the client to create a new session since 
> the stale sessions are still recorded as default and active sessions.
> The only solution currently is to restart the Python interpreter on the 
> client, or to reach into the session builder and change the active or default 
> session.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48001) Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48001:
---
Labels: pull-request-available  (was: )

> Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`
> -
>
> Key: SPARK-48001
> URL: https://issues.apache.org/jira/browse/SPARK-48001
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48002) Add Observed metrics test in PySpark StreamingQueryListeners

2024-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48002:
---
Labels: pull-request-available  (was: )

> Add Observed metrics test in PySpark StreamingQueryListeners
> 
>
> Key: SPARK-48002
> URL: https://issues.apache.org/jira/browse/SPARK-48002
> Project: Spark
>  Issue Type: New Feature
>  Components: SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48002) Add Observed metrics test in PySpark StreamingQueryListeners

2024-04-26 Thread Wei Liu (Jira)
Wei Liu created SPARK-48002:
---

 Summary: Add Observed metrics test in PySpark 
StreamingQueryListeners
 Key: SPARK-48002
 URL: https://issues.apache.org/jira/browse/SPARK-48002
 Project: Spark
  Issue Type: New Feature
  Components: SS
Affects Versions: 4.0.0
Reporter: Wei Liu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48001) Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`

2024-04-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-48001:


 Summary: Remove unused `private implicit def arrayToArrayWritable` 
from `SparkContext`
 Key: SPARK-48001
 URL: https://issues.apache.org/jira/browse/SPARK-48001
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47922) Implement try_parse_json

2024-04-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47922.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46141
[https://github.com/apache/spark/pull/46141]

> Implement try_parse_json
> 
>
> Key: SPARK-47922
> URL: https://issues.apache.org/jira/browse/SPARK-47922
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Implement a try_parse_json expression that runs parse_json on valid string 
> inputs and returns null when the input string is malformed. Note that this 
> expression also only supports string input types.
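Illustrative usage sketch (run in spark-shell, where `spark` is predefined; 
assumes a build where try_parse_json is available):

{code:scala}
// Valid input parses; malformed input yields NULL instead of raising an error.
val df = spark.sql(
  """SELECT try_parse_json('{"a": 1}') AS parsed,
    |       try_parse_json('not json') AS malformed""".stripMargin)
df.show()
{code}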



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47440) SQLServer does not support LIKE operator in binary comparison

2024-04-26 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-47440:
-
Parent: SPARK-47361
Issue Type: Sub-task  (was: Bug)

> SQLServer does not support LIKE operator in binary comparison
> -
>
> Key: SPARK-47440
> URL: https://issues.apache.org/jira/browse/SPARK-47440
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Bukorovic
>Assignee: Stefan Bukorovic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> When pushing a Spark query down to the MsSqlServer engine we sometimes 
> construct a SQL query that has a LIKE operator as part of a binary comparison 
> operation, which is not permitted in SQL Server syntax. 
> For example, the query 
> {code:java}
> SELECT * FROM people WHERE (name LIKE "s%") = 1{code}
> will not execute on MsSQLServer.
> Such queries should be detected and not pushed down to MsSqlServer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org