[jira] [Resolved] (SPARK-47408) Fix mathExpressions that use StringType
[ https://issues.apache.org/jira/browse/SPARK-47408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47408. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46227 [https://github.com/apache/spark/pull/46227] > Fix mathExpressions that use StringType > --- > > Key: SPARK-47408 > URL: https://issues.apache.org/jira/browse/SPARK-47408 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gene Pang updated SPARK-48019: -- Description: {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those return a primitive array with the contents of the vector. When the ColumnVector has a dictionary, the values are decoded with the dictionary before filling in the primitive array. However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, the dictionary id is irrelevant, and can also be invalid. The dictionary should not be used for the {{null}} entries of the vector. Sometimes, this can cause an {{ArrayIndexOutOfBoundsException}}. In addition to the possible Exception, copying a {{ColumnarArray}} is not correct. A {{ColumnarArray}} contains a {{ColumnVector}} so it can contain {{null}} values. However, the {{copy()}} for primitive types does not take into account the null-ness of the entries, and blindly copies all the primitive values. That means the null entries get lost. was: `ColumnVectors` have APIs like `getInts`, `getFloats` and so on. Those return a primitive array with the contents of the vector. When the ColumnVector has a dictionary, the values are decoded with the dictionary before filling in the primitive array. However, `ColumnVectors` can have `null`s, and for those `null` entries, the dictionary id is irrelevant, and can also be invalid. The dictionary should not be used for the `null` entries of the vector. Sometimes, this can cause an `ArrayIndexOutOfBoundsException`. In addition to the possible Exception, copying a `ColumnarArray` is not correct. A `ColumnarArray` contains a `ColumnVector` so it can contain `null` values. However, the `copy()` for primitive types does not take into account the null-ness of the entries, and blindly copies all the primitive values. That means the null entries get lost. > ColumnVectors with dictionaries and nulls are not read/copied correctly > --- > > Key: SPARK-48019 > URL: https://issues.apache.org/jira/browse/SPARK-48019 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.3 >Reporter: Gene Pang >Priority: Major > > {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those > return a primitive array with the contents of the vector. When the > ColumnVector has a dictionary, the values are decoded with the dictionary > before filling in the primitive array. > However, {{ColumnVectors}} can have nulls, and for those {{null}} entries, > the dictionary id is irrelevant, and can also be invalid. The dictionary > should not be used for the {{null}} entries of the vector. Sometimes, this > can cause an {{ArrayIndexOutOfBoundsException}}. > In addition to the possible Exception, copying a {{ColumnarArray}} is not > correct. A {{ColumnarArray}} contains a {{ColumnVector}} so it can contain > {{null}} values. However, the {{copy()}} for primitive types does not take > into account the null-ness of the entries, and blindly copies all the > primitive values. That means the null entries get lost. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
Gene Pang created SPARK-48019: - Summary: ColumnVectors with dictionaries and nulls are not read/copied correctly Key: SPARK-48019 URL: https://issues.apache.org/jira/browse/SPARK-48019 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.3 Reporter: Gene Pang `ColumnVectors` have APIs like `getInts`, `getFloats` and so on. Those return a primitive array with the contents of the vector. When the ColumnVector has a dictionary, the values are decoded with the dictionary before filling in the primitive array. However, `ColumnVectors` can have `null`s, and for those `null` entries, the dictionary id is irrelevant, and can also be invalid. The dictionary should not be used for the `null` entries of the vector. Sometimes, this can cause an `ArrayIndexOutOfBoundsException`. In addition to the possible Exception, copying a `ColumnarArray` is not correct. A `ColumnarArray` contains a `ColumnVector` so it can contain `null` values. However, the `copy()` for primitive types does not take into account the null-ness of the entries, and blindly copies all the primitive values. That means the null entries get lost. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
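As a concrete illustration of the read path this report describes, here is a minimal null-aware bulk read, assuming only the public {{ColumnVector}} methods {{isNullAt}} and {{getInt}}; the helper class and the zero default for null slots are illustrative, not Spark's actual fix.

{code:java}
import org.apache.spark.sql.vectorized.ColumnVector;

class NullAwareRead {
  // Bulk-read ints without touching the dictionary for null slots: a null
  // entry's dictionary id is meaningless and may be out of the dictionary's
  // range, which is what can trigger the ArrayIndexOutOfBoundsException.
  static int[] getIntsNullAware(ColumnVector vec, int rowId, int count) {
    int[] out = new int[count];
    for (int i = 0; i < count; i++) {
      if (!vec.isNullAt(rowId + i)) {
        out[i] = vec.getInt(rowId + i); // decodes via the dictionary if present
      } // null entries keep the default 0; callers must still check isNullAt
    }
    return out;
  }
}
{code}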
[jira] [Assigned] (SPARK-48018) Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
[ https://issues.apache.org/jira/browse/SPARK-48018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-48018: Assignee: B. Micheal Okutubo > Null groupId causing missing param error when throwing > KafkaException.couldNotReadOffsetRange > - > > Key: SPARK-48018 > URL: https://issues.apache.org/jira/browse/SPARK-48018 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: B. Micheal Okutubo >Assignee: B. Micheal Okutubo >Priority: Major > Labels: pull-request-available > > [INTERNAL_ERROR] Undefined error message parameter for error class: > 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE' > when groupId is null and we are about to throw the > KafkaException.couldNotReadOffsetRange error. > The error framework requires all params to be non-null. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48018) Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
[ https://issues.apache.org/jira/browse/SPARK-48018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-48018. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46253 [https://github.com/apache/spark/pull/46253] > Null groupId causing missing param error when throwing > KafkaException.couldNotReadOffsetRange > - > > Key: SPARK-48018 > URL: https://issues.apache.org/jira/browse/SPARK-48018 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: B. Micheal Okutubo >Assignee: B. Micheal Okutubo >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [INTERNAL_ERROR] Undefined error message parameter for error class: > 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE' > when groupId is null and we are about to throw the > KafkaException.couldNotReadOffsetRange error. > The error framework requires all params to be non-null. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
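A hedged sketch of the kind of guard the fix implies: normalize the nullable groupId before it reaches the error framework, which rejects null message parameters. The helper name and the "null" placeholder are assumptions, not the actual patch.

{code:java}
class KafkaErrorParams {
  // The error framework requires every message parameter to be non-null, so
  // substitute a literal placeholder for a missing Kafka group id before
  // building the KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE message.
  static String nonNullParam(String groupId) {
    return groupId == null ? "null" : groupId;
  }
}
{code}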
[jira] [Updated] (SPARK-48017) Add Spark application submission worker for operator
[ https://issues.apache.org/jira/browse/SPARK-48017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48017: --- Labels: pull-request-available (was: ) > Add Spark application submission worker for operator > > > Key: SPARK-48017 > URL: https://issues.apache.org/jira/browse/SPARK-48017 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Priority: Major > Labels: pull-request-available > > Spark Operator needs a submission worker that converts its application > abstraction (Operator API) to k8s resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48018) Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
[ https://issues.apache.org/jira/browse/SPARK-48018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] B. Micheal Okutubo updated SPARK-48018: --- Description: [INTERNAL_ERROR] Undefined error message parameter for error class: 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE' when groupId is null and we are about to throw the KafkaException.couldNotReadOffsetRange error. The error framework requires all params to be non-null. was: [INTERNAL_ERROR] Undefined error message parameter for error class: 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE' when groupId is null and we are about to throw the KafkaException.couldNotReadOffsetRange error > Null groupId causing missing param error when throwing > KafkaException.couldNotReadOffsetRange > - > > Key: SPARK-48018 > URL: https://issues.apache.org/jira/browse/SPARK-48018 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: B. Micheal Okutubo >Priority: Major > Labels: pull-request-available > > [INTERNAL_ERROR] Undefined error message parameter for error class: > 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE' > when groupId is null and we are about to throw the > KafkaException.couldNotReadOffsetRange error. > The error framework requires all params to be non-null. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48018) Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
[ https://issues.apache.org/jira/browse/SPARK-48018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48018: --- Labels: pull-request-available (was: ) > Null groupId causing missing param error when throwing > KafkaException.couldNotReadOffsetRange > - > > Key: SPARK-48018 > URL: https://issues.apache.org/jira/browse/SPARK-48018 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: B. Micheal Okutubo >Priority: Major > Labels: pull-request-available > > [INTERNAL_ERROR] Undefined error message parameter for error class: > 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE' > when groupId is null and we are about to throw the > KafkaException.couldNotReadOffsetRange error -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48018) Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
B. Micheal Okutubo created SPARK-48018: -- Summary: Null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange Key: SPARK-48018 URL: https://issues.apache.org/jira/browse/SPARK-48018 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 4.0.0 Reporter: B. Micheal Okutubo [INTERNAL_ERROR] Undefined error message parameter for error class: 'KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE' when groupId is null and we are about to throw the KafkaException.couldNotReadOffsetRange error -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48017) Add Spark application submission worker for operator
Zhou JIANG created SPARK-48017: -- Summary: Add Spark application submission worker for operator Key: SPARK-48017 URL: https://issues.apache.org/jira/browse/SPARK-48017 Project: Spark Issue Type: Sub-task Components: k8s Affects Versions: kubernetes-operator-0.1.0 Reporter: Zhou JIANG Spark Operator needs a submission worker that converts its application abstraction (Operator API) to k8s resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48016) Binary Arithmetic operators should include the evalMode when makeCopy
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48016: --- Labels: pull-request-available (was: ) > Binary Arithmetic operators should include the evalMode when makeCopy > - > > Key: SPARK-48016 > URL: https://issues.apache.org/jira/browse/SPARK-48016 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > Binary Arithmetic operators should include the evalMode during makeCopy. > Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of > returning null > > {code:java} > SELECT try_divide(1, decimal(0)); {code} > This is caused by the rule DecimalPrecision: > {code:java} > case b @ BinaryOperator(left, right) if left.dataType != right.dataType => > (left, right) match { > ... > case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] && > l.dataType.isInstanceOf[IntegralType] && > literalPickMinimumPrecision => > b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48016) Binary Arithmetic operators should include the evalMode when makeCopy
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-48016: --- Summary: Binary Arithmetic operators should include the evalMode when makeCopy (was: Binary Arithmetic operators should include the evalMode during makeCopy) > Binary Arithmetic operators should include the evalMode when makeCopy > - > > Key: SPARK-48016 > URL: https://issues.apache.org/jira/browse/SPARK-48016 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Binary Arithmetic operators should include the evalMode during makeCopy. > Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of > returning null > > {code:java} > SELECT try_divide(1, decimal(0)); {code} > This is caused by the rule DecimalPrecision: > {code:java} > case b @ BinaryOperator(left, right) if left.dataType != right.dataType => > (left, right) match { > ... > case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] && > l.dataType.isInstanceOf[IntegralType] && > literalPickMinimumPrecision => > b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48016) Binary Arithmetic operators should include the evalMode during makeCopy
Gengliang Wang created SPARK-48016: -- Summary: Binary Arithmetic operators should include the evalMode during makeCopy Key: SPARK-48016 URL: https://issues.apache.org/jira/browse/SPARK-48016 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0, 3.5.2 Reporter: Gengliang Wang Assignee: Gengliang Wang Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null {code:java} SELECT try_divide(1, decimal(0)); {code} This is caused by the rule DecimalPrecision: {code:java} case b @ BinaryOperator(left, right) if left.dataType != right.dataType => (left, right) match { ... case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] && l.dataType.isInstanceOf[IntegralType] && literalPickMinimumPrecision => b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
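A hedged illustration of the pitfall with a toy record rather than Spark's actual {{TreeNode.makeCopy}} machinery: a copy built from a partial argument list silently resets any field, such as {{evalMode}}, that is not carried along.

{code:java}
// Toy model only; the names and the ANSI default are illustrative assumptions.
record Divide(String left, String right, String evalMode) {
  // Copying from just (left, right) drops the TRY eval mode, which is how
  // try_divide(1, decimal(0)) can end up throwing DIVIDE_BY_ZERO instead of
  // returning null after the DecimalPrecision rewrite inserts a Cast.
  Divide makeCopy(String newLeft, String newRight) {
    return new Divide(newLeft, newRight, "ANSI"); // evalMode silently lost
  }
}
{code}

In this toy model, new Divide("1", "0", "TRY").makeCopy("cast(1)", "0") yields a node whose evalMode() is "ANSI", mirroring the lost try semantics.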
[jira] [Resolved] (SPARK-47696) try_to_timestamp should handle SparkUpgradeException
[ https://issues.apache.org/jira/browse/SPARK-47696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47696. Resolution: Won't Fix > try_to_timestamp should handle SparkUpgradeException > > > Key: SPARK-47696 > URL: https://issues.apache.org/jira/browse/SPARK-47696 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0, 3.5.2, 3.4.3 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > Currently, try_to_timestamp will throw an exception on legacy timestamp input. > {code:java} > > SELECT try_to_timestamp('2016-12-1', 'yyyy-MM-dd') > org.apache.spark.SparkUpgradeException: > [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may > get a different result due to the upgrading to Spark >= 3.0: > Fail to parse '2016-12-1' in the new parser. > You can set "spark.sql.legacy.timeParserPolicy" to "LEGACY" to restore the > behavior before Spark 3.0, or set to "CORRECTED" and treat it as an invalid > datetime string. SQLSTATE: 42K0B {code} > It should return null instead of an error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47943) Add Operator CI Task for Java Build and Test
[ https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47943: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Add Operator CI Task for Java Build and Test > > > Key: SPARK-47943 > URL: https://issues.apache.org/jira/browse/SPARK-47943 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > We need to add a CI task to build and test Java code for upcoming operator pull > requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48015) Update `build.gradle` to fix deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48015: - Assignee: Dongjoon Hyun > Update `build.gradle` to fix deprecation warnings > - > > Key: SPARK-48015 > URL: https://issues.apache.org/jira/browse/SPARK-48015 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48015) Update `build.gradle` to fix deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48015. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9 [https://github.com/apache/spark-kubernetes-operator/pull/9] > Update `build.gradle` to fix deprecation warnings > - > > Key: SPARK-48015 > URL: https://issues.apache.org/jira/browse/SPARK-48015 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47929) Setup Static Analysis for Operator
[ https://issues.apache.org/jira/browse/SPARK-47929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47929: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Setup Static Analysis for Operator > -- > > Key: SPARK-47929 > URL: https://issues.apache.org/jira/browse/SPARK-47929 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > Add common analysis tasks including checkstyle, spotbugs, and jacoco. Also > include spotless for style fixes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47950) Add Java API Module for Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47950: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Add Java API Module for Spark Operator > -- > > Key: SPARK-47950 > URL: https://issues.apache.org/jira/browse/SPARK-47950 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > Spark Operator API refers to the > [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/] > that represents the spec for a Spark Application in k8s. > This aims to add a Java API library for Spark Operator, with the ability to > generate a yaml spec. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48015) Update `build.gradle` to fix deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48015: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Update `build.gradle` to fix deprecation warnings > - > > Key: SPARK-48015 > URL: https://issues.apache.org/jira/browse/SPARK-48015 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48015) Update `build.gradle` to fix deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48015: --- Labels: pull-request-available (was: ) > Update `build.gradle` to fix deprecation warnings > - > > Key: SPARK-48015 > URL: https://issues.apache.org/jira/browse/SPARK-48015 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48015) Update `build.gradle` to fix deprecation warnings
Dongjoon Hyun created SPARK-48015: - Summary: Update `build.gradle` to fix deprecation warnings Key: SPARK-48015 URL: https://issues.apache.org/jira/browse/SPARK-48015 Project: Spark Issue Type: Sub-task Components: Build, Kubernetes Affects Versions: kubernetes-operator-0.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47950) Add Java API Module for Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47950: - Assignee: Zhou JIANG > Add Java API Module for Spark Operator > -- > > Key: SPARK-47950 > URL: https://issues.apache.org/jira/browse/SPARK-47950 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > > Spark Operator API refers to the > [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/] > that represents the spec for a Spark Application in k8s. > This aims to add a Java API library for Spark Operator, with the ability to > generate a yaml spec. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47950) Add Java API Module for Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47950. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 8 [https://github.com/apache/spark-kubernetes-operator/pull/8] > Add Java API Module for Spark Operator > -- > > Key: SPARK-47950 > URL: https://issues.apache.org/jira/browse/SPARK-47950 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Spark Operator API refers to the > [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/] > that represents the spec for a Spark Application in k8s. > This aims to add a Java API library for Spark Operator, with the ability to > generate a yaml spec. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48014) Change the makeFromJava error in EvaluatePython to a user-facing error
[ https://issues.apache.org/jira/browse/SPARK-48014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48014: --- Labels: pull-request-available (was: ) > Change the makeFromJava error in EvaluatePython to a user-facing error > -- > > Key: SPARK-48014 > URL: https://issues.apache.org/jira/browse/SPARK-48014 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48014) Change the makeFromJava error in EvaluatePython to a user-facing error
Allison Wang created SPARK-48014: Summary: Change the makeFromJava error in EvaluatePython to a user-facing error Key: SPARK-48014 URL: https://issues.apache.org/jira/browse/SPARK-48014 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Allison Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks
[ https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841355#comment-17841355 ] Tatu Saloranta commented on SPARK-47959: Aside from the question of reducing contention in InternCache, my experience has been that if this blocking is hit there is always some other problem involved: either an unbounded number of keys (like UUID keys) or a lack of `JsonFactory` reuse. In the latter case the best solution is to reuse the JsonFactory (whether directly or by reusing the ObjectMapper that owns it); in the former case (or, as a second alternative for the latter case), there are 2 `JsonFactory.Feature` settings that may be disabled: * JsonFactory.Feature.INTERN_FIELD_NAMES: if names are not reused across reads, there is little value in String.intern() * JsonFactory.Feature.CANONICALIZE_FIELD_NAMES: ... or if there's no reuse nor repeating symbols, the whole canonicalization can be disabled. and so it may be worth experimenting with these settings (disabling one or the other: if CANONICALIZE_FIELD_NAMES is disabled, INTERN_FIELD_NAMES does not matter). Put another way: while there is some value in improving the locking of `InternCache`, it is unlikely to be the most effective solution to whatever problem there is. > Improve GET_JSON_OBJECT performance on executors running multiple tasks > --- > > Key: SPARK-47959 > URL: https://issues.apache.org/jira/browse/SPARK-47959 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.1 >Reporter: Zheng Shao >Priority: Major > > We have a Spark executor that is running 32 workers in parallel. The query > is a simple SELECT with several `GET_JSON_OBJECT` UDF calls. > We noticed that 80+% of the stacktrace of the worker threads are blocked on > the following stacktrace: > > {code:java} > com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - > blocked on java.lang.Object@7529fde1 > com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947) > > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482) > > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339) > > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870) > > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825) > > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798) > > com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240) > > org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383) > > org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287) > > org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198) > > org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196) > > org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown > Source) > ... > {code} > > Apparently jackson-core has had this performance bug from version 2.3 through > 2.15, and it is not fixed until version 2.18 (unreleased): > [https://github.com/FasterXML/jackson-core/blob/fc51d1e13f4ba62a25a739f26be9e05aaad88c3e/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L50] > > {code:java} > synchronized (lock) { > if (size() >= MAX_ENTRIES) { > clear(); > } > } > {code} > > instead of > [https://github.com/FasterXML/jackson-core/blob/8b87cc1a96f649a7e7872c5baa8cf97909cabf6b/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L59] > > {code:java} > /* As of 2.18, the limit is not strictly enforced, but we do try > to > * clear entries if we have reached the limit. We do not expect to > * go too much over the limit, and if we do, it's not a huge > problem. > * If some other thread has the lock, we will not clear but the > lock should > * not be held for long, so another thread should be able to > clear in the near future. > */ > if (lock.tryLock()) { > try { > if (size() >= DEFAULT_MAX_ENTRIES) { > clear(); > } > } finally { > lock.unlock(); > } > } {code} > > Potential fixes: > # Upgrade to Jackson-core 2.18 when it's released; > # Follow [https://github.com/FasterXML/jackson-core/issues/998] - I
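For reference, a minimal sketch of the two feature toggles described above, using the jackson-core 2.10+ builder API; whether either is safe to disable depends on how much field-name reuse the workload actually has.

{code:java}
import com.fasterxml.jackson.core.JsonFactory;

class JsonFactorySetup {
  // Build one JsonFactory and reuse it. Disabling name interning removes the
  // String.intern()/InternCache path; disabling canonicalization entirely
  // makes INTERN_FIELD_NAMES irrelevant.
  static final JsonFactory FACTORY = JsonFactory.builder()
      .disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
      // .disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES)
      .build();
}
{code}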
[jira] [Commented] (SPARK-47219) XML: Ignore commented row tags in XML tokenizer
[ https://issues.apache.org/jira/browse/SPARK-47219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841353#comment-17841353 ] Sandip Agarwala commented on SPARK-47219: - Correct. Thanks for pointing it out. I closed it as a duplicate. > XML: Ignore commented row tags in XML tokenizer > --- > > Key: SPARK-47219 > URL: https://issues.apache.org/jira/browse/SPARK-47219 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47219) XML: Ignore commented row tags in XML tokenizer
[ https://issues.apache.org/jira/browse/SPARK-47219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandip Agarwala resolved SPARK-47219. - Resolution: Duplicate > XML: Ignore commented row tags in XML tokenizer > --- > > Key: SPARK-47219 > URL: https://issues.apache.org/jira/browse/SPARK-47219 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47219) XML: Ignore commented row tags in XML tokenizer
[ https://issues.apache.org/jira/browse/SPARK-47219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841351#comment-17841351 ] HiuFung Kwok commented on SPARK-47219: -- [~sandip.agarwala] Do we need this task? It seems to be a duplicate of SPARK-47218. > XML: Ignore commented row tags in XML tokenizer > --- > > Key: SPARK-47219 > URL: https://issues.apache.org/jira/browse/SPARK-47219 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42846) Assign a name to the error class _LEGACY_ERROR_TEMP_2011
[ https://issues.apache.org/jira/browse/SPARK-42846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841347#comment-17841347 ] HiuFung Kwok commented on SPARK-42846: -- Hi [~maxgekk], I have submitted an MR for this; would you mind having a look? Thx. > Assign a name to the error class _LEGACY_ERROR_TEMP_2011 > > > Key: SPARK-42846 > URL: https://issues.apache.org/jira/browse/SPARK-42846 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2011* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > already exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies on the error text > message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Propose a solution to users for how to avoid and fix such kinds of errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48012) SPJ: Support Transform Expressions for One Side Shuffle
[ https://issues.apache.org/jira/browse/SPARK-48012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated SPARK-48012: -- Parent: SPARK-37375 Issue Type: Sub-task (was: New Feature) > SPJ: Support Transform Expressions for One Side Shuffle > --- > > Key: SPARK-48012 > URL: https://issues.apache.org/jira/browse/SPARK-48012 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.3 >Reporter: Szehon Ho >Priority: Major > > SPARK-41471 allowed Spark to shuffle just one side and still conduct SPJ, if > the other side is KeyGroupedPartitioning. However, the support was just for > a KeyGroupedPartition without any partition transform (day, year, bucket). > It will be useful to add support for partition transforms as well, as there > are many tables partitioned by those transforms. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48012) SPJ: Support Transform Expressions for One Side Shuffle
Szehon Ho created SPARK-48012: - Summary: SPJ: Support Transform Expressions for One Side Shuffle Key: SPARK-48012 URL: https://issues.apache.org/jira/browse/SPARK-48012 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.4.3 Reporter: Szehon Ho SPARK-41471 allowed Spark to shuffle just one side and still conduct SPJ, if the other side is KeyGroupedPartitioning. However, the support was just for a KeyGroupedPartition without any partition transform (day, year, bucket). It will be useful to add support for partition transforms as well, as there are many tables partitioned by those transforms. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
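A hedged sketch of why the transform matters for a one-side shuffle: rows shuffled on the fly must be routed through the same partition transform the key-grouped side is stored with, or the partitions cannot be aligned. The bucketing below is illustrative only; real sources (Iceberg, for example) define their own bucket hash.

{code:java}
class BucketTransform {
  // Route a join key to the bucket a storage-partitioned table uses, e.g. a
  // table partitioned by bucket(16, id). Both sides must agree on numBuckets
  // and on the hash function for the partitions to line up.
  static int bucket(int numBuckets, long value) {
    return Math.floorMod(Long.hashCode(value), numBuckets);
  }
}
{code}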
[jira] [Commented] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks
[ https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841339#comment-17841339 ] PJ Fanning commented on SPARK-47959: [~zshao] if you have a test environment, could you try it with the 2.18.0-SNAPSHOT Jackson jars to see if they help? > Improve GET_JSON_OBJECT performance on executors running multiple tasks > --- > > Key: SPARK-47959 > URL: https://issues.apache.org/jira/browse/SPARK-47959 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.1 >Reporter: Zheng Shao >Priority: Major > > We have a Spark executor that is running 32 workers in parallel. The query > is a simple SELECT with several `GET_JSON_OBJECT` UDF calls. > We noticed that 80+% of the stacktrace of the worker threads are blocked on > the following stacktrace: > > {code:java} > com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - > blocked on java.lang.Object@7529fde1 > com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947) > > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482) > > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339) > > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870) > > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825) > > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798) > > com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240) > > org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383) > > org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287) > > org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198) > > org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196) > > org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown > Source) > ... > {code} > > Apparently jackson-core has had this performance bug from version 2.3 through > 2.15, and it is not fixed until version 2.18 (unreleased): > [https://github.com/FasterXML/jackson-core/blob/fc51d1e13f4ba62a25a739f26be9e05aaad88c3e/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L50] > > {code:java} > synchronized (lock) { > if (size() >= MAX_ENTRIES) { > clear(); > } > } > {code} > > instead of > [https://github.com/FasterXML/jackson-core/blob/8b87cc1a96f649a7e7872c5baa8cf97909cabf6b/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L59] > > {code:java} > /* As of 2.18, the limit is not strictly enforced, but we do try > to > * clear entries if we have reached the limit. We do not expect to > * go too much over the limit, and if we do, it's not a huge > problem. > * If some other thread has the lock, we will not clear but the > lock should > * not be held for long, so another thread should be able to > clear in the near future. > */ > if (lock.tryLock()) { > try { > if (size() >= DEFAULT_MAX_ENTRIES) { > clear(); > } > } finally { > lock.unlock(); > } > } {code} > > Potential fixes: > # Upgrade to Jackson-core 2.18 when it's released; > # Follow [https://github.com/FasterXML/jackson-core/issues/998] - I don't > totally understand the options suggested by this thread yet. > # Introduce a new UDF that doesn't depend on jackson-core -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48010) Avoid repeated calls to conf.resolver in resolveExpression
[ https://issues.apache.org/jira/browse/SPARK-48010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48010. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46248 [https://github.com/apache/spark/pull/46248] > Avoid repeated calls to conf.resolver in resolveExpression > -- > > Key: SPARK-48010 > URL: https://issues.apache.org/jira/browse/SPARK-48010 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: Nikhil Sheoran >Assignee: Nikhil Sheoran >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Consider a view with a large number of columns (~1000s). When resolving this > view, a look at the flamegraph showed repeated initializations of `conf` > to obtain the `resolver` for each column of the view. > This can be easily optimized to reuse the same resolver (obtained once) for > the various calls to `innerResolve` in `resolveExpression`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
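An illustrative sketch of the optimization, with stand-in names rather than Spark's internals: fetch the resolver once and reuse it across the per-column work, instead of re-deriving it from the conf on every call.

{code:java}
import java.util.List;
import java.util.function.BiPredicate;

class ResolverHoisting {
  // Stand-in for conf.resolver, which re-derives the name-comparison
  // function from the SQL conf each time it is called.
  static BiPredicate<String, String> resolverFromConf() {
    return String::equalsIgnoreCase;
  }

  static void resolveAll(List<String> columns, String target) {
    BiPredicate<String, String> resolver = resolverFromConf(); // fetched once
    for (String col : columns) {
      resolver.test(col, target); // reused for every column of the view
    }
  }
}
{code}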
[jira] [Created] (SPARK-48011) Store LogKey name as a value to avoid generating new string instances
Gengliang Wang created SPARK-48011: -- Summary: Store LogKey name as a value to avoid generating new string instances Key: SPARK-48011 URL: https://issues.apache.org/jira/browse/SPARK-48011 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Gengliang Wang Assignee: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47963) Make the external Spark ecosystem able to use structured logging mechanisms
[ https://issues.apache.org/jira/browse/SPARK-47963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47963. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46193 [https://github.com/apache/spark/pull/46193] > Make the external Spark ecosystem able to use structured logging mechanisms > > > Key: SPARK-47963 > URL: https://issues.apache.org/jira/browse/SPARK-47963 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
[ https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46122: - Assignee: Dongjoon Hyun > Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default > - > > Key: SPARK-46122 > URL: https://issues.apache.org/jira/browse/SPARK-46122 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48005) Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup`
[ https://issues.apache.org/jira/browse/SPARK-48005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48005. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46242 [https://github.com/apache/spark/pull/46242] > Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup` > - > > Key: SPARK-48005 > URL: https://issues.apache.org/jira/browse/SPARK-48005 > Project: Spark > Issue Type: Sub-task > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48010) Avoid repeated calls to conf.resolver in resolveExpression
[ https://issues.apache.org/jira/browse/SPARK-48010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48010: --- Labels: pull-request-available (was: ) > Avoid repeated calls to conf.resolver in resolveExpression > -- > > Key: SPARK-48010 > URL: https://issues.apache.org/jira/browse/SPARK-48010 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: Nikhil Sheoran >Priority: Major > Labels: pull-request-available > > Consider a view with a large number of columns (~1000s). When resolving this > view, a look at the flamegraph showed repeated initializations of `conf` > to obtain the `resolver` for each column of the view. > This can be easily optimized to reuse the same resolver (obtained once) for > the various calls to `innerResolve` in `resolveExpression`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48010) Avoid repeated calls to conf.resolver in resolveExpression
Nikhil Sheoran created SPARK-48010: -- Summary: Avoid repeated calls to conf.resolver in resolveExpression Key: SPARK-48010 URL: https://issues.apache.org/jira/browse/SPARK-48010 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.3 Reporter: Nikhil Sheoran Consider a view with a large number of columns (~1000s). When resolving this view, a look at the flamegraph showed repeated initializations of `conf` to obtain the `resolver` for each column of the view. This can be easily optimized to reuse the same resolver (obtained once) for the various calls to `innerResolve` in `resolveExpression`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks
[ https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated SPARK-47959: --- Description: We have a Spark executor that is running 32 workers in parallel. The query is a simple SELECT with several `GET_JSON_OBJECT` UDF calls. We noticed that 80+% of the stacktrace of the worker threads are blocked on the following stacktrace: {code:java} com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - blocked on java.lang.Object@7529fde1 com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947) com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482) com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339) com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870) com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825) com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798) com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240) org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383) org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287) org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198) org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196) org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown Source) ... {code} Apparently jackson-core has had this performance bug from version 2.3 through 2.15, and it is not fixed until version 2.18 (unreleased): [https://github.com/FasterXML/jackson-core/blob/fc51d1e13f4ba62a25a739f26be9e05aaad88c3e/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L50] {code:java} synchronized (lock) { if (size() >= MAX_ENTRIES) { clear(); } } {code} instead of [https://github.com/FasterXML/jackson-core/blob/8b87cc1a96f649a7e7872c5baa8cf97909cabf6b/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L59] {code:java} /* As of 2.18, the limit is not strictly enforced, but we do try to * clear entries if we have reached the limit. We do not expect to * go too much over the limit, and if we do, it's not a huge problem. * If some other thread has the lock, we will not clear but the lock should * not be held for long, so another thread should be able to clear in the near future. */ if (lock.tryLock()) { try { if (size() >= DEFAULT_MAX_ENTRIES) { clear(); } } finally { lock.unlock(); } } {code} Potential fixes: # Upgrade to Jackson-core 2.18 when it's released; # Follow [https://github.com/FasterXML/jackson-core/issues/998] - I don't totally understand the options suggested by this thread yet. # Introduce a new UDF that doesn't depend on jackson-core was: We have a Spark executor that is running 32 workers in parallel. The query is a simple SELECT with several `GET_JSON_OBJECT` UDF calls. We noticed that 80+% of the stacktrace of the worker threads are blocked on the following stacktrace: {code:java} com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - blocked on java.lang.Object@7529fde1 com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947) com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482) com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339) com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870) com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825) com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798) com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240) org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383) org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287) org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198) org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196) org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown
[jira] [Resolved] (SPARK-47968) MsSQLServer: Map datetimeoffset to TimestampType
[ https://issues.apache.org/jira/browse/SPARK-47968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47968. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46239 [https://github.com/apache/spark/pull/46239] > MsSQLServer: Map datetimeoffset to TimestampType > > > Key: SPARK-47968 > URL: https://issues.apache.org/jira/browse/SPARK-47968 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47476: --- Assignee: Uroš Bojanić > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMSs, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > the [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47476. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45704 [https://github.com/apache/spark/pull/45704] > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what the expected behaviour for this function is when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMSs, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and the > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
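As a concrete reference for the {_}StringSearch{_} service mentioned above, a minimal ICU4J sketch of collation-aware matching; the locale, strength, and sample strings are illustrative assumptions, not the settings Spark's collation support uses:
{code:java}
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.text.SearchIterator;
import com.ibm.icu.text.StringSearch;
import com.ibm.icu.util.ULocale;

import java.text.StringCharacterIterator;

public class IcuSearchDemo {
    public static void main(String[] args) {
        // PRIMARY strength ignores case and accent differences, similar in
        // spirit to a case-insensitive collation.
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(new ULocale("en"));
        collator.setStrength(Collator.PRIMARY);

        StringSearch search = new StringSearch(
                "spark",
                new StringCharacterIterator("Collation-aware Spark"),
                collator);

        int pos = search.first();
        if (pos != SearchIterator.DONE) {
            // Matches "Spark" despite the case difference.
            System.out.println("match at " + pos + ", length " + search.getMatchLength());
        }
    }
}
{code}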
[jira] [Resolved] (SPARK-48007) MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11
[ https://issues.apache.org/jira/browse/SPARK-48007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48007. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46244 [https://github.com/apache/spark/pull/46244] > MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11 > --- > > Key: SPARK-48007 > URL: https://issues.apache.org/jira/browse/SPARK-48007 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-30709) Spark 2.3 to Spark 2.4 Upgrade. Problems reading HIVE partitioned tables.
[ https://issues.apache.org/jira/browse/SPARK-30709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen deleted SPARK-30709: - > Spark 2.3 to Spark 2.4 Upgrade. Problems reading HIVE partitioned tables. > - > > Key: SPARK-30709 > URL: https://issues.apache.org/jira/browse/SPARK-30709 > Project: Spark > Issue Type: Question > Environment: PRE- Production >Reporter: Carlos Mario >Priority: Major > Labels: SQL, Spark > > Hello > We recently updated our preproduction environment from Spark 2.3 to Spark > 2.4.0 > Over time we have created a large number of tables in the Hive Metastore, > partitioned by 2 fields, one of them String and the other BigInt. > We were reading these tables with Spark 2.3 with no problem, but after > upgrading to Spark 2.4 we get the following log every time we run our > software: > > log_filterBIGINT.out: > Caused by: MetaException(message:Filtering is supported only on partition > keys of type string) Caused by: MetaException(message:Filtering is supported > only on partition keys of type string) Caused by: > MetaException(message:Filtering is supported only on partition keys of type > string) > > hadoop-cmf-hive-HIVEMETASTORE-isblcsmsttc0001.scisb.isban.corp.log.out.1: > > 2020-01-10 09:36:05,781 ERROR > org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-138]: > MetaException(message:Filtering is supported only on partition keys of type > string) > 2020-01-10 11:19:19,208 ERROR > org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-187]: > MetaException(message:Filtering is supported only on partition keys of type > string) > 2020-01-10 11:19:54,780 ERROR > org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-167]: > MetaException(message:Filtering is supported only on partition keys of type > string) > > > We know the best practice from Spark's point of view is to use the 'STRING' > type for partition columns, but we need a solution we can deploy with ease, > given the large number of tables partitioned on a bigint column. > > As a first attempt we set the > spark.sql.hive.manageFilesourcePartitions parameter to false in the > spark-submit, but after rerunning the software the error persisted. > > Has anyone in the community experienced the same problem? What was the > solution? > > Kind regards and thanks in advance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
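For anyone hitting the same {{MetaException}}: the error generally comes from Spark pushing partition filters down to the Hive metastore, which supports filter pushdown only on string partition keys unless {{hive.metastore.integral.jdo.pushdown}} is enabled on the metastore side. A hedged client-side sketch that avoids the pushdown via the standard {{spark.sql.hive.metastorePartitionPruning}} flag (table and column names are made up; the trade-off is that Spark fetches all partition metadata and filters on the driver):
{code:java}
import org.apache.spark.sql.SparkSession;

public class DisableMetastorePruning {
    public static void main(String[] args) {
        // Stop pushing partition-key filters to the metastore; Spark then
        // lists all partitions and prunes them on the driver instead.
        SparkSession spark = SparkSession.builder()
                .appName("bigint-partition-workaround")
                .config("spark.sql.hive.metastorePartitionPruning", "false")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical table partitioned on a bigint column.
        spark.sql("SELECT * FROM some_db.some_table WHERE part_id = 42").show();
    }
}
{code}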
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841216#comment-17841216 ] Rushikesh Kavar commented on SPARK-48009: - I am calling OverrideAvro first and then AppendAvro > Specifications for Apache Spark hadoop Avro append operation > > > Key: SPARK-48009 > URL: https://issues.apache.org/jira/browse/SPARK-48009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.3 >Reporter: Rushikesh Kavar >Priority: Minor > > Consider a path /a/b/c > Assume I write Avro data to this folder using Apache Spark. > After it is written, assume I try to append a dataset to this folder. > I want to see the specification of what happens in case of append. > After doing a PoC, I found that when the appended dataset has the same > schema as the existing data, the data simply gets appended. But I want to see > clear docs of what exactly happens in case of append. > I am attaching my testing Java code. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841214#comment-17841214 ] Rushikesh Kavar commented on SPARK-48009: -
{code:java}
import org.apache.spark.sql.*;

import java.util.List;

public class Writer {

    public static <T> void writeAvro(List<T> list, String path) {
        writeAvro(list, path, SaveMode.Overwrite);
    }

    public static <T> void writeAvro(List<T> list, String path, SaveMode saveMode) {
        Dataset<Row> dataset = getDatasetFromList(list);
        dataset.write().format("avro")
                .mode(saveMode)
                .save(path);
    }

    public static void writeAvro(Dataset<Row> ds, String path, SaveMode saveMode) {
        ds.write().format("avro")
                .mode(saveMode)
                .save(path);
    }

    @SuppressWarnings("unchecked")
    public static <T> Dataset<Row> getDatasetFromList(List<T> list) {
        // Derive the bean class from the first element to build the encoder.
        Class<T> clazz = (Class<T>) list.get(0).getClass();
        SparkSession spark = SparkSession.builder()
                .config("spark.master", "local")
                .getOrCreate();
        SQLContext context = spark.sqlContext();
        Dataset<Row> dataset = context.createDataset(list, Encoders.bean(clazz)).toDF();
        return dataset;
    }
}
{code}
> Specifications for Apache Spark hadoop Avro append operation > > > Key: SPARK-48009 > URL: https://issues.apache.org/jira/browse/SPARK-48009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.3 >Reporter: Rushikesh Kavar >Priority: Minor > > Consider a path /a/b/c > Assume I write Avro data to this folder using Apache Spark. > After it is written, assume I try to append a dataset to this folder. > I want to see the specification of what happens in case of append. > After doing a PoC, I found that when the appended dataset has the same > schema as the existing data, the data simply gets appended. But I want to see > clear docs of what exactly happens in case of append. > I am attaching my testing Java code. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841213#comment-17841213 ] Rushikesh Kavar commented on SPARK-48009: -
{code:java}
import org.apache.spark.sql.SaveMode;
import org.example.avro.Writer;

import java.util.ArrayList;
import java.util.List;

public class OverrideAvro {

    public static void main(String[] args) {
        // C:\Users\kavarus\testing\spark-testing\data
        Writer.writeAvro(getMockData(),
                "C:\\Users\\kavarus\\testing\\spark-testing\\data",
                SaveMode.Overwrite);
    }

    public static List<Modal> getMockData() {
        List<Modal> lst = new ArrayList<>();
        lst.add(new Modal("1", "Test1", 26));
        lst.add(new Modal("2", "Test2", 28));
        return lst;
    }
}
{code}
> Specifications for Apache Spark hadoop Avro append operation > > > Key: SPARK-48009 > URL: https://issues.apache.org/jira/browse/SPARK-48009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.3 >Reporter: Rushikesh Kavar >Priority: Minor > > Consider a path /a/b/c > Assume I write Avro data to this folder using Apache Spark. > After it is written, assume I try to append a dataset to this folder. > I want to see the specification of what happens in case of append. > After doing a PoC, I found that when the appended dataset has the same > schema as the existing data, the data simply gets appended. But I want to see > clear docs of what exactly happens in case of append. > I am attaching my testing Java code. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841211#comment-17841211 ] Rushikesh Kavar commented on SPARK-48009: -
{code:java}
import org.apache.spark.sql.SaveMode;
import org.example.avro.Writer;

import java.util.ArrayList;
import java.util.List;

public class AppendAvro {

    public static void main(String[] args) {
        Writer.writeAvro(getMockData(),
                "C:\\Users\\kavarus\\testing\\spark-testing\\data",
                SaveMode.Append);
    }

    public static List<Modal> getMockData() {
        List<Modal> lst = new ArrayList<>();
        lst.add(new Modal("3", "Test3", 27));
        lst.add(new Modal("4", "Test4", 27));
        return lst;
    }
}
{code}
> Specifications for Apache Spark hadoop Avro append operation > > > Key: SPARK-48009 > URL: https://issues.apache.org/jira/browse/SPARK-48009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.3 >Reporter: Rushikesh Kavar >Priority: Minor > > Consider a path /a/b/c > Assume I write Avro data to this folder using Apache Spark. > After it is written, assume I try to append a dataset to this folder. > I want to see the specification of what happens in case of append. > After doing a PoC, I found that when the appended dataset has the same > schema as the existing data, the data simply gets appended. But I want to see > clear docs of what exactly happens in case of append. > I am attaching my testing Java code. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841212#comment-17841212 ] Rushikesh Kavar commented on SPARK-48009: -
{code:java}
public class Modal {

    public String id;
    public String name;
    public int age;

    public Modal(String id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
{code}
> Specifications for Apache Spark hadoop Avro append operation > > > Key: SPARK-48009 > URL: https://issues.apache.org/jira/browse/SPARK-48009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.3 >Reporter: Rushikesh Kavar >Priority: Minor > > Consider a path /a/b/c > Assume I write Avro data to this folder using Apache Spark. > After it is written, assume I try to append a dataset to this folder. > I want to see the specification of what happens in case of append. > After doing a PoC, I found that when the appended dataset has the same > schema as the existing data, the data simply gets appended. But I want to see > clear docs of what exactly happens in case of append. > I am attaching my testing Java code. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
[ https://issues.apache.org/jira/browse/SPARK-48009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841210#comment-17841210 ] Rushikesh Kavar commented on SPARK-48009: - I will attach the code within a few hours > Specifications for Apache Spark hadoop Avro append operation > > > Key: SPARK-48009 > URL: https://issues.apache.org/jira/browse/SPARK-48009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.3 >Reporter: Rushikesh Kavar >Priority: Minor > > Consider a path /a/b/c > Assume I write Avro data to this folder using Apache Spark. > After it is written, assume I try to append a dataset to this folder. > I want to see the specification of what happens in case of append. > After doing a PoC, I found that when the appended dataset has the same > schema as the existing data, the data simply gets appended. But I want to see > clear docs of what exactly happens in case of append. > I am attaching my testing Java code. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48009) Specifications for Apache Spark hadoop Avro append operation
Rushikesh Kavar created SPARK-48009: --- Summary: Specifications for Apache Spark hadoop Avro append operation Key: SPARK-48009 URL: https://issues.apache.org/jira/browse/SPARK-48009 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.3 Reporter: Rushikesh Kavar Consider a path /a/b/c Assume I write Avro data to this folder using Apache Spark. After it is written, assume I try to append a dataset to this folder. I want to see the specification of what happens in case of append. After doing a PoC, I found that when the appended dataset has the same schema as the existing data, the data simply gets appended. But I want to see clear docs of what exactly happens in case of append. I am attaching my testing Java code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47351. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46165 [https://github.com/apache/spark/pull/46165] > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47351: --- Assignee: Uroš Bojanić > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47350) SplitPart (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47350. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46158 [https://github.com/apache/spark/pull/46158] > SplitPart (binary & lowercase collation only) > - > > Key: SPARK-47350 > URL: https://issues.apache.org/jira/browse/SPARK-47350 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47408) Fix mathExpressions that use StringType
[ https://issues.apache.org/jira/browse/SPARK-47408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47408: --- Labels: pull-request-available (was: ) > Fix mathExpressions that use StringType > --- > > Key: SPARK-47408 > URL: https://issues.apache.org/jira/browse/SPARK-47408 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48008) Support UDAF in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengfei Xu updated SPARK-48008: --- Description: Currently Spark Connect supports only UDFs. We need to add support for UDAFs, specifically `Aggregator[IN, BUF, OUT]`. The user-facing API should not change, which includes Aggregator methods and the `spark.udf.register("agg", udaf(agg))` API. was: Currently Spark Connect supports only UDFs. We need to add support for UDAFs, specifically `Aggregator[IN, BUF, OUT]`. The user-facing API should not change, which includes Aggregator methods and the ` spark.udf.register("agg", udaf(agg))` API. > Support UDAF in Spark Connect > - > > Key: SPARK-48008 > URL: https://issues.apache.org/jira/browse/SPARK-48008 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Pengfei Xu >Priority: Major > > Currently Spark Connect supports only UDFs. We need to add support for UDAFs, > specifically `Aggregator[IN, BUF, OUT]`. > The user-facing API should not change, which includes Aggregator methods and > the `spark.udf.register("agg", udaf(agg))` API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
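For reference, a sketch of the existing (non-Connect) user-facing API that the ticket says must keep working; the {{LongSum}} aggregator and all names here are illustrative, not part of the ticket:
{code:java}
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Aggregator;
import org.apache.spark.sql.functions;

public class UdafExample {

    // Illustrative Aggregator[IN, BUF, OUT]: sums long values.
    public static class LongSum extends Aggregator<Long, Long, Long> {
        @Override public Long zero() { return 0L; }
        @Override public Long reduce(Long buffer, Long value) { return buffer + value; }
        @Override public Long merge(Long b1, Long b2) { return b1 + b2; }
        @Override public Long finish(Long reduction) { return reduction; }
        @Override public Encoder<Long> bufferEncoder() { return Encoders.LONG(); }
        @Override public Encoder<Long> outputEncoder() { return Encoders.LONG(); }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().master("local").getOrCreate();

        // The registration path that should stay unchanged under Connect.
        spark.udf().register("agg", functions.udaf(new LongSum(), Encoders.LONG()));

        spark.range(10).createOrReplaceTempView("t");
        spark.sql("SELECT agg(id) FROM t").show();
    }
}
{code}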
[jira] [Assigned] (SPARK-48004) Add WriteFilesExecBase trait for v1 write
[ https://issues.apache.org/jira/browse/SPARK-48004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48004: Assignee: XiDuo You > Add WriteFilesExecBase trait for v1 write > - > > Key: SPARK-48004 > URL: https://issues.apache.org/jira/browse/SPARK-48004 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48004) Add WriteFilesExecBase trait for v1 write
[ https://issues.apache.org/jira/browse/SPARK-48004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48004. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46240 [https://github.com/apache/spark/pull/46240] > Add WriteFilesExecBase trait for v1 write > - > > Key: SPARK-48004 > URL: https://issues.apache.org/jira/browse/SPARK-48004 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48007) MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11
[ https://issues.apache.org/jira/browse/SPARK-48007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48007: --- Labels: pull-request-available (was: ) > MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11 > --- > > Key: SPARK-48007 > URL: https://issues.apache.org/jira/browse/SPARK-48007 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48006) add SortOrder for window function which has no orderSpec
[ https://issues.apache.org/jira/browse/SPARK-48006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48006: --- Labels: pull-request-available (was: ) > add SortOrder for window function which has no orderSpec > > > Key: SPARK-48006 > URL: https://issues.apache.org/jira/browse/SPARK-48006 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: guihuawen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > I am migrating Hive SQL workloads to Spark SQL. > > In Hive SQL > > hive> explain select *,row_number() over (partition by day) rn from > testdb.zeropart_db; > OK > Explain > > In Spark SQL > spark-sql> explain select *,row_number() over (partition by age ) rn from > testdb.zeropart_db; > plan > == Physical Plan == > org.apache.spark.sql.AnalysisException: Window function row_number() requires > window to be ordered, please add ORDER BY clause. For example SELECT > row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY > window_ordering) from table > Time taken: 0.172 seconds, Fetched 1 row(s) > > For better migration compatibility, a new parameter is added to ensure the > same behavior as Hive SQL. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48007) MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11
Kent Yao created SPARK-48007: Summary: MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11 Key: SPARK-48007 URL: https://issues.apache.org/jira/browse/SPARK-48007 Project: Spark Issue Type: Sub-task Components: Build, Tests Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48006) add SortOrder for window function which has no orderSpec
guihuawen created SPARK-48006: - Summary: add SortOrder for window function which has no orderSpec Key: SPARK-48006 URL: https://issues.apache.org/jira/browse/SPARK-48006 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: guihuawen Fix For: 4.0.0 I am migrating Hive SQL workloads to Spark SQL. In Hive SQL hive> explain select *,row_number() over (partition by day) rn from testdb.zeropart_db; OK Explain In Spark SQL spark-sql> explain select *,row_number() over (partition by age ) rn from testdb.zeropart_db; plan == Physical Plan == org.apache.spark.sql.AnalysisException: Window function row_number() requires window to be ordered, please add ORDER BY clause. For example SELECT row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table Time taken: 0.172 seconds, Fetched 1 row(s) For better migration compatibility, a new parameter is added to ensure the same behavior as Hive SQL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
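Pending such a flag, the usual workaround on the Spark side is to supply an explicit ORDER BY in the window. A hedged sketch using the names from the ticket ({{testdb.zeropart_db}} and {{age}} are assumptions about the example schema):
{code:java}
import org.apache.spark.sql.SparkSession;

public class WindowOrderWorkaround {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local")
                .enableHiveSupport()
                .getOrCreate();

        // Spark's analyzer requires row_number() windows to be ordered.
        // Ordering by the partition key makes every row in a partition tie,
        // so numbering within the partition is arbitrary, which is the same
        // behavior Hive gives when ORDER BY is omitted.
        spark.sql("SELECT *, row_number() OVER (PARTITION BY age ORDER BY age) rn "
                + "FROM testdb.zeropart_db").show();
    }
}
{code}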
[jira] [Assigned] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
[ https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46122: -- Assignee: (was: Apache Spark) > Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default > - > > Key: SPARK-46122 > URL: https://issues.apache.org/jira/browse/SPARK-46122 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
[ https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46122: -- Assignee: Apache Spark > Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default > - > > Key: SPARK-46122 > URL: https://issues.apache.org/jira/browse/SPARK-46122 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48003) Hll sketch aggregate support for strings with collation
[ https://issues.apache.org/jira/browse/SPARK-48003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48003: -- Assignee: Apache Spark > Hll sketch aggregate support for strings with collation > --- > > Key: SPARK-48003 > URL: https://issues.apache.org/jira/browse/SPARK-48003 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48005) Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup`
[ https://issues.apache.org/jira/browse/SPARK-48005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48005: --- Labels: pull-request-available (was: ) > Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup` > - > > Key: SPARK-48005 > URL: https://issues.apache.org/jira/browse/SPARK-48005 > Project: Spark > Issue Type: Sub-task > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48005) Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup`
Ruifeng Zheng created SPARK-48005: - Summary: Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup` Key: SPARK-48005 URL: https://issues.apache.org/jira/browse/SPARK-48005 Project: Spark Issue Type: Sub-task Components: Connect, PS Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48003) Hll sketch aggregate support for strings with collation
[ https://issues.apache.org/jira/browse/SPARK-48003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48003: --- Labels: pull-request-available (was: ) > Hll sketch aggregate support for strings with collation > --- > > Key: SPARK-48003 > URL: https://issues.apache.org/jira/browse/SPARK-48003 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47993) Drop Python 3.8 support
[ https://issues.apache.org/jira/browse/SPARK-47993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47993. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46228 [https://github.com/apache/spark/pull/46228] > Drop Python 3.8 support > --- > > Key: SPARK-47993 > URL: https://issues.apache.org/jira/browse/SPARK-47993 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available, release-notes > Fix For: 4.0.0 > > > Python 3.8 reaches EOL this October. Considering the release schedule, we > should drop it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48004) Add WriteFilesExecBase trait for v1 write
[ https://issues.apache.org/jira/browse/SPARK-48004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48004: --- Labels: pull-request-available (was: ) > Add WriteFilesExecBase trait for v1 write > - > > Key: SPARK-48004 > URL: https://issues.apache.org/jira/browse/SPARK-48004 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48004) Add WriteFilesExecBase trait for v1 write
XiDuo You created SPARK-48004: - Summary: Add WriteFilesExecBase trait for v1 write Key: SPARK-48004 URL: https://issues.apache.org/jira/browse/SPARK-48004 Project: Spark Issue Type: Task Components: SQL Affects Versions: 4.0.0 Reporter: XiDuo You -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48003) Hll sketch aggregate support for strings with collation
Uroš Bojanić created SPARK-48003: Summary: Hll sketch aggregate support for strings with collation Key: SPARK-48003 URL: https://issues.apache.org/jira/browse/SPARK-48003 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48001) Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`
[ https://issues.apache.org/jira/browse/SPARK-48001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48001. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46238 [https://github.com/apache/spark/pull/46238] > Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` > - > > Key: SPARK-48001 > URL: https://issues.apache.org/jira/browse/SPARK-48001 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48001) Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`
[ https://issues.apache.org/jira/browse/SPARK-48001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48001: Assignee: Yang Jie > Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` > - > > Key: SPARK-48001 > URL: https://issues.apache.org/jira/browse/SPARK-48001 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47968) MsSQLServer: Map datatimeoffset to TimestampType
[ https://issues.apache.org/jira/browse/SPARK-47968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47968: --- Labels: pull-request-available (was: ) > MsSQLServer: Map datatimeoffset to TimestampType > > > Key: SPARK-47968 > URL: https://issues.apache.org/jira/browse/SPARK-47968 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47986) [CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server
[ https://issues.apache.org/jira/browse/SPARK-47986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-47986. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46221 [https://github.com/apache/spark/pull/46221] > [CONNECT][PYTHON] Unable to create a new session when the default session is > closed by the server > - > > Key: SPARK-47986 > URL: https://issues.apache.org/jira/browse/SPARK-47986 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 3.5.0, 3.5.1 >Reporter: Niranjan Jayakar >Assignee: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When the server closes a session, usually after a cluster restart, the client > is unaware of this until it receives an error. > Once it does so, there is no way for the client to create a new session since > the stale sessions are still recorded as default and active sessions. > The only solution currently is to restart the Python interpreter on the > client, or to reach into the session builder and change the active or default > session. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48001) Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`
[ https://issues.apache.org/jira/browse/SPARK-48001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48001: --- Labels: pull-request-available (was: ) > Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` > - > > Key: SPARK-48001 > URL: https://issues.apache.org/jira/browse/SPARK-48001 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48002) Add Observed metrics test in PySpark StreamingQueryListeners
[ https://issues.apache.org/jira/browse/SPARK-48002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48002: --- Labels: pull-request-available (was: ) > Add Observed metrics test in PySpark StreamingQueryListeners > > > Key: SPARK-48002 > URL: https://issues.apache.org/jira/browse/SPARK-48002 > Project: Spark > Issue Type: New Feature > Components: SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48002) Add Observed metrics test in PySpark StreamingQueryListeners
Wei Liu created SPARK-48002: --- Summary: Add Observed metrics test in PySpark StreamingQueryListeners Key: SPARK-48002 URL: https://issues.apache.org/jira/browse/SPARK-48002 Project: Spark Issue Type: New Feature Components: SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48001) Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`
Yang Jie created SPARK-48001: Summary: Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` Key: SPARK-48001 URL: https://issues.apache.org/jira/browse/SPARK-48001 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47922) Implement try_parse_json
[ https://issues.apache.org/jira/browse/SPARK-47922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47922. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46141 [https://github.com/apache/spark/pull/46141] > Implement try_parse_json > > > Key: SPARK-47922 > URL: https://issues.apache.org/jira/browse/SPARK-47922 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implement try_parse_json expression that runs parse_json on valid string > inputs and returns null when the input string is malformed. Note that this > expression also only supports string input types. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
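A usage sketch of the semantics described above, assuming a build that contains this change (the sample inputs are illustrative):
{code:java}
import org.apache.spark.sql.SparkSession;

public class TryParseJsonDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().master("local").getOrCreate();

        // Well-formed input parses like parse_json; malformed input
        // yields NULL instead of raising an error.
        spark.sql(
            "SELECT try_parse_json('{\"a\": 1}') AS ok, " +
            "try_parse_json('{not json') AS bad"
        ).show();
    }
}
{code}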
[jira] [Updated] (SPARK-47440) SQLServer does not support LIKE operator in binary comparison
[ https://issues.apache.org/jira/browse/SPARK-47440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47440: - Parent: SPARK-47361 Issue Type: Sub-task (was: Bug) > SQLServer does not support LIKE operator in binary comparison > - > > Key: SPARK-47440 > URL: https://issues.apache.org/jira/browse/SPARK-47440 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Bukorovic >Assignee: Stefan Bukorovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > When pushing a Spark query to the MsSqlServer engine, we sometimes construct > a SQL query that has a LIKE operator as part of a binary comparison, > which is not permitted in SQL Server syntax. > For example, the query > {code:java} > SELECT * FROM people WHERE (name LIKE "s%") = 1{code} > will not execute on MsSQLServer. > Such queries should be detected and not pushed down to MsSqlServer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
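For illustration: T-SQL accepts LIKE only as a predicate, not as a value inside a comparison, so an equivalent comparison-safe form folds the predicate into a CASE expression. A hedged sketch of the two query shapes (this is the standard T-SQL rewrite, not necessarily what the fix emits):
{code:java}
public class MsSqlLikeRewrite {
    // Rejected by SQL Server: the LIKE predicate is used as a value
    // inside a binary comparison.
    static final String REJECTED =
        "SELECT * FROM people WHERE (name LIKE 's%') = 1";

    // Accepted: the predicate is folded into a CASE expression first.
    static final String ACCEPTED =
        "SELECT * FROM people WHERE (CASE WHEN name LIKE 's%' THEN 1 ELSE 0 END) = 1";
}
{code}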