[jira] [Assigned] (SPARK-48053) SparkSession.createDataFrame should warn for unsupported options
[ https://issues.apache.org/jira/browse/SPARK-48053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48053:

    Assignee: Hyukjin Kwon

> SparkSession.createDataFrame should warn for unsupported options
>
> Key: SPARK-48053
> URL: https://issues.apache.org/jira/browse/SPARK-48053
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> {code}
> spark.createDataFrame([1,2,3], verifySchema=True)
> {code}
> and
> {code}
> spark.createDataFrame([1,2,3], samplingRatio=0.5)
> {code}
> do not work with Spark Connect.
[jira] [Resolved] (SPARK-48053) SparkSession.createDataFrame should warn for unsupported options
[ https://issues.apache.org/jira/browse/SPARK-48053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48053.

    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46295
[https://github.com/apache/spark/pull/46295]

> SparkSession.createDataFrame should warn for unsupported options
>
> Key: SPARK-48053
> URL: https://issues.apache.org/jira/browse/SPARK-48053
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> {code}
> spark.createDataFrame([1,2,3], verifySchema=True)
> {code}
> and
> {code}
> spark.createDataFrame([1,2,3], samplingRatio=0.5)
> {code}
> do not work with Spark Connect.
[jira] [Resolved] (SPARK-47585) SQL core: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-47585.

    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46264
[https://github.com/apache/spark/pull/46264]

> SQL core: Migrate logInfo with variables to structured logging framework
>
> Key: SPARK-47585
> URL: https://issues.apache.org/jira/browse/SPARK-47585
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Gengliang Wang
> Assignee: BingKun Pan
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
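For context, this migration replaces plain string interpolation in log calls with the log interpolator and MDC-tagged variables, so values land as structured fields rather than free-form text. A minimal before/after sketch, assuming the MDC helper and a LogKeys constant from the new framework (the key name JOB_ID and the class here are illustrative):

{code:scala}
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKeys

class ExampleScheduler extends Logging {
  def onJobStart(jobId: Int): Unit = {
    // Before: plain interpolation; the job id is opaque text in the message.
    // logInfo(s"Job $jobId started")

    // After: the variable is tagged with a log key, so structured sinks can
    // emit it as a queryable JSON field instead of free-form text.
    logInfo(log"Job ${MDC(LogKeys.JOB_ID, jobId)} started")
  }
}
{code}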
[jira] [Updated] (SPARK-48016) Fix a bug in try_divide function when with decimals
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang updated SPARK-48016:

    Fix Version/s: 4.0.0
                   3.5.2

> Fix a bug in try_divide function when with decimals
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0, 3.5.2
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2, 3.4.4
>
> Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
>
> {code:java}
> SELECT try_divide(1, decimal(0));
> {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>     ...
>     case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
>         l.dataType.isInstanceOf[IntegralType] &&
>         literalPickMinimumPrecision =>
>       b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
> {code}
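To make the expected behavior concrete, a small reproduction sketch built from the query in the description; it assumes a local SparkSession, and with the fix the statement returns NULL under try semantics instead of raising DIVIDE_BY_ZERO:

{code:scala}
import org.apache.spark.sql.SparkSession

object TryDivideRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]").appName("try-divide-repro").getOrCreate()
    // Before the fix, DecimalPrecision's makeCopy dropped the TRY eval mode,
    // so this raised DIVIDE_BY_ZERO; with the fix it yields a single NULL row.
    spark.sql("SELECT try_divide(1, decimal(0))").show()
    spark.stop()
  }
}
{code}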
[jira] [Resolved] (SPARK-48016) Fix a bug in try_divide function when with decimals
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-48016.

    Fix Version/s: 3.4.4
       Resolution: Fixed

Issue resolved by pull request 46289
[https://github.com/apache/spark/pull/46289]

> Fix a bug in try_divide function when with decimals
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0, 3.5.2
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.4
>
> Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
>
> {code:java}
> SELECT try_divide(1, decimal(0));
> {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>     ...
>     case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
>         l.dataType.isInstanceOf[IntegralType] &&
>         literalPickMinimumPrecision =>
>       b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
> {code}
[jira] [Updated] (SPARK-48055) Enable PandasUDFScalarParityTests.{test_vectorized_udf_empty_partition, test_vectorized_udf_struct_with_empty_partition}
[ https://issues.apache.org/jira/browse/SPARK-48055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48055:

    Labels: pull-request-available  (was: )

> Enable PandasUDFScalarParityTests.{test_vectorized_udf_empty_partition, test_vectorized_udf_struct_with_empty_partition}
>
> Key: SPARK-48055
> URL: https://issues.apache.org/jira/browse/SPARK-48055
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-48054) Backward compatibility test for Spark Connect
Hyukjin Kwon created SPARK-48054:

         Summary: Backward compatibility test for Spark Connect
             Key: SPARK-48054
             URL: https://issues.apache.org/jira/browse/SPARK-48054
         Project: Spark
      Issue Type: Improvement
      Components: Connect, PySpark
Affects Versions: 4.0.0
        Reporter: Hyukjin Kwon

Now that we can run the Spark Connect server separately in CI, we can test a lower-version server against a higher-version client, and the opposite as well.
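As a rough sketch of what such a test could exercise, a smoke test using the Spark Connect Scala client; the sc:// address and the pairing of a client from one release with a server launched from another are illustrative assumptions, not the actual CI wiring:

{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch: the CI job would launch a Connect server built from version A,
// then run this client built from version B (and vice versa).
object ConnectCompatSmokeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .remote("sc://localhost:15002") // assumed address of the CI-launched server
      .getOrCreate()
    assert(spark.range(3).count() == 3) // trivial round-trip over the protocol
    spark.close()
  }
}
{code}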
[jira] [Updated] (SPARK-48053) SparkSession.createDataFrame should warn for unsupported options
[ https://issues.apache.org/jira/browse/SPARK-48053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48053:

    Labels: pull-request-available  (was: )

> SparkSession.createDataFrame should warn for unsupported options
>
> Key: SPARK-48053
> URL: https://issues.apache.org/jira/browse/SPARK-48053
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> {code}
> spark.createDataFrame([1,2,3], verifySchema=True)
> {code}
> and
> {code}
> spark.createDataFrame([1,2,3], samplingRatio=0.5)
> {code}
> do not work with Spark Connect.
[jira] [Resolved] (SPARK-48030) InternalRowComparableWrapper should cache rowOrdering to improve performance
[ https://issues.apache.org/jira/browse/SPARK-48030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun resolved SPARK-48030.

    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46265
[https://github.com/apache/spark/pull/46265]

> InternalRowComparableWrapper should cache rowOrdering to improve performance
>
> Key: SPARK-48030
> URL: https://issues.apache.org/jira/browse/SPARK-48030
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.1, 3.4.3
> Reporter: YE
> Assignee: YE
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: screenshot-1.png
>
> InternalRowComparableWrapper recreates the row ordering for each output partition when SPJ is enabled. The row ordering is generated via codegen, which is quite expensive, and the number of output partitions can be quite large for production tables (hundreds of thousands of partitions). We encountered this issue when applying SPJ with multiple large Iceberg tables, and the plan phase took tens of minutes to complete.
> Attaching a screenshot with the related stack trace:
> !screenshot-1.png!
> A simple fix would be to cache the rowOrdering in InternalRowComparableWrapper, since the data type of the InternalRow is immutable.
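The proposed caching reads roughly like the sketch below: memoize the codegen-produced ordering per sequence of data types, so wrappers over the same schema share one compiled comparator. The OrderingCache object is a hypothetical illustration, not the actual patch:

{code:scala}
import java.util.concurrent.ConcurrentHashMap
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.RowOrdering
import org.apache.spark.sql.types.DataType

object OrderingCache {
  // Keyed by the schema's data types, which are immutable, so a cached
  // ordering stays valid for every row with that schema.
  private val cache = new ConcurrentHashMap[Seq[DataType], Ordering[InternalRow]]()

  def getOrCreate(dataTypes: Seq[DataType]): Ordering[InternalRow] =
    cache.computeIfAbsent(dataTypes, dts => RowOrdering.createNaturalAscendingOrdering(dts))
}
{code}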
[jira] [Assigned] (SPARK-48030) InternalRowComparableWrapper should cache rowOrdering to improve performance
[ https://issues.apache.org/jira/browse/SPARK-48030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun reassigned SPARK-48030:

    Assignee: YE

> InternalRowComparableWrapper should cache rowOrdering to improve performance
>
> Key: SPARK-48030
> URL: https://issues.apache.org/jira/browse/SPARK-48030
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.1, 3.4.3
> Reporter: YE
> Assignee: YE
> Priority: Major
> Labels: pull-request-available
>
> Attachments: screenshot-1.png
>
> InternalRowComparableWrapper recreates the row ordering for each output partition when SPJ is enabled. The row ordering is generated via codegen, which is quite expensive, and the number of output partitions can be quite large for production tables (hundreds of thousands of partitions). We encountered this issue when applying SPJ with multiple large Iceberg tables, and the plan phase took tens of minutes to complete.
> Attaching a screenshot with the related stack trace:
> !screenshot-1.png!
> A simple fix would be to cache the rowOrdering in InternalRowComparableWrapper, since the data type of the InternalRow is immutable.
[jira] [Created] (SPARK-48053) SparkSession.createDataFrame should warn for unsupported options
Hyukjin Kwon created SPARK-48053:

         Summary: SparkSession.createDataFrame should warn for unsupported options
             Key: SPARK-48053
             URL: https://issues.apache.org/jira/browse/SPARK-48053
         Project: Spark
      Issue Type: Improvement
      Components: Connect, PySpark
Affects Versions: 4.0.0
        Reporter: Hyukjin Kwon

{code}
spark.createDataFrame([1,2,3], verifySchema=True)
{code}
and
{code}
spark.createDataFrame([1,2,3], samplingRatio=0.5)
{code}
do not work with Spark Connect.
[jira] [Created] (SPARK-48052) Recover pyspark-connect CI by parent classes
Hyukjin Kwon created SPARK-48052:

         Summary: Recover pyspark-connect CI by parent classes
             Key: SPARK-48052
             URL: https://issues.apache.org/jira/browse/SPARK-48052
         Project: Spark
      Issue Type: Sub-task
      Components: Connect, PySpark
Affects Versions: 4.0.0
        Reporter: Hyukjin Kwon
[jira] [Updated] (SPARK-48051) Add Golden Table Tests for Variant from different engines
[ https://issues.apache.org/jira/browse/SPARK-48051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48051:

    Labels: pull-request-available  (was: )

> Add Golden Table Tests for Variant from different engines
>
> Key: SPARK-48051
> URL: https://issues.apache.org/jira/browse/SPARK-48051
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Richard Chen
> Priority: Major
> Labels: pull-request-available
>
> Add tests to ensure that OSS Spark can read variant tables written by different engines
[jira] [Created] (SPARK-48051) Add Golden Table Tests for Variant from different engines
Richard Chen created SPARK-48051:

         Summary: Add Golden Table Tests for Variant from different engines
             Key: SPARK-48051
             URL: https://issues.apache.org/jira/browse/SPARK-48051
         Project: Spark
      Issue Type: Sub-task
      Components: Spark Core
Affects Versions: 4.0.0
        Reporter: Richard Chen

Add tests to ensure that OSS Spark can read variant tables written by different engines
[jira] [Updated] (SPARK-47129) Make ResolveRelations cache connect plan properly
[ https://issues.apache.org/jira/browse/SPARK-47129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng updated SPARK-47129:

    Issue Type: Bug  (was: Improvement)

> Make ResolveRelations cache connect plan properly
>
> Key: SPARK-47129
> URL: https://issues.apache.org/jira/browse/SPARK-47129
> Project: Spark
> Issue Type: Bug
> Components: Connect, SQL
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Updated] (SPARK-47129) Make ResolveRelations cache connect plan properly
[ https://issues.apache.org/jira/browse/SPARK-47129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng updated SPARK-47129:

    Affects Version/s: 3.4.3
                       3.5.1

> Make ResolveRelations cache connect plan properly
>
> Key: SPARK-47129
> URL: https://issues.apache.org/jira/browse/SPARK-47129
> Project: Spark
> Issue Type: Bug
> Components: Connect, SQL
> Affects Versions: 4.0.0, 3.5.1, 3.4.3
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Created] (SPARK-48050) Log logical plan at query start
Fanyue Xia created SPARK-48050:

         Summary: Log logical plan at query start
             Key: SPARK-48050
             URL: https://issues.apache.org/jira/browse/SPARK-48050
         Project: Spark
      Issue Type: Improvement
      Components: Structured Streaming
Affects Versions: 3.5.1, 3.5.0
        Reporter: Fanyue Xia

We should log the logical plan of queries at query start. Having the logical plan in the logs will help us determine whether logical plans have changed between query runs.
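One lightweight way to get this, sketched with public APIs; it assumes df is the streaming DataFrame about to be started, and the helper name and console sink are illustrative:

{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.StreamingQuery

// Sketch: print the analyzed logical plan once, right before starting the
// query, so plans from successive runs can be diffed in the logs.
def startWithPlanLogging(df: DataFrame): StreamingQuery = {
  println(s"Logical plan at query start:\n${df.queryExecution.logical.treeString}")
  df.writeStream.format("console").start()
}
{code}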
[jira] [Resolved] (SPARK-48033) Support Generated Column expressions that are `RuntimeReplaceable`
[ https://issues.apache.org/jira/browse/SPARK-48033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-48033.

    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46269
[https://github.com/apache/spark/pull/46269]

> Support Generated Column expressions that are `RuntimeReplaceable`
>
> Key: SPARK-48033
> URL: https://issues.apache.org/jira/browse/SPARK-48033
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Richard Chen
> Assignee: Richard Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Currently, default columns whose default is a `RuntimeReplaceable` expression fail.
> This is because `AlterTableCommand` constant-folds before replacing expressions with the actual implementation. For example:
> ```
> sql(s"CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET")
> sql("INSERT INTO t VALUES(DEFAULT)")
> ```
> fails because `parse_json` is `RuntimeReplaceable` and is evaluated before the analyzer inserts the correct expression into the plan.
> This is especially important for Variant types because literal variants are difficult to create; `parse_json` will likely be used the majority of the time.
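For a concrete picture of what the fix enables, the commands from the description assembled into a runnable sketch (it assumes a local 4.0 session with the Variant type available):

{code:scala}
import org.apache.spark.sql.SparkSession

object RuntimeReplaceableDefaultDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]").appName("variant-default-demo").getOrCreate()
    // parse_json is RuntimeReplaceable; with the fix it works as a column default.
    spark.sql("CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET")
    spark.sql("INSERT INTO t VALUES(DEFAULT)")
    spark.sql("SELECT v FROM t").show() // expect one row holding the variant value 1
    spark.stop()
  }
}
{code}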
[jira] [Assigned] (SPARK-48033) Support Generated Column expressions that are `RuntimeReplaceable`
[ https://issues.apache.org/jira/browse/SPARK-48033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-48033:

    Assignee: Richard Chen

> Support Generated Column expressions that are `RuntimeReplaceable`
>
> Key: SPARK-48033
> URL: https://issues.apache.org/jira/browse/SPARK-48033
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Richard Chen
> Assignee: Richard Chen
> Priority: Major
> Labels: pull-request-available
>
> Currently, default columns whose default is a `RuntimeReplaceable` expression fail.
> This is because `AlterTableCommand` constant-folds before replacing expressions with the actual implementation. For example:
> ```
> sql(s"CREATE TABLE t(v VARIANT DEFAULT parse_json('1')) USING PARQUET")
> sql("INSERT INTO t VALUES(DEFAULT)")
> ```
> fails because `parse_json` is `RuntimeReplaceable` and is evaluated before the analyzer inserts the correct expression into the plan.
> This is especially important for Variant types because literal variants are difficult to create; `parse_json` will likely be used the majority of the time.
[jira] [Commented] (SPARK-48016) Fix a bug in try_divide function when with decimals
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842200#comment-17842200 ]

Dongjoon Hyun commented on SPARK-48016:

Thank you so much!

> Fix a bug in try_divide function when with decimals
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0, 3.5.2
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Labels: pull-request-available
>
> Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
>
> {code:java}
> SELECT try_divide(1, decimal(0));
> {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>     ...
>     case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
>         l.dataType.isInstanceOf[IntegralType] &&
>         literalPickMinimumPrecision =>
>       b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
> {code}
[jira] [Assigned] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-47741:

    Assignee: Milan Stefanovic

> Handle stack overflow when parsing query
>
> Key: SPARK-47741
> URL: https://issues.apache.org/jira/browse/SPARK-47741
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Milan Stefanovic
> Assignee: Milan Stefanovic
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Parsing complex queries can lead to stack overflow.
> We need to catch this and convert it to a proper parser exception with an error class.
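A hedged sketch of the mitigation described; the wrapper and message are illustrative, and the actual change surfaces a parser exception with a proper error class rather than the generic exception used here:

{code:scala}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Deeply nested input can blow the stack inside the ANTLR-generated parser;
// catch the StackOverflowError and rethrow it as an ordinary parse failure.
def parsePlanSafely(sqlText: String): LogicalPlan =
  try {
    CatalystSqlParser.parsePlan(sqlText)
  } catch {
    case _: StackOverflowError =>
      throw new IllegalArgumentException("Query is too complex to parse (stack overflow)")
  }
{code}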
[jira] [Resolved] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-47741.

    Resolution: Fixed

Issue resolved by pull request 45896
[https://github.com/apache/spark/pull/45896]

> Handle stack overflow when parsing query
>
> Key: SPARK-47741
> URL: https://issues.apache.org/jira/browse/SPARK-47741
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Milan Stefanovic
> Assignee: Milan Stefanovic
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Parsing complex queries can lead to stack overflow.
> We need to catch this and convert it to a proper parser exception with an error class.
[jira] [Commented] (SPARK-48016) Fix a bug in try_divide function when with decimals
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842193#comment-17842193 ]

Gengliang Wang commented on SPARK-48016:

[~dongjoon] sure, I just moved it.

> Fix a bug in try_divide function when with decimals
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0, 3.5.2
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Labels: pull-request-available
>
> Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
>
> {code:java}
> SELECT try_divide(1, decimal(0));
> {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>     ...
>     case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
>         l.dataType.isInstanceOf[IntegralType] &&
>         literalPickMinimumPrecision =>
>       b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
> {code}
[jira] [Updated] (SPARK-48016) Fix a bug in try_divide function when with decimals
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang updated SPARK-48016:

    Parent Issue: SPARK-44111  (was: SPARK-35161)

> Fix a bug in try_divide function when with decimals
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0, 3.5.2
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Labels: pull-request-available
>
> Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
>
> {code:java}
> SELECT try_divide(1, decimal(0));
> {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>     ...
>     case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
>         l.dataType.isInstanceOf[IntegralType] &&
>         literalPickMinimumPrecision =>
>       b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
> {code}
[jira] [Commented] (SPARK-48016) Fix a bug in try_divide function when with decimals
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842191#comment-17842191 ]

Dongjoon Hyun commented on SPARK-48016:

Hi, [~Gengliang.Wang].
- I updated the JIRA title according to the commit title.
- The umbrella JIRA issue was completed in Apache Spark 3.4.0. To give this more visibility, shall we move it to SPARK-44111, since recent ANSI JIRA issues are there?

> Fix a bug in try_divide function when with decimals
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0, 3.5.2
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Labels: pull-request-available
>
> Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
>
> {code:java}
> SELECT try_divide(1, decimal(0));
> {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>     ...
>     case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
>         l.dataType.isInstanceOf[IntegralType] &&
>         literalPickMinimumPrecision =>
>       b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
> {code}
[jira] [Updated] (SPARK-48016) Fix a bug in try_divide function when with decimals
[ https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-48016:

    Summary: Fix a bug in try_divide function when with decimals  (was: Binary Arithmetic operators should include the evalMode when makeCopy)

> Fix a bug in try_divide function when with decimals
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0, 3.5.2
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Labels: pull-request-available
>
> Binary Arithmetic operators should include the evalMode during makeCopy. Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
>
> {code:java}
> SELECT try_divide(1, decimal(0));
> {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>     ...
>     case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
>         l.dataType.isInstanceOf[IntegralType] &&
>         literalPickMinimumPrecision =>
>       b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
> {code}
[jira] [Comment Edited] (SPARK-47578) Spark core: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835552#comment-17835552 ]

Daniel edited comment on SPARK-47578 at 4/29/24 11:11 PM:

Edit: I updated this command to search within the Spark Core directory.

spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u

Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X) deploy/SparkSubmitArguments.scala (X) deploy/client/StandaloneAppClient.scala (X) deploy/history/ApplicationCache.scala (X) deploy/history/EventLogFileCompactor.scala (X) deploy/history/EventLogFileWriters.scala (X) deploy/history/FsHistoryProvider.scala (X) deploy/history/HistoryServerDiskManager.scala (X) deploy/master/FileSystemPersistenceEngine.scala (X) deploy/master/Master.scala (X) deploy/master/MasterArguments.scala (X) deploy/master/ZooKeeperPersistenceEngine.scala (X) deploy/rest/RestSubmissionClient.scala (X) deploy/security/HBaseDelegationTokenProvider.scala (X) deploy/security/HadoopDelegationTokenManager.scala (X) deploy/security/HadoopFSDelegationTokenProvider.scala (X) deploy/worker/DriverRunner.scala (X) deploy/worker/ExecutorRunner.scala (X) deploy/worker/Worker.scala (X) deploy/worker/WorkerWatcher.scala (X) executor/CoarseGrainedExecutorBackend.scala (X) executor/Executor.scala (X) executor/ProcfsMetricsGetter.scala (X) internal/io/HadoopMapReduceCommitProtocol.scala (X) internal/io/SparkHadoopWriter.scala (X) internal/plugin/PluginContextImpl.scala (X) internal/plugin/PluginEndpoint.scala (X) memory/ExecutionMemoryPool.scala (X) memory/StorageMemoryPool.scala (X) metrics/ExecutorMetricType.scala (X) metrics/MetricsSystem.scala (X) metrics/sink/StatsdReporter.scala (X) network/netty/NettyBlockRpcServer.scala (X) rdd/HadoopRDD.scala (X) rdd/JdbcRDD.scala (X) rdd/NewHadoopRDD.scala (X) rdd/PairRDDFunctions.scala (X) rdd/RDD.scala (X) rdd/RDDOperationScope.scala (X) rdd/ReliableCheckpointRDD.scala (X) resource/ResourceProfile.scala (X) resource/ResourceProfileManager.scala (X) resource/ResourceUtils.scala rpc/netty/Dispatcher.scala rpc/netty/Inbox.scala rpc/netty/NettyRpcEnv.scala rpc/netty/Outbox.scala scheduler/AsyncEventQueue.scala scheduler/DAGScheduler.scala scheduler/HealthTracker.scala scheduler/JobWaiter.scala scheduler/ReplayListenerBus.scala scheduler/SchedulableBuilder.scala scheduler/TaskSchedulerImpl.scala scheduler/TaskSetManager.scala scheduler/cluster/CoarseGrainedSchedulerBackend.scala scheduler/cluster/StandaloneSchedulerBackend.scala security/CryptoStreamUtils.scala serializer/SerializationDebugger.scala shuffle/IndexShuffleBlockResolver.scala shuffle/ShuffleBlockPusher.scala shuffle/sort/SortShuffleManager.scala status/KVUtils.scala storage/BlockManager.scala storage/BlockManagerDecommissioner.scala storage/BlockManagerMaster.scala storage/BlockManagerMasterEndpoint.scala storage/BlockReplicationPolicy.scala storage/DiskBlockManager.scala storage/DiskBlockObjectWriter.scala storage/DiskStore.scala storage/FallbackStorage.scala storage/PushBasedFetchHelper.scala storage/ShuffleBlockFetcherIterator.scala storage/TopologyMapper.scala storage/memory/MemoryStore.scala ui/JettyUtils.scala ui/scope/RDDOperationGraph.scala util/AccumulatorV2.scala util/DependencyUtils.scala util/HadoopFSUtils.scala util/PeriodicCheckpointer.scala util/SignalUtils.scala util/SizeEstimator.scala util/Utils.scala util/collection/ExternalAppendOnlyMap.scala util/logging/DriverLogger.scala util/logging/FileAppender.scala util/logging/RollingFileAppender.scala util/logging/RollingPolicy.scala util/random/StratifiedSamplingUtils.scala
[jira] [Comment Edited] (SPARK-47578) Spark core: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835552#comment-17835552 ] Daniel edited comment on SPARK-47578 at 4/29/24 10:21 PM: -- Edit: I updated this command to search within the Spark Core directory. spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X) deploy/SparkSubmitArguments.scala (X) deploy/client/StandaloneAppClient.scala (X) deploy/history/ApplicationCache.scala (X) deploy/history/EventLogFileCompactor.scala (X) deploy/history/EventLogFileWriters.scala (X) deploy/history/FsHistoryProvider.scala (X) deploy/history/HistoryServerDiskManager.scala (X) deploy/master/FileSystemPersistenceEngine.scala (X) deploy/master/Master.scala (X) deploy/master/MasterArguments.scala (X) deploy/master/ZooKeeperPersistenceEngine.scala (X) deploy/rest/RestSubmissionClient.scala (X) deploy/security/HBaseDelegationTokenProvider.scala (X) deploy/security/HadoopDelegationTokenManager.scala (X) deploy/security/HadoopFSDelegationTokenProvider.scala (X) deploy/worker/DriverRunner.scala (X) deploy/worker/ExecutorRunner.scala (X) deploy/worker/Worker.scala (X) deploy/worker/WorkerWatcher.scala (X) executor/CoarseGrainedExecutorBackend.scala (X) executor/Executor.scala (X) executor/ProcfsMetricsGetter.scala (X) internal/io/HadoopMapReduceCommitProtocol.scala (X) internal/io/SparkHadoopWriter.scala (X) internal/plugin/PluginContextImpl.scala (X) internal/plugin/PluginEndpoint.scala (X) memory/ExecutionMemoryPool.scala (X) memory/StorageMemoryPool.scala (X) metrics/ExecutorMetricType.scala (X) metrics/MetricsSystem.scala (X) metrics/sink/StatsdReporter.scala (X) network/netty/NettyBlockRpcServer.scala (X) rdd/HadoopRDD.scala (X) rdd/JdbcRDD.scala (X) rdd/NewHadoopRDD.scala rdd/PairRDDFunctions.scala rdd/RDD.scala rdd/RDDOperationScope.scala rdd/ReliableCheckpointRDD.scala resource/ResourceProfile.scala resource/ResourceProfileManager.scala resource/ResourceUtils.scala rpc/netty/Dispatcher.scala rpc/netty/Inbox.scala rpc/netty/NettyRpcEnv.scala rpc/netty/Outbox.scala scheduler/AsyncEventQueue.scala scheduler/DAGScheduler.scala scheduler/HealthTracker.scala scheduler/JobWaiter.scala scheduler/ReplayListenerBus.scala scheduler/SchedulableBuilder.scala scheduler/TaskSchedulerImpl.scala scheduler/TaskSetManager.scala scheduler/cluster/CoarseGrainedSchedulerBackend.scala scheduler/cluster/StandaloneSchedulerBackend.scala security/CryptoStreamUtils.scala serializer/SerializationDebugger.scala shuffle/IndexShuffleBlockResolver.scala shuffle/ShuffleBlockPusher.scala shuffle/sort/SortShuffleManager.scala status/KVUtils.scala storage/BlockManager.scala storage/BlockManagerDecommissioner.scala storage/BlockManagerMaster.scala storage/BlockManagerMasterEndpoint.scala storage/BlockReplicationPolicy.scala storage/DiskBlockManager.scala storage/DiskBlockObjectWriter.scala storage/DiskStore.scala storage/FallbackStorage.scala 
storage/PushBasedFetchHelper.scala storage/ShuffleBlockFetcherIterator.scala storage/TopologyMapper.scala storage/memory/MemoryStore.scala ui/JettyUtils.scala ui/scope/RDDOperationGraph.scala util/AccumulatorV2.scala util/DependencyUtils.scala util/HadoopFSUtils.scala util/PeriodicCheckpointer.scala util/SignalUtils.scala util/SizeEstimator.scala util/Utils.scala util/collection/ExternalAppendOnlyMap.scala util/logging/DriverLogger.scala util/logging/FileAppender.scala util/logging/RollingFileAppender.scala util/logging/RollingPolicy.scala util/random/StratifiedSamplingUtils.scala was (Author: JIRAUSER285772): Edit: I updated this command to search within the Spark Core directory. spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X)
[jira] [Comment Edited] (SPARK-47578) Spark core: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835552#comment-17835552 ] Daniel edited comment on SPARK-47578 at 4/29/24 10:19 PM: -- Edit: I updated this command to search within the Spark Core directory. spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X) deploy/SparkSubmitArguments.scala (X) deploy/client/StandaloneAppClient.scala (X) deploy/history/ApplicationCache.scala (X) deploy/history/EventLogFileCompactor.scala (X) deploy/history/EventLogFileWriters.scala (X) deploy/history/FsHistoryProvider.scala (X) deploy/history/HistoryServerDiskManager.scala (X) deploy/master/FileSystemPersistenceEngine.scala (X) deploy/master/Master.scala (X) deploy/master/MasterArguments.scala (X) deploy/master/ZooKeeperPersistenceEngine.scala (X) deploy/rest/RestSubmissionClient.scala (X) deploy/security/HBaseDelegationTokenProvider.scala (X) deploy/security/HadoopDelegationTokenManager.scala (X) deploy/security/HadoopFSDelegationTokenProvider.scala (X) deploy/worker/DriverRunner.scala (X) deploy/worker/ExecutorRunner.scala (X) deploy/worker/Worker.scala (X) deploy/worker/WorkerWatcher.scala (X) executor/CoarseGrainedExecutorBackend.scala (X) executor/Executor.scala (X) executor/ProcfsMetricsGetter.scala (X) internal/io/HadoopMapReduceCommitProtocol.scala (X) internal/io/SparkHadoopWriter.scala (X) internal/plugin/PluginContextImpl.scala (X) internal/plugin/PluginEndpoint.scala (X) memory/ExecutionMemoryPool.scala (X) memory/StorageMemoryPool.scala (X) metrics/ExecutorMetricType.scala (X) metrics/MetricsSystem.scala (X) metrics/sink/StatsdReporter.scala (X) network/netty/NettyBlockRpcServer.scala (X) rdd/HadoopRDD.scala rdd/JdbcRDD.scala rdd/NewHadoopRDD.scala rdd/PairRDDFunctions.scala rdd/RDD.scala rdd/RDDOperationScope.scala rdd/ReliableCheckpointRDD.scala resource/ResourceProfile.scala resource/ResourceProfileManager.scala resource/ResourceUtils.scala rpc/netty/Dispatcher.scala rpc/netty/Inbox.scala rpc/netty/NettyRpcEnv.scala rpc/netty/Outbox.scala scheduler/AsyncEventQueue.scala scheduler/DAGScheduler.scala scheduler/HealthTracker.scala scheduler/JobWaiter.scala scheduler/ReplayListenerBus.scala scheduler/SchedulableBuilder.scala scheduler/TaskSchedulerImpl.scala scheduler/TaskSetManager.scala scheduler/cluster/CoarseGrainedSchedulerBackend.scala scheduler/cluster/StandaloneSchedulerBackend.scala security/CryptoStreamUtils.scala serializer/SerializationDebugger.scala shuffle/IndexShuffleBlockResolver.scala shuffle/ShuffleBlockPusher.scala shuffle/sort/SortShuffleManager.scala status/KVUtils.scala storage/BlockManager.scala storage/BlockManagerDecommissioner.scala storage/BlockManagerMaster.scala storage/BlockManagerMasterEndpoint.scala storage/BlockReplicationPolicy.scala storage/DiskBlockManager.scala storage/DiskBlockObjectWriter.scala storage/DiskStore.scala storage/FallbackStorage.scala 
storage/PushBasedFetchHelper.scala storage/ShuffleBlockFetcherIterator.scala storage/TopologyMapper.scala storage/memory/MemoryStore.scala ui/JettyUtils.scala ui/scope/RDDOperationGraph.scala util/AccumulatorV2.scala util/DependencyUtils.scala util/HadoopFSUtils.scala util/PeriodicCheckpointer.scala util/SignalUtils.scala util/SizeEstimator.scala util/Utils.scala util/collection/ExternalAppendOnlyMap.scala util/logging/DriverLogger.scala util/logging/FileAppender.scala util/logging/RollingFileAppender.scala util/logging/RollingPolicy.scala util/random/StratifiedSamplingUtils.scala was (Author: JIRAUSER285772): Edit: I updated this command to search within the Spark Core directory. spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X)
[jira] [Comment Edited] (SPARK-47578) Spark core: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835552#comment-17835552 ] Daniel edited comment on SPARK-47578 at 4/29/24 10:01 PM: -- Edit: I updated this command to search within the Spark Core directory. spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X) deploy/SparkSubmitArguments.scala (X) deploy/client/StandaloneAppClient.scala (X) deploy/history/ApplicationCache.scala (X) deploy/history/EventLogFileCompactor.scala (X) deploy/history/EventLogFileWriters.scala (X) deploy/history/FsHistoryProvider.scala (X) deploy/history/HistoryServerDiskManager.scala (X) deploy/master/FileSystemPersistenceEngine.scala (X) deploy/master/Master.scala (X) deploy/master/MasterArguments.scala (X) deploy/master/ZooKeeperPersistenceEngine.scala (X) deploy/rest/RestSubmissionClient.scala (X) deploy/security/HBaseDelegationTokenProvider.scala (X) deploy/security/HadoopDelegationTokenManager.scala (X) deploy/security/HadoopFSDelegationTokenProvider.scala (X) deploy/worker/DriverRunner.scala (X) deploy/worker/ExecutorRunner.scala (X) deploy/worker/Worker.scala (X) deploy/worker/WorkerWatcher.scala (X) executor/CoarseGrainedExecutorBackend.scala (X) executor/Executor.scala (X) executor/ProcfsMetricsGetter.scala (X) internal/io/HadoopMapReduceCommitProtocol.scala (X) internal/io/SparkHadoopWriter.scala (X) internal/plugin/PluginContextImpl.scala (X) internal/plugin/PluginEndpoint.scala (X) memory/ExecutionMemoryPool.scala (X) memory/StorageMemoryPool.scala metrics/ExecutorMetricType.scala metrics/MetricsSystem.scala metrics/sink/StatsdReporter.scala network/netty/NettyBlockRpcServer.scala rdd/HadoopRDD.scala rdd/JdbcRDD.scala rdd/NewHadoopRDD.scala rdd/PairRDDFunctions.scala rdd/RDD.scala rdd/RDDOperationScope.scala rdd/ReliableCheckpointRDD.scala resource/ResourceProfile.scala resource/ResourceProfileManager.scala resource/ResourceUtils.scala rpc/netty/Dispatcher.scala rpc/netty/Inbox.scala rpc/netty/NettyRpcEnv.scala rpc/netty/Outbox.scala scheduler/AsyncEventQueue.scala scheduler/DAGScheduler.scala scheduler/HealthTracker.scala scheduler/JobWaiter.scala scheduler/ReplayListenerBus.scala scheduler/SchedulableBuilder.scala scheduler/TaskSchedulerImpl.scala scheduler/TaskSetManager.scala scheduler/cluster/CoarseGrainedSchedulerBackend.scala scheduler/cluster/StandaloneSchedulerBackend.scala security/CryptoStreamUtils.scala serializer/SerializationDebugger.scala shuffle/IndexShuffleBlockResolver.scala shuffle/ShuffleBlockPusher.scala shuffle/sort/SortShuffleManager.scala status/KVUtils.scala storage/BlockManager.scala storage/BlockManagerDecommissioner.scala storage/BlockManagerMaster.scala storage/BlockManagerMasterEndpoint.scala storage/BlockReplicationPolicy.scala storage/DiskBlockManager.scala storage/DiskBlockObjectWriter.scala storage/DiskStore.scala storage/FallbackStorage.scala storage/PushBasedFetchHelper.scala 
storage/ShuffleBlockFetcherIterator.scala storage/TopologyMapper.scala storage/memory/MemoryStore.scala ui/JettyUtils.scala ui/scope/RDDOperationGraph.scala util/AccumulatorV2.scala util/DependencyUtils.scala util/HadoopFSUtils.scala util/PeriodicCheckpointer.scala util/SignalUtils.scala util/SizeEstimator.scala util/Utils.scala util/collection/ExternalAppendOnlyMap.scala util/logging/DriverLogger.scala util/logging/FileAppender.scala util/logging/RollingFileAppender.scala util/logging/RollingPolicy.scala util/random/StratifiedSamplingUtils.scala was (Author: JIRAUSER285772): Edit: I updated this command to search within the Spark Core directory. spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X)
[jira] [Comment Edited] (SPARK-47578) Spark core: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835552#comment-17835552 ] Daniel edited comment on SPARK-47578 at 4/29/24 9:58 PM: - Edit: I updated this command to search within the Spark Core directory. spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X) deploy/SparkSubmitArguments.scala (X) deploy/client/StandaloneAppClient.scala (X) deploy/history/ApplicationCache.scala (X) deploy/history/EventLogFileCompactor.scala (X) deploy/history/EventLogFileWriters.scala (X) deploy/history/FsHistoryProvider.scala (X) deploy/history/HistoryServerDiskManager.scala (X) deploy/master/FileSystemPersistenceEngine.scala (X) deploy/master/Master.scala (X) deploy/master/MasterArguments.scala (X) deploy/master/ZooKeeperPersistenceEngine.scala (X) deploy/rest/RestSubmissionClient.scala (X) deploy/security/HBaseDelegationTokenProvider.scala (X) deploy/security/HadoopDelegationTokenManager.scala (X) deploy/security/HadoopFSDelegationTokenProvider.scala (X) deploy/worker/DriverRunner.scala (X) deploy/worker/ExecutorRunner.scala (X) deploy/worker/Worker.scala (X) deploy/worker/WorkerWatcher.scala (X) executor/CoarseGrainedExecutorBackend.scala (X) executor/Executor.scala (X) executor/ProcfsMetricsGetter.scala (X) internal/io/HadoopMapReduceCommitProtocol.scala (X) internal/io/SparkHadoopWriter.scala (X) internal/plugin/PluginContextImpl.scala (X) internal/plugin/PluginEndpoint.scala memory/ExecutionMemoryPool.scala memory/StorageMemoryPool.scala metrics/ExecutorMetricType.scala metrics/MetricsSystem.scala metrics/sink/StatsdReporter.scala network/netty/NettyBlockRpcServer.scala rdd/HadoopRDD.scala rdd/JdbcRDD.scala rdd/NewHadoopRDD.scala rdd/PairRDDFunctions.scala rdd/RDD.scala rdd/RDDOperationScope.scala rdd/ReliableCheckpointRDD.scala resource/ResourceProfile.scala resource/ResourceProfileManager.scala resource/ResourceUtils.scala rpc/netty/Dispatcher.scala rpc/netty/Inbox.scala rpc/netty/NettyRpcEnv.scala rpc/netty/Outbox.scala scheduler/AsyncEventQueue.scala scheduler/DAGScheduler.scala scheduler/HealthTracker.scala scheduler/JobWaiter.scala scheduler/ReplayListenerBus.scala scheduler/SchedulableBuilder.scala scheduler/TaskSchedulerImpl.scala scheduler/TaskSetManager.scala scheduler/cluster/CoarseGrainedSchedulerBackend.scala scheduler/cluster/StandaloneSchedulerBackend.scala security/CryptoStreamUtils.scala serializer/SerializationDebugger.scala shuffle/IndexShuffleBlockResolver.scala shuffle/ShuffleBlockPusher.scala shuffle/sort/SortShuffleManager.scala status/KVUtils.scala storage/BlockManager.scala storage/BlockManagerDecommissioner.scala storage/BlockManagerMaster.scala storage/BlockManagerMasterEndpoint.scala storage/BlockReplicationPolicy.scala storage/DiskBlockManager.scala storage/DiskBlockObjectWriter.scala storage/DiskStore.scala storage/FallbackStorage.scala storage/PushBasedFetchHelper.scala 
storage/ShuffleBlockFetcherIterator.scala storage/TopologyMapper.scala storage/memory/MemoryStore.scala ui/JettyUtils.scala ui/scope/RDDOperationGraph.scala util/AccumulatorV2.scala util/DependencyUtils.scala util/HadoopFSUtils.scala util/PeriodicCheckpointer.scala util/SignalUtils.scala util/SizeEstimator.scala util/Utils.scala util/collection/ExternalAppendOnlyMap.scala util/logging/DriverLogger.scala util/logging/FileAppender.scala util/logging/RollingFileAppender.scala util/logging/RollingPolicy.scala util/random/StratifiedSamplingUtils.scala was (Author: JIRAUSER285772): Edit: I updated this command to search within the Spark Core directory. spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X) deploy/SparkSubmitArguments.scala (X)
[jira] [Comment Edited] (SPARK-47578) Spark core: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835552#comment-17835552 ] Daniel edited comment on SPARK-47578 at 4/29/24 9:44 PM: - Edit: I updated this command to search within the Spark Core directory. spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X) deploy/SparkSubmitArguments.scala (X) deploy/client/StandaloneAppClient.scala (X) deploy/history/ApplicationCache.scala (X) deploy/history/EventLogFileCompactor.scala (X) deploy/history/EventLogFileWriters.scala (X) deploy/history/FsHistoryProvider.scala (X) deploy/history/HistoryServerDiskManager.scala (X) deploy/master/FileSystemPersistenceEngine.scala (X) deploy/master/Master.scala (X) deploy/master/MasterArguments.scala (X) deploy/master/ZooKeeperPersistenceEngine.scala (X) deploy/rest/RestSubmissionClient.scala (X) deploy/security/HBaseDelegationTokenProvider.scala (X) deploy/security/HadoopDelegationTokenManager.scala (X) deploy/security/HadoopFSDelegationTokenProvider.scala (X) deploy/worker/DriverRunner.scala (X) deploy/worker/ExecutorRunner.scala (X) deploy/worker/Worker.scala (X) deploy/worker/WorkerWatcher.scala (X) executor/CoarseGrainedExecutorBackend.scala executor/Executor.scala executor/ProcfsMetricsGetter.scala internal/io/HadoopMapReduceCommitProtocol.scala internal/io/SparkHadoopWriter.scala internal/plugin/PluginContextImpl.scala internal/plugin/PluginEndpoint.scala memory/ExecutionMemoryPool.scala memory/StorageMemoryPool.scala metrics/ExecutorMetricType.scala metrics/MetricsSystem.scala metrics/sink/StatsdReporter.scala network/netty/NettyBlockRpcServer.scala rdd/HadoopRDD.scala rdd/JdbcRDD.scala rdd/NewHadoopRDD.scala rdd/PairRDDFunctions.scala rdd/RDD.scala rdd/RDDOperationScope.scala rdd/ReliableCheckpointRDD.scala resource/ResourceProfile.scala resource/ResourceProfileManager.scala resource/ResourceUtils.scala rpc/netty/Dispatcher.scala rpc/netty/Inbox.scala rpc/netty/NettyRpcEnv.scala rpc/netty/Outbox.scala scheduler/AsyncEventQueue.scala scheduler/DAGScheduler.scala scheduler/HealthTracker.scala scheduler/JobWaiter.scala scheduler/ReplayListenerBus.scala scheduler/SchedulableBuilder.scala scheduler/TaskSchedulerImpl.scala scheduler/TaskSetManager.scala scheduler/cluster/CoarseGrainedSchedulerBackend.scala scheduler/cluster/StandaloneSchedulerBackend.scala security/CryptoStreamUtils.scala serializer/SerializationDebugger.scala shuffle/IndexShuffleBlockResolver.scala shuffle/ShuffleBlockPusher.scala shuffle/sort/SortShuffleManager.scala status/KVUtils.scala storage/BlockManager.scala storage/BlockManagerDecommissioner.scala storage/BlockManagerMaster.scala storage/BlockManagerMasterEndpoint.scala storage/BlockReplicationPolicy.scala storage/DiskBlockManager.scala storage/DiskBlockObjectWriter.scala storage/DiskStore.scala storage/FallbackStorage.scala storage/PushBasedFetchHelper.scala 
storage/ShuffleBlockFetcherIterator.scala storage/TopologyMapper.scala storage/memory/MemoryStore.scala ui/JettyUtils.scala ui/scope/RDDOperationGraph.scala util/AccumulatorV2.scala util/DependencyUtils.scala util/HadoopFSUtils.scala util/PeriodicCheckpointer.scala util/SignalUtils.scala util/SizeEstimator.scala util/Utils.scala util/collection/ExternalAppendOnlyMap.scala util/logging/DriverLogger.scala util/logging/FileAppender.scala util/logging/RollingFileAppender.scala util/logging/RollingPolicy.scala util/random/StratifiedSamplingUtils.scala was (Author: JIRAUSER285772): Edit: I updated this command to search within the Spark Core directory. spark$ grep logWarning core/src/main/* -R | cut -d ':' -f 1 | sort -u Under core/src/main/scala/org/apache/spark/: (X) BarrierTaskContext.scala (X) Dependency.scala (X) ExecutorAllocationManager.scala (X) HeartbeatReceiver.scala (X) MapOutputTracker.scala (X) SecurityManager.scala (X) SparkConf.scala (X) SparkContext.scala (X) SparkEnv.scala (X) api/python/PythonRunner.scala (X) api/python/PythonWorkerFactory.scala (X) api/python/SerDeUtil.scala (X) api/r/RBackendHandler.scala (X) deploy/Client.scala (X) deploy/DriverTimeoutPlugin.scala (X) deploy/ExternalShuffleService.scala (X) deploy/FaultToleranceTest.scala (X) deploy/RPackageUtils.scala (X) deploy/SparkSubmit.scala (X) deploy/SparkSubmitArguments.scala (X)
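For context, the migration these file lists track replaces plain string interpolation with the log"..." interpolator and MDC-tagged variables from the structured logging SPIP (SPARK-47240). A minimal before/after sketch, assuming the LogKeys/MDC API as it stands in Spark 4.0 (the surrounding class and key choice are illustrative, not code from the PR):

{code:scala}
import org.apache.spark.internal.{Logging, LogKeys, MDC}

class BlockFetcher extends Logging {
  def onFetchFailure(blockId: String): Unit = {
    // Before: plain interpolated string; the variable is buried in free text.
    // logWarning(s"Failed to fetch block $blockId, retrying")

    // After: the variable is tagged with a LogKey, so JSON log output
    // carries it as a structured key/value pair.
    logWarning(log"Failed to fetch block ${MDC(LogKeys.BLOCK_ID, blockId)}, retrying")
  }
}
{code}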
[jira] [Resolved] (SPARK-48042) Don't use a copy of timestamp formatter with a new override zone for each value
[ https://issues.apache.org/jira/browse/SPARK-48042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48042. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46282 [https://github.com/apache/spark/pull/46282] > Don't use a copy of timestamp formatter with a new override zone for each > value > --- > > Key: SPARK-48042 > URL: https://issues.apache.org/jira/browse/SPARK-48042 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
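The gist of the change: formatting each value through formatter.withZone(zone) allocates a fresh formatter copy per row. A minimal java.time sketch of the idea (illustrative only, not Spark's internal TimestampFormatter):

{code:scala}
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object FormatterReuse {
  private val zone = ZoneId.of("America/Los_Angeles")

  // Anti-pattern: withZone returns a new formatter instance for every value.
  def formatPerValue(fmt: DateTimeFormatter, ts: Instant): String =
    fmt.withZone(zone).format(ts)

  // Preferred: resolve the override zone once and reuse the instance.
  private val zonedFmt =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(zone)

  def formatReused(ts: Instant): String = zonedFmt.format(ts)
}
{code}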
[jira] [Resolved] (SPARK-48044) Cache `DataFrame.isStreaming`
[ https://issues.apache.org/jira/browse/SPARK-48044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48044. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46281 [https://github.com/apache/spark/pull/46281] > Cache `DataFrame.isStreaming` > - > > Key: SPARK-48044 > URL: https://issues.apache.org/jira/browse/SPARK-48044 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
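The change amounts to memoizing a property that would otherwise trigger an analysis round-trip to the Connect server on every access. A simplified standalone model of the idea (the client interface below is hypothetical, not the actual Connect API):

{code:scala}
// Hypothetical interface standing in for the Connect Analyze RPC.
trait AnalyzeClient {
  def isStreamingOf(planId: Long): Boolean
}

class RemoteDataFrame(client: AnalyzeClient, planId: Long) {
  // lazy val memoizes the answer: the RPC runs at most once per DataFrame.
  // Safe because whether a plan is streaming never changes after creation.
  lazy val isStreaming: Boolean = client.isStreamingOf(planId)
}
{code}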
[jira] [Commented] (SPARK-47585) SQL core: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842094#comment-17842094 ] Daniel commented on SPARK-47585: [~panbingkun] here are the affected files: spark$ grep logWarning sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u Under sql/core/src/main/scala/org/apache/spark/sql/ (excluding streaming): Column.scala SparkSession.scala api/python/PythonSQLUtils.scala api/r/SQLUtils.scala catalyst/analysis/ResolveSessionCatalog.scala execution/CacheManager.scala execution/ExistingRDD.scala execution/OptimizeMetadataOnlyQuery.scala execution/QueryExecution.scala execution/SparkSqlParser.scala execution/SparkStrategies.scala execution/WholeStageCodegenExec.scala execution/adaptive/AdaptiveSparkPlanExec.scala execution/adaptive/InsertAdaptiveSparkPlan.scala execution/adaptive/ShufflePartitionsUtil.scala execution/aggregate/HashAggregateExec.scala execution/command/AnalyzeTablesCommand.scala execution/command/CommandUtils.scala execution/command/SetCommand.scala execution/command/createDataSourceTables.scala execution/command/ddl.scala execution/datasources/BasicWriteStatsTracker.scala execution/datasources/DataSource.scala execution/datasources/DataSourceManager.scala execution/datasources/FilePartition.scala execution/datasources/FileScanRDD.scala execution/datasources/FileStatusCache.scala execution/datasources/csv/CSVDataSource.scala execution/datasources/jdbc/JDBCRDD.scala execution/datasources/jdbc/JDBCRelation.scala execution/datasources/jdbc/JdbcUtils.scala execution/datasources/json/JsonOutputWriter.scala execution/datasources/orc/OrcUtils.scala execution/datasources/parquet/ParquetFileFormat.scala execution/datasources/parquet/ParquetUtils.scala execution/datasources/v2/CacheTableExec.scala execution/datasources/v2/CreateIndexExec.scala execution/datasources/v2/CreateNamespaceExec.scala execution/datasources/v2/CreateTableExec.scala execution/datasources/v2/DataSourceV2Strategy.scala execution/datasources/v2/DropIndexExec.scala execution/datasources/v2/FilePartitionReader.scala execution/datasources/v2/FileScan.scala execution/datasources/v2/V2ScanPartitioningAndOrdering.scala execution/datasources/v2/jdbc/JDBCTableCatalog.scala execution/datasources/v2/state/StatePartitionReader.scala execution/datasources/xml/XmlDataSource.scala execution/python/ApplyInPandasWithStatePythonRunner.scala execution/python/AttachDistributedSequenceExec.scala execution/ui/SQLAppStatusListener.scala execution/window/WindowExecBase.scala internal/SharedState.scala jdbc/DB2Dialect.scala jdbc/H2Dialect.scala jdbc/JdbcDialects.scala jdbc/MsSqlServerDialect.scala jdbc/MySQLDialect.scala jdbc/OracleDialect.scala > SQL core: Migrate logInfo with variables to structured logging framework > > > Key: SPARK-47585 > URL: https://issues.apache.org/jira/browse/SPARK-47585 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
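The same MDC pattern shown above for logWarning applies to these logInfo call sites; longer messages concatenate log"..." fragments. A sketch (class and key choice illustrative):

{code:scala}
import org.apache.spark.internal.{Logging, LogKeys, MDC}

class CacheReporter extends Logging {
  def onAlreadyCached(tableName: String): Unit = {
    // Fragments produced by the log interpolator concatenate with +,
    // so multi-part messages keep their structured variables.
    logInfo(log"Asked to cache already cached data: " +
      log"${MDC(LogKeys.TABLE_NAME, tableName)}")
  }
}
{code}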
[jira] [Commented] (SPARK-47578) Spark core: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842092#comment-17842092 ] Daniel commented on SPARK-47578: Sounds good, I'll start on this one now > Spark core: Migrate logWarn with variables to structured logging framework > -- > > Key: SPARK-47578 > URL: https://issues.apache.org/jira/browse/SPARK-47578 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`
[ https://issues.apache.org/jira/browse/SPARK-48046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48046. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46284 [https://github.com/apache/spark/pull/46284] > Remove `clock` parameter from `DriverServiceFeatureStep` > > > Key: SPARK-48046 > URL: https://issues.apache.org/jira/browse/SPARK-48046 > Project: Spark > Issue Type: Task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`
[ https://issues.apache.org/jira/browse/SPARK-48046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48046: - Assignee: Dongjoon Hyun > Remove `clock` parameter from `DriverServiceFeatureStep` > > > Key: SPARK-48046 > URL: https://issues.apache.org/jira/browse/SPARK-48046 > Project: Spark > Issue Type: Task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48043) Kryo serialization issue with push-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-48043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Romain Ardiet updated SPARK-48043: -- Description: I'm running a Spark job on AWS EMR. I wanted to test the new push-based shuffle introduced in Spark 3.2, but it fails with a Kryo exception when I enable it. The issue seems to happen when the Executor starts, during the KryoSerializerInstance.getAutoReset() check:
{code:java}
24/04/24 15:36:22 ERROR YarnCoarseGrainedExecutorBackend: Executor self-exiting due to : Unable to create executor due to Failed to register classes with Kryo
org.apache.spark.SparkException: Failed to register classes with Kryo
    at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:186) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:241) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:174) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:105) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48) ~[kryo-shaded-4.0.2.jar:?]
    at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:112) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:352) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.serializer.KryoSerializerInstance.getAutoReset(KryoSerializer.scala:452) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects$lzycompute(KryoSerializer.scala:259) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects(KryoSerializer.scala:255) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.util.Utils$.serializerIsSupported$lzycompute$1(Utils.scala:2721) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.util.Utils$.serializerIsSupported$1(Utils.scala:2716) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.util.Utils$.isPushBasedShuffleEnabled(Utils.scala:2730) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.executor.Executor.<init>(Executor.scala:143) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:190) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_402]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_402]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402]
Caused by: java.lang.ClassNotFoundException: com.analytics.AnalyticsEventWrapper
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387) ~[?:1.8.0_402]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_402]
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_402]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_402]
    at java.lang.Class.forName0(Native Method) ~[?:1.8.0_402]
    at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_402]
    at org.apache.spark.util.Utils$.classForName(Utils.scala:228) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$6(KryoSerializer.scala:177) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?]
    at
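For reference, a minimal configuration that would combine the two ingredients in this report (the class name is taken from the stack trace above; everything else is an assumed sketch of the failing setup, not a confirmed reproduction):

{code:scala}
import org.apache.spark.SparkConf

// Kryo registration of an application class plus push-based shuffle:
// isPushBasedShuffleEnabled() instantiates the serializer during executor
// startup, before the application jar is necessarily on the classpath,
// which matches the ClassNotFoundException seen above.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.classesToRegister", "com.analytics.AnalyticsEventWrapper")
  .set("spark.shuffle.push.enabled", "true")    // push-based shuffle (Spark 3.2+)
  .set("spark.shuffle.service.enabled", "true") // prerequisite for push-based shuffle
{code}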
[jira] [Updated] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`
[ https://issues.apache.org/jira/browse/SPARK-48046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48046: --- Labels: pull-request-available (was: ) > Remove `clock` parameter from `DriverServiceFeatureStep` > > > Key: SPARK-48046 > URL: https://issues.apache.org/jira/browse/SPARK-48046 > Project: Spark > Issue Type: Task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`
Dongjoon Hyun created SPARK-48046: - Summary: Remove `clock` parameter from `DriverServiceFeatureStep` Key: SPARK-48046 URL: https://issues.apache.org/jira/browse/SPARK-48046 Project: Spark Issue Type: Task Components: Kubernetes Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48038) Promote driverServiceName to KubernetesDriverConf
[ https://issues.apache.org/jira/browse/SPARK-48038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48038: - Assignee: Cheng Pan > Promote driverServiceName to KubernetesDriverConf > - > > Key: SPARK-48038 > URL: https://issues.apache.org/jira/browse/SPARK-48038 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48038) Promote driverServiceName to KubernetesDriverConf
[ https://issues.apache.org/jira/browse/SPARK-48038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48038. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46276 [https://github.com/apache/spark/pull/46276] > Promote driverServiceName to KubernetesDriverConf > - > > Key: SPARK-48038 > URL: https://issues.apache.org/jira/browse/SPARK-48038 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False
[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul George updated SPARK-48045: Summary: Pandas API groupby with multi-agg-relabel ignores as_index=False (was: Pandas groupby with multi-agg-relabel ignores as_index=False) > Pandas API groupby with multi-agg-relabel ignores as_index=False > > > Key: SPARK-48045 > URL: https://issues.apache.org/jira/browse/SPARK-48045 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.5.1 > Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 >Reporter: Paul George >Priority: Minor > > A Pandas API DataFrame groupby with as_index=False and a multilevel > relabeling, such as > {code:java} > from pyspark import pandas as ps > ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", > as_index=False).agg(b_max=("b", "max")){code} > fails to include the group keys in the resulting DataFrame which diverges > from the expected behavior (as well as the behavior of native Pandas), e.g. > *actual* > {code:java} > b_max > 0 1 {code} > *expected* > {code:java} > a b_max > 0 0 1 {code} > > A possible fix is to prepend groupby key columns to {{*order*}} and > {{*columns*}} before filtering here: > [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47148) Avoid to materialize AQE ExchangeQueryStageExec on the cancellation
[ https://issues.apache.org/jira/browse/SPARK-47148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47148. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45234 [https://github.com/apache/spark/pull/45234] > Avoid to materialize AQE ExchangeQueryStageExec on the cancellation > --- > > Key: SPARK-47148 > URL: https://issues.apache.org/jira/browse/SPARK-47148 > Project: Spark > Issue Type: Bug > Components: Shuffle, SQL >Affects Versions: 4.0.0 >Reporter: Eren Avsarogullari >Assignee: Eren Avsarogullari >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > AQE can materialize both *ShuffleQueryStage* and *BroadcastQueryStage* during cancellation. This causes unnecessary stage materialization by submitting a Shuffle Job or a Broadcast Job. Under normal circumstances, if the stage is not yet materialized (i.e. *ShuffleQueryStage.shuffleFuture* or *{{BroadcastQueryStage.broadcastFuture}}* is not initialized yet), it should just be skipped without being materialized. > Please find a sample use case: > *1- Stage Materialization Steps:* > When stage materialization fails: > {code:java} > 1.1- ShuffleQueryStage1 - is materialized successfully, > 1.2- ShuffleQueryStage2 - materialization failed, > 1.3- ShuffleQueryStage3 - Not materialized yet, so > ShuffleQueryStage3.shuffleFuture is not initialized yet{code} > *2- Stage Cancellation Steps:* > {code:java} > 2.1- ShuffleQueryStage1 - is canceled because it is already materialized, > 2.2- ShuffleQueryStage2 - is the earlyFailedStage, so it is currently skipped > by default by AQE because it could not be materialized, > 2.3- ShuffleQueryStage3 - The problem is here: this stage is not materialized > yet, but cancellation is currently attempted on it as well, which requires it > to be materialized first.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
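In essence the fix is a guard: cancelling a stage whose materialization never started must be a no-op rather than forcing materialization. A simplified standalone model of that guard (not the actual ShuffleQueryStageExec code):

{code:scala}
import scala.concurrent.{Future, Promise}

class StageModel {
  private var started = false
  private val promise = Promise[Unit]()

  def materialize(): Future[Unit] = {
    started = true
    // ... submit the shuffle or broadcast job here ...
    promise.future
  }

  def cancel(): Unit = {
    // Only cancel work that was actually started; never touch the future
    // of a non-materialized stage, since that would kick off the job.
    if (started) {
      promise.tryFailure(new InterruptedException("stage cancelled"))
    }
  }
}
{code}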
[jira] [Updated] (SPARK-48042) Don't use a copy of timestamp formatter with a new override zone for each value
[ https://issues.apache.org/jira/browse/SPARK-48042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48042: --- Labels: pull-request-available (was: ) > Don't use a copy of timestamp formatter with a new override zone for each > value > --- > > Key: SPARK-48042 > URL: https://issues.apache.org/jira/browse/SPARK-48042 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
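The pattern behind this ticket is general: resolve the time zone once and reuse it across values, rather than building a zone-overridden copy of the formatter per value. A minimal Python sketch of that pattern, not the actual Spark code path:
{code:java}
# "Build once, reuse per value": the zone object is resolved a single time.
from datetime import datetime
from zoneinfo import ZoneInfo

TZ = ZoneInfo("America/Los_Angeles")  # assumed zone; created once, not per value

def format_micros(micros: int) -> str:
    # Reuses the shared TZ instead of constructing a new zone per call.
    return datetime.fromtimestamp(micros / 1_000_000, TZ).strftime("%Y-%m-%d %H:%M:%S")
{code}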
[jira] [Updated] (SPARK-48045) Pandas groupby with multi-agg-relabel ignores as_index=False
[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul George updated SPARK-48045: Description: A Pandas API DataFrame groupby with as_index=False and a multilevel relabeling, such as {code:java} from pyspark import pandas as ps ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", as_index=False).agg(b_max=("b", "max")){code} fails to include the group keys in the resulting DataFrame which diverges from the expected behavior (as well as the behavior of native Pandas), e.g. *actual* {code:java} b_max 0 1 {code} *expected* {code:java} a b_max 0 0 1 {code} A possible fix is to prepend groupby key index columns to {{*order*}} and {{*columns*}} before filtering here: [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] was: A Pandas API DataFrame groupby with as_index=False and a multilevel relabeling, such as {code:java} from pyspark import pandas as ps ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", as_index=False).agg(b_max=("b", "max")){code} fails to include the group keys in the resulting DataFrame which diverges from the expected behavior (as well as the behavior of native Pandas), e.g. *actual* {code:java} b_max 0 1 {code} *expected* {code:java} a b_max 0 0 1 {code} A possible fix is to prepend groupby key index columns to \{order} and \{columns} before filtering here: [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > Pandas groupby with multi-agg-relabel ignores as_index=False > > > Key: SPARK-48045 > URL: https://issues.apache.org/jira/browse/SPARK-48045 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.5.1 > Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 >Reporter: Paul George >Priority: Minor > > A Pandas API DataFrame groupby with as_index=False and a multilevel > relabeling, such as > {code:java} > from pyspark import pandas as ps > ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", > as_index=False).agg(b_max=("b", "max")){code} > fails to include the group keys in the resulting DataFrame which diverges > from the expected behavior (as well as the behavior of native Pandas), e.g. > *actual* > {code:java} > b_max > 0 1 {code} > *expected* > {code:java} > a b_max > 0 0 1 {code} > > A possible fix is to prepend groupby key index columns to {{*order*}} and > {{*columns*}} before filtering here: > [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48045) Pandas groupby with multi-agg-relabel ignores as_index=False
[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul George updated SPARK-48045: Description: A Pandas API DataFrame groupby with as_index=False and a multilevel relabeling, such as {code:java} from pyspark import pandas as ps ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", as_index=False).agg(b_max=("b", "max")){code} fails to include the group keys in the resulting DataFrame which diverges from the expected behavior (as well as the behavior of native Pandas), e.g. *actual* {code:java} b_max 0 1 {code} *expected* {code:java} a b_max 0 0 1 {code} A possible fix is to prepend groupby key index columns to \{order} and \{columns} before filtering here: [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] was: A Pandas API DataFrame groupby with as_index=False and a multilevel relabeling, such as {code:java} from pyspark import pandas as ps ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", as_index=False).agg(b_max=("b", "max")){code} fails to include the group keys in the resulting DataFrame which diverges from the expected behavior (as well as the behavior of native Pandas), e.g. *actual* {code:java} b_max 0 1 {code} *expected* {code:java} a b_max 0 0 1 {code} A possible fix is to prepend groupby key index columns to `order` and `columns` before filtering here: [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > Pandas groupby with multi-agg-relabel ignores as_index=False > > > Key: SPARK-48045 > URL: https://issues.apache.org/jira/browse/SPARK-48045 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.5.1 > Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 >Reporter: Paul George >Priority: Minor > > A Pandas API DataFrame groupby with as_index=False and a multilevel > relabeling, such as > {code:java} > from pyspark import pandas as ps > ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", > as_index=False).agg(b_max=("b", "max")){code} > fails to include the group keys in the resulting DataFrame which diverges > from the expected behavior (as well as the behavior of native Pandas), e.g. > *actual* > {code:java} > b_max > 0 1 {code} > *expected* > {code:java} > a b_max > 0 0 1 {code} > > A possible fix is to prepend groupby key index columns to \{order} and > \{columns} before filtering here: > [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48045) Pandas groupby with multi-agg-relabel ignores as_index=False
Paul George created SPARK-48045: --- Summary: Pandas groupby with multi-agg-relabel ignores as_index=False Key: SPARK-48045 URL: https://issues.apache.org/jira/browse/SPARK-48045 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 3.5.1 Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 Reporter: Paul George A Pandas API DataFrame groupby with as_index=False and a multilevel relabeling, such as {code:java} from pyspark import pandas as ps ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", as_index=False).agg(b_max=("b", "max")){code} fails to include the group keys in the resulting DataFrame which diverges from the expected behavior (as well as the behavior of native Pandas), e.g. *actual* {code:java} b_max 0 1 {code} *expected* {code:java} a b_max 0 0 1 {code} A possible fix is to prepend groupby key index columns to `order` and `columns` before filtering here: [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
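Until the fix lands, the expected output can be reproduced with the default as_index=True followed by reset_index(). This is a workaround sketch, not the proposed patch:
{code:java}
# Workaround sketch: named aggregation with the default as_index=True keeps the
# group key in the index; reset_index() then restores it as a column, matching
# the native-pandas output shown above.
from pyspark import pandas as ps

psdf = ps.DataFrame({"a": [0, 0], "b": [0, 1]})
out = psdf.groupby("a").agg(b_max=("b", "max")).reset_index()
#    a  b_max
# 0  0      1
{code}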
[jira] [Resolved] (SPARK-47994) SQLServer does not support 1 and 0 as boolean values
[ https://issues.apache.org/jira/browse/SPARK-47994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47994. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46231 [https://github.com/apache/spark/pull/46231] > SQLServer does not support 1 and 0 as boolean values > > > Key: SPARK-47994 > URL: https://issues.apache.org/jira/browse/SPARK-47994 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.3 >Reporter: Stefan Bukorovic >Assignee: Stefan Bukorovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Sometimes in Spark, when a column generated by a CASE WHEN expression is used > in a comparison filter, the optimized plan produces: CASE WHEN expression > THEN (1 or 0)..., which is not supported in SQLServer. SQLServer then throws > an exception saying that a "non-boolean expression is given when boolean was > expected". For now, we should not support CASE WHEN pushdown for SQLServer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
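As a reader-side mitigation (distinct from the fix merged in PR 46231), predicate pushdown can be disabled on the JDBC read so the problematic CASE WHEN filter is evaluated by Spark instead of SQL Server. The URL and table name below are placeholders, and an active `spark` session is assumed:
{code:java}
# Mitigation sketch: keep filters in Spark rather than pushing them to SQL Server.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://host:1433;databaseName=db")  # placeholder
      .option("dbtable", "dbo.events")                              # placeholder
      .option("pushDownPredicate", "false")  # evaluate filters on the Spark side
      .load())
{code}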
[jira] [Updated] (SPARK-48041) Avoid calling conf.resolver repeatedly in TableOutputResolver
[ https://issues.apache.org/jira/browse/SPARK-48041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xy updated SPARK-48041: --- Description: Avoid calling conf.resolver repeatedly in TableOutputResolver (was: Avoid repeated calls to conf.resolver in Analyzer) > Avoid calling conf.resolver repeatedly in TableOutputResolver > > > Key: SPARK-48041 > URL: https://issues.apache.org/jira/browse/SPARK-48041 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: xy >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Avoid calling conf.resolver repeatedly in TableOutputResolver -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
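The underlying change is a classic hoisting of a repeated accessor out of a per-column loop. Below is a language-neutral Python sketch of that pattern, not the actual Scala change in TableOutputResolver; `conf.resolver` mirrors the name in the ticket:
{code:java}
# Hoisting sketch: fetch the resolver once and reuse it for every column,
# instead of re-reading it from the conf on each comparison.
def find_column(name, columns, conf):
    resolver = conf.resolver  # read once, not once per column
    return next((c for c in columns if resolver(c.name, name)), None)
{code}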
[jira] [Resolved] (SPARK-48039) Update the error class for `group.apply`
[ https://issues.apache.org/jira/browse/SPARK-48039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48039. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46277 [https://github.com/apache/spark/pull/46277] > Update the error class for `group.apply` > > > Key: SPARK-48039 > URL: https://issues.apache.org/jira/browse/SPARK-48039 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48043) Kryo serialization issue with push-based shuffle
Romain Ardiet created SPARK-48043: - Summary: Kryo serialization issue with push-based shuffle Key: SPARK-48043 URL: https://issues.apache.org/jira/browse/SPARK-48043 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 3.4.1 Environment: AWS EMR 6.14 (Spark 3.4.1) Reporter: Romain Ardiet I'm running a Spark job on AWS EMR. I wanted to test the new push-based shuffle introduced in Spark 3.2, but it's failing with a Kryo exception when I enable it. The issue seems to happen when the Executor starts, during the KryoSerializerInstance.getAutoReset() check: {code:java} 24/04/24 15:36:22 ERROR YarnCoarseGrainedExecutorBackend: Executor self-exiting due to : Unable to create executor due to Failed to register classes with Kryo org.apache.spark.SparkException: Failed to register classes with Kryo at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:186) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?] at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:241) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:174) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:105) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48) ~[kryo-shaded-4.0.2.jar:?] at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:112) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:352) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializerInstance.getAutoReset(KryoSerializer.scala:452) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects$lzycompute(KryoSerializer.scala:259) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects(KryoSerializer.scala:255) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.util.Utils$.serializerIsSupported$lzycompute$1(Utils.scala:2721) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.util.Utils$.serializerIsSupported$1(Utils.scala:2716) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.util.Utils$.isPushBasedShuffleEnabled(Utils.scala:2730) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.executor.Executor.<init>(Executor.scala:143) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:190) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_402] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_402] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402] Caused by: java.lang.ClassNotFoundException: com.analytics.AnalyticsEventWrapper at java.net.URLClassLoader.findClass(URLClassLoader.java:387) ~[?:1.8.0_402] at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_402] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_402] at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_402] at java.lang.Class.forName0(Native Method) ~[?:1.8.0_402] at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_402] at org.apache.spark.util.Utils$.classForName(Utils.scala:228) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at
[jira] [Updated] (SPARK-48043) Kryo serialization issue with push-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-48043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Romain Ardiet updated SPARK-48043: -- Description: I'm running a Spark job on AWS EMR. I wanted to test the new push-based shuffle introduced in Spark 3.2, but it's failing with a Kryo exception when I enable it. The issue seems to happen when the Executor starts, during the KryoSerializerInstance.getAutoReset() check: {code:java} 24/04/24 15:36:22 ERROR YarnCoarseGrainedExecutorBackend: Executor self-exiting due to : Unable to create executor due to Failed to register classes with Kryo org.apache.spark.SparkException: Failed to register classes with Kryo at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:186) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?] at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:241) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:174) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:105) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48) ~[kryo-shaded-4.0.2.jar:?] at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:112) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:352) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializerInstance.getAutoReset(KryoSerializer.scala:452) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects$lzycompute(KryoSerializer.scala:259) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects(KryoSerializer.scala:255) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.util.Utils$.serializerIsSupported$lzycompute$1(Utils.scala:2721) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.util.Utils$.serializerIsSupported$1(Utils.scala:2716) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.util.Utils$.isPushBasedShuffleEnabled(Utils.scala:2730) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.executor.Executor.<init>(Executor.scala:143) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:190) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_402] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_402] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402] Caused by: java.lang.ClassNotFoundException: com.analytics.AnalyticsEventWrapper at java.net.URLClassLoader.findClass(URLClassLoader.java:387) ~[?:1.8.0_402] at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_402] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_402] at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_402] at java.lang.Class.forName0(Native Method) ~[?:1.8.0_402] at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_402] at org.apache.spark.util.Utils$.classForName(Utils.scala:228) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$6(KryoSerializer.scala:177) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?] at
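For context, a minimal PySpark configuration that exercises this code path is sketched below. The config keys are real Spark settings; the registered class name is taken from the report's own stack trace:
{code:java}
# Sketch: push-based shuffle combined with Kryo class registration. Executors
# resolve registered classes during startup, which is where the reported
# ClassNotFoundException surfaces.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .config("spark.kryo.classesToRegister", "com.analytics.AnalyticsEventWrapper")
         .config("spark.shuffle.push.enabled", "true")    # push-based shuffle (YARN + ESS)
         .config("spark.shuffle.service.enabled", "true")
         .getOrCreate())
{code}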
[jira] [Updated] (SPARK-48041) Avoid calling conf.resolver repeatedly in TableOutputResolver
[ https://issues.apache.org/jira/browse/SPARK-48041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xy updated SPARK-48041: --- Summary: Avoid calling conf.resolver repeatedly in TableOutputResolver (was: Avoid repeated calls to conf.resolver in Analyzer) > Avoid calling conf.resolver repeatedly in TableOutputResolver > > > Key: SPARK-48041 > URL: https://issues.apache.org/jira/browse/SPARK-48041 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: xy >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Avoid repeated calls to conf.resolver in Analyzer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48042) Don't use a copy of timestamp formatter with a new override zone for each value
[ https://issues.apache.org/jira/browse/SPARK-48042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-48042: - Issue Type: Improvement (was: Test) > Don't use a copy of timestamp formatter with a new override zone for each > value > --- > > Key: SPARK-48042 > URL: https://issues.apache.org/jira/browse/SPARK-48042 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48042) Don't use a copy of timestamp formatter with a new override zone for each value
Kent Yao created SPARK-48042: Summary: Don't use a copy of timestamp formatter with a new override zone for each value Key: SPARK-48042 URL: https://issues.apache.org/jira/browse/SPARK-48042 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48040) Spark Connect supports setting scheduler pool name
[ https://issues.apache.org/jira/browse/SPARK-48040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48040: --- Labels: pull-request-available (was: ) > Spark Connect supports setting scheduler pool name > -- > > Key: SPARK-48040 > URL: https://issues.apache.org/jira/browse/SPARK-48040 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.1 >Reporter: xie shuiahu >Priority: Major > Labels: pull-request-available > > Spark supports the fair scheduler and grouping jobs into pools, but Spark > Connect doesn't support this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
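For comparison, the classic (non-Connect) API already supports this through a local property on the SparkContext; the pool name below is arbitrary:
{code:java}
# Existing non-Connect behavior: jobs submitted after this call run in the
# named fair-scheduler pool. Spark Connect lacks an equivalent, which is what
# this ticket requests.
sc = spark.sparkContext  # assumes an active classic (non-Connect) session `spark`
sc.setLocalProperty("spark.scheduler.pool", "production")  # arbitrary pool name
{code}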
[jira] [Updated] (SPARK-48041) Avoid repeated calls to conf.resolver in Analyzer
[ https://issues.apache.org/jira/browse/SPARK-48041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48041: --- Labels: pull-request-available (was: ) > Avoid repeated calls to conf.resolver in Analyzer > - > > Key: SPARK-48041 > URL: https://issues.apache.org/jira/browse/SPARK-48041 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: xy >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Avoid repeated calls to conf.resolver in Analyzer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48041) Avoid repeated calls to conf.resolver in Analyzer
xy created SPARK-48041: -- Summary: Avoid repeated calls to conf.resolver in Analyzer Key: SPARK-48041 URL: https://issues.apache.org/jira/browse/SPARK-48041 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.3 Reporter: xy Fix For: 4.0.0 Avoid repeated calls to conf.resolver in Analyzer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47567) StringLocate
[ https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47567. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45791 [https://github.com/apache/spark/pull/45791] > StringLocate > > > Key: SPARK-47567 > URL: https://issues.apache.org/jira/browse/SPARK-47567 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringLocate* built-in string function in > Spark. First confirm what is the expected behaviour for these functions when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLocate* functions so > that they support all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
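A usage sketch of the behavior this ticket enables. The COLLATE syntax and the UTF8_LCASE collation name follow Spark 4.0's collation work but are assumptions here, not taken from the ticket itself:
{code:java}
# Assumed behavior: locate() respecting a case-insensitive collation.
spark.sql("SELECT locate('b', 'aBcde' COLLATE UTF8_LCASE)").show()
# Expected: 2 under the case-insensitive collation; a binary collation
# would return 0 because 'b' does not match 'B'.
{code}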
[jira] [Assigned] (SPARK-47567) StringLocate
[ https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47567: --- Assignee: Milan Dankovic > StringLocate > > > Key: SPARK-47567 > URL: https://issues.apache.org/jira/browse/SPARK-47567 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringLocate* built-in string function in > Spark. First confirm what is the expected behaviour for these functions when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLocate* functions so > that they support all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47567) StringLocate
[ https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47567: -- Assignee: (was: Apache Spark) > StringLocate > > > Key: SPARK-47567 > URL: https://issues.apache.org/jira/browse/SPARK-47567 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringLocate* built-in string function in > Spark. First confirm what is the expected behaviour for these functions when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLocate* functions so > that they support all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47567) StringLocate
[ https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47567: -- Assignee: Apache Spark > StringLocate > > > Key: SPARK-47567 > URL: https://issues.apache.org/jira/browse/SPARK-47567 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringLocate* built-in string function in > Spark. First confirm what is the expected behaviour for these functions when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLocate* functions so > that they support all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48040) Spark Connect supports setting scheduler pool name
xie shuiahu created SPARK-48040: --- Summary: Spark Connect supports setting scheduler pool name Key: SPARK-48040 URL: https://issues.apache.org/jira/browse/SPARK-48040 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.5.1 Reporter: xie shuiahu Spark supports the fair scheduler and grouping jobs into pools, but Spark Connect doesn't support this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error
[ https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47939. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46209 [https://github.com/apache/spark/pull/46209] > Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER > error > > > Key: SPARK-47939 > URL: https://issues.apache.org/jira/browse/SPARK-47939 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *Succeeds:* scala> spark.sql("select ?", Array(1)).show(); > *Fails:* spark.sql("describe select ?", Array(1)).show(); > *Fails:* spark.sql("explain select ?", Array(1)).show(); > Failures are of the form: > org.apache.spark.sql.catalyst.ExtendedAnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` > and provide a mapping of the parameter to either a SQL literal or collection > constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: > 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- > OneRowRelation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
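The PySpark equivalent of the repro, for readers following along in Python; with the fix, all three calls should succeed:
{code:java}
# Positional parameters via the PySpark API (list-valued args, Spark 3.5+).
spark.sql("SELECT ?", args=[1]).show()           # worked before the fix
spark.sql("DESCRIBE SELECT ?", args=[1]).show()  # failed with UNBOUND_SQL_PARAMETER
spark.sql("EXPLAIN SELECT ?", args=[1]).show()   # failed with UNBOUND_SQL_PARAMETER
{code}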
[jira] [Assigned] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error
[ https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47939: --- Assignee: Vladimir Golubev > Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER > error > > > Key: SPARK-47939 > URL: https://issues.apache.org/jira/browse/SPARK-47939 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > *Succeeds:* scala> spark.sql("select ?", Array(1)).show(); > *Fails:* spark.sql("describe select ?", Array(1)).show(); > *Fails:* spark.sql("explain select ?", Array(1)).show(); > Failures are of the form: > org.apache.spark.sql.catalyst.ExtendedAnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` > and provide a mapping of the parameter to either a SQL literal or collection > constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: > 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- > OneRowRelation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48038) Promote driverServiceName to KubernetesDriverConf
[ https://issues.apache.org/jira/browse/SPARK-48038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Pan updated SPARK-48038: -- Summary: Promote driverServiceName to KubernetesDriverConf (was: Promote driverServiceName to DriverServiceFeatureStep) > Promote driverServiceName to KubernetesDriverConf > - > > Key: SPARK-48038 > URL: https://issues.apache.org/jira/browse/SPARK-48038 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48038) Promote driverServiceName to DriverServiceFeatureStep
[ https://issues.apache.org/jira/browse/SPARK-48038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48038: --- Labels: pull-request-available (was: ) > Promote driverServiceName to DriverServiceFeatureStep > - > > Key: SPARK-48038 > URL: https://issues.apache.org/jira/browse/SPARK-48038 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48038) Promote driverServiceName to DriverServiceFeatureStep
Cheng Pan created SPARK-48038: - Summary: Promote driverServiceName to DriverServiceFeatureStep Key: SPARK-48038 URL: https://issues.apache.org/jira/browse/SPARK-48038 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
[ https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated SPARK-48037: --- Affects Version/s: 3.3.0 (was: 3.1.0) (was: 3.0.1) > SortShuffleWriter lacks shuffle write related metrics resulting in > potentially inaccurate data > -- > > Key: SPARK-48037 > URL: https://issues.apache.org/jira/browse/SPARK-48037 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: dzcxzl >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org