[jira] [Created] (SPARK-49252) Decouple TaskSetExcludelist creation from HealthTracker
Tianhan Hu created SPARK-49252:
----------------------------------

             Summary: Decouple TaskSetExcludelist creation from HealthTracker
                 Key: SPARK-49252
                 URL: https://issues.apache.org/jira/browse/SPARK-49252
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0, 3.5.3
            Reporter: Tianhan Hu


TaskSetManager creates a TaskSetExcludelist based on whether the TaskScheduler has a HealthTracker available. This ticket tracks the effort to decouple task/stage-level exclude-list creation from application-level HealthTracker creation, so that the two can be enabled independently for finer-grained control.


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49252) Decouple TaskSetExcludelist creation from HealthTracker
[ https://issues.apache.org/jira/browse/SPARK-49252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianhan Hu updated SPARK-49252:
-------------------------------
    Description: 
TaskSetManager creates a TaskSetExcludelist based on whether the TaskScheduler has a HealthTracker available. This ticket tracks the effort to decouple task/stage-level exclude-list creation from application-level HealthTracker creation, so that the two can be enabled independently.

  was:
TaskSetManager creates a TaskSetExcludelist based on whether the TaskScheduler has a HealthTracker available. This ticket tracks the effort to decouple task/stage-level exclude-list creation from application-level HealthTracker creation, so that the two can be enabled independently for finer-grained control.


> Decouple TaskSetExcludelist creation from HealthTracker
> -------------------------------------------------------
>
>                 Key: SPARK-49252
>                 URL: https://issues.apache.org/jira/browse/SPARK-49252
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0, 3.5.3
>            Reporter: Tianhan Hu
>            Priority: Major
[jira] [Created] (SPARK-44919) Avro connector: convert a union of a single primitive type to a StructType
Tianhan Hu created SPARK-44919:
----------------------------------

             Summary: Avro connector: convert a union of a single primitive type to a StructType
                 Key: SPARK-44919
                 URL: https://issues.apache.org/jira/browse/SPARK-44919
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.4.1
            Reporter: Tianhan Hu


The Spark Avro data source schema converter currently converts a union with a single primitive type into a Spark primitive type instead of a StructType, while for more complex unions that consist of multiple primitive types, it translates them into StructTypes. For example:

import scala.collection.JavaConverters._
import org.apache.avro._
import org.apache.spark.sql.avro._

// ["string", "null"]
SchemaConverters.toSqlType(
  Schema.createUnion(Seq(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)).asJava)
).dataType

// ["string", "int", "null"]
SchemaConverters.toSqlType(
  Schema.createUnion(Seq(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.INT), Schema.create(Schema.Type.NULL)).asJava)
).dataType

The first call returns StringType; the second returns StructType(StringType, IntegerType). We propose adding a new configuration to control the conversion behavior. The default behavior would stay the same; when the config is altered, a union with a single primitive type would be translated into a StructType.
[jira] [Created] (SPARK-43362) Special handling of JSON type for MySQL connector
Tianhan Hu created SPARK-43362:
----------------------------------

             Summary: Special handling of JSON type for MySQL connector
                 Key: SPARK-43362
                 URL: https://issues.apache.org/jira/browse/SPARK-43362
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Tianhan Hu


The MySQL JSON type is converted into the JDBC {{VARCHAR}} type with a precision of -1 on some MariaDB drivers. When receiving a {{VARCHAR}} with negative precision, Spark throws an error. This ticket proposes to special-case this scenario by converting the JSON type directly into StringType in MySQLDialect.
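The mapping the ticket describes can be illustrated with a small, self-contained sketch. This is not Spark's actual MySQLDialect API (the real change would live in its getCatalystType override); the function name and the string result type here are hypothetical stand-ins for the decision logic only:

```scala
import java.sql.Types

// Simplified stand-in for the dialect's type-mapping decision, returning a
// Catalyst type name instead of Spark's DataType objects.
def toCatalystTypeName(jdbcType: Int, typeName: String, precision: Int): Option[String] =
  (jdbcType, typeName) match {
    // Some MariaDB drivers report MySQL JSON as VARCHAR with precision -1;
    // map it straight to StringType instead of tripping over the precision.
    case (Types.VARCHAR, "JSON") => Some("StringType")
    // Any other VARCHAR with a negative precision remains unmapped (the
    // situation that previously surfaced as an error).
    case (Types.VARCHAR, _) if precision < 0 => None
    case (Types.VARCHAR, _) => Some("StringType")
    case _ => None
  }
```

The key point is that the JSON special case is matched before the precision check, so the bogus -1 precision never matters for JSON columns.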
[jira] [Updated] (SPARK-43040) Improve TimestampNTZ support in JDBC data source
[ https://issues.apache.org/jira/browse/SPARK-43040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianhan Hu updated SPARK-43040:
-------------------------------
    Description: 
[https://github.com/apache/spark/pull/36726] supports the TimestampNTZ type in the JDBC data source, and [https://github.com/apache/spark/pull/37013] applies a fix to pass more test cases with H2. The problem is that Java's Timestamp is a poorly defined class, and different JDBC drivers implement "getTimestamp" and "setTimestamp" with different expected behaviors in mind. The general conversion implementation works with some JDBC dialects and their drivers but not others; this issue was discovered when testing with a PostgreSQL database. We will need dialect-specific conversions between JDBC timestamps and TimestampNTZ.

  was:
[https://github.com/apache/spark/pull/36726] supports the TimestampNTZ type in the JDBC data source, and [https://github.com/apache/spark/pull/37013] applies a fix to pass more test cases with H2. The problem is that Java's Timestamp is a poorly defined class, and different JDBC drivers implement "getTimestamp" and "setTimestamp" with different expected behaviors in mind. The general conversion implementation works with some JDBC dialects and their drivers but not others; this issue was discovered when testing with a PostgreSQL database.


> Improve TimestampNTZ support in JDBC data source
> ------------------------------------------------
>
>                 Key: SPARK-43040
>                 URL: https://issues.apache.org/jira/browse/SPARK-43040
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.3, 3.4.0
>            Reporter: Tianhan Hu
>            Priority: Major
[jira] [Created] (SPARK-43040) Improve TimestampNTZ support in JDBC data source
Tianhan Hu created SPARK-43040:
----------------------------------

             Summary: Improve TimestampNTZ support in JDBC data source
                 Key: SPARK-43040
                 URL: https://issues.apache.org/jira/browse/SPARK-43040
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.3, 3.4.0
            Reporter: Tianhan Hu


[https://github.com/apache/spark/pull/36726] supports the TimestampNTZ type in the JDBC data source, and [https://github.com/apache/spark/pull/37013] applies a fix to pass more test cases with H2. The problem is that Java's Timestamp is a poorly defined class, and different JDBC drivers implement "getTimestamp" and "setTimestamp" with different expected behaviors in mind. The general conversion implementation works with some JDBC dialects and their drivers but not others; this issue was discovered when testing with a PostgreSQL database.
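Why java.sql.Timestamp is a poor carrier for TIMESTAMP WITHOUT TIME ZONE values can be shown with a self-contained sketch (JDK classes only, no Spark or JDBC driver involved): converting the same wall-clock LocalDateTime through Timestamp yields different epoch millis depending on the JVM's default time zone, which is exactly the kind of ambiguity that makes drivers disagree:

```scala
import java.sql.Timestamp
import java.time.LocalDateTime
import java.util.TimeZone

// A fixed wall-clock value, as a TIMESTAMP WITHOUT TIME ZONE would carry it.
val wallClock = LocalDateTime.of(2023, 4, 5, 12, 0, 0)

// Convert the wall-clock value to epoch millis via java.sql.Timestamp while a
// given default zone is in effect, then restore the original default.
def epochMillisIn(zone: String): Long = {
  val saved = TimeZone.getDefault
  try {
    TimeZone.setDefault(TimeZone.getTimeZone(zone))
    Timestamp.valueOf(wallClock).getTime // interpreted in the default zone
  } finally TimeZone.setDefault(saved)
}

val utcMillis = epochMillisIn("UTC")
val tokyoMillis = epochMillisIn("Asia/Tokyo")
// Same wall-clock value, different epoch millis: drivers that funnel NTZ
// values through Timestamp.getTime can silently shift them, hence the need
// for dialect-specific conversions.
```

Noon in Tokyo (UTC+9) is an earlier instant than noon in UTC, so the two millis differ by exactly nine hours; a LocalDateTime round trip, by contrast, never touches the zone.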
[jira] [Created] (SPARK-38574) Enrich Avro data source documentation
Tianhan Hu created SPARK-38574:
----------------------------------

             Summary: Enrich Avro data source documentation
                 Key: SPARK-38574
                 URL: https://issues.apache.org/jira/browse/SPARK-38574
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.2.1
            Reporter: Tianhan Hu


Enrich the Avro data source documentation to emphasize the difference between *avroSchema*, which is an option, and *jsonFormatSchema*, which is a parameter of the function *from_avro*. When using *from_avro*, the *avroSchema* option can be set to a compatible, evolved schema, while *jsonFormatSchema* has to be the actual schema. Otherwise, the behavior is undefined.
[jira] [Created] (SPARK-37891) Add scalastyle check to disable scala.concurrent.ExecutionContext.Implicits.global
Tianhan Hu created SPARK-37891:
----------------------------------

             Summary: Add scalastyle check to disable scala.concurrent.ExecutionContext.Implicits.global
                 Key: SPARK-37891
                 URL: https://issues.apache.org/jira/browse/SPARK-37891
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.2.0
            Reporter: Tianhan Hu
             Fix For: 3.2.2


Add a scalastyle check to forbid internal use of scala.concurrent.ExecutionContext.Implicits.global. The reason is that user queries can also use this thread pool, causing resource contention and starvation. Spark-internal APIs should therefore not use the global thread pool.
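A scalastyle check of this kind is typically expressed as a regex rule in scalastyle-config.xml. The fragment below is a hypothetical sketch of such a rule (the customId and message are invented here; Spark's actual rule may be worded differently):

```xml
<!-- Hypothetical sketch: flag any internal use of the global ExecutionContext. -->
<check customId="globalexecutioncontext" level="error"
       class="org.scalastyle.file.RegexChecker" enabled="true">
  <parameters>
    <parameter name="regex">ExecutionContext\.Implicits\.global</parameter>
  </parameters>
  <customMessage>Do not use scala.concurrent.ExecutionContext.Implicits.global:
    user queries share that pool, so Spark-internal use can cause contention
    and starvation. Create a dedicated ExecutionContext instead.</customMessage>
</check>
```

Legitimate exceptions (e.g. tests) would then carry the usual scalastyle:off/on escape comments around the offending line.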
[jira] [Updated] (SPARK-36919) Make BadRecordException serializable
[ https://issues.apache.org/jira/browse/SPARK-36919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianhan Hu updated SPARK-36919:
-------------------------------
    Description: 
While migrating a Spark application from 2.4.x to 3.1.x, we found a difference in the exception chaining behavior. In a case of parsing a malformed CSV, where the root cause exception should be {{Caused by: java.lang.RuntimeException: Malformed CSV record}}, only the top-level exception is kept, and all lower-level exceptions and the root cause are lost. Thus, when we call {{ExceptionUtils.getRootCause}} on the exception, we still get the exception itself.

The reason for the difference is that the {{RuntimeException}} is wrapped in {{BadRecordException}}, which has unserializable fields. When we try to serialize the exception in tasks and deserialize it in the scheduler, the exception is lost.

This PR makes the unserializable fields of {{BadRecordException}} transient, so the rest of the exception can be serialized and deserialized properly.

  was:
While migrating a Spark application from 2.4.x to 3.1.x, we found a difference in the exception chaining behavior. In a case of parsing a malformed CSV, where the root cause exception should be {{Caused by: org.apache.spark.sql.catalyst.csv.MalformedCSVException: Malformed CSV record}}, only the top-level exception is kept, and all lower-level exceptions and the root cause are lost. Thus, when we call {{ExceptionUtils.getRootCause}} on the exception, we still get the exception itself.

The reason for the difference is that the {{MalformedCSVException}} is now wrapped in {{BadRecordException}}, which has unserializable fields. When we try to serialize the exception in tasks and deserialize it in the scheduler, the exception is lost.

This PR makes the unserializable fields of {{BadRecordException}} transient, so the rest of the exception can be serialized and deserialized properly.


> Make BadRecordException serializable
> ------------------------------------
>
>                 Key: SPARK-36919
>                 URL: https://issues.apache.org/jira/browse/SPARK-36919
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.0, 3.3.0, 3.2.1
>            Reporter: Tianhan Hu
>            Priority: Minor
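The mechanism behind the fix can be demonstrated with a self-contained sketch using plain Java serialization (this is not Spark's actual BadRecordException; the class and field names below are stand-ins): an exception holding a reference to a non-serializable object survives a serialize/deserialize round trip, cause chain included, once that field is marked @transient.

```scala
import java.io._

// Stand-in for the non-serializable state (e.g. a record closure) that the
// real BadRecordException carries.
class NotSerializableRecord

// Marking the field @transient lets Java serialization skip it, so the rest
// of the exception, including the cause chain, round-trips intact.
class BadRecordDemo(@transient val record: NotSerializableRecord, cause: Throwable)
  extends Exception("bad record", cause)

// Serialize to bytes and read back, as a task -> scheduler hop would.
def roundTrip[T](obj: T): T = {
  val buf = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(buf)
  out.writeObject(obj)
  out.close()
  new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    .readObject().asInstanceOf[T]
}

val original = new BadRecordDemo(new NotSerializableRecord,
  new RuntimeException("Malformed CSV record"))
val copied = roundTrip[Throwable](original)
// copied.getCause now still carries "Malformed CSV record"; without
// @transient, writeObject would throw NotSerializableException instead.
```

After deserialization the transient field is simply null, which is acceptable here: the point of the ticket is preserving the exception chain, not the record payload.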
[jira] [Created] (SPARK-36919) Make BadRecordException serializable
Tianhan Hu created SPARK-36919:
----------------------------------

             Summary: Make BadRecordException serializable
                 Key: SPARK-36919
                 URL: https://issues.apache.org/jira/browse/SPARK-36919
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.2.0, 3.3.0, 3.2.1
            Reporter: Tianhan Hu


While migrating a Spark application from 2.4.x to 3.1.x, we found a difference in the exception chaining behavior. In a case of parsing a malformed CSV, where the root cause exception should be {{Caused by: org.apache.spark.sql.catalyst.csv.MalformedCSVException: Malformed CSV record}}, only the top-level exception is kept, and all lower-level exceptions and the root cause are lost. Thus, when we call {{ExceptionUtils.getRootCause}} on the exception, we still get the exception itself.

The reason for the difference is that the {{MalformedCSVException}} is now wrapped in {{BadRecordException}}, which has unserializable fields. When we try to serialize the exception in tasks and deserialize it in the scheduler, the exception is lost.

This PR makes the unserializable fields of {{BadRecordException}} transient, so the rest of the exception can be serialized and deserialized properly.
[jira] [Created] (SPARK-32631) Handle Null error message in hive ThriftServer UI
Tianhan Hu created SPARK-32631:
----------------------------------

             Summary: Handle null error message in Hive ThriftServer UI
                 Key: SPARK-32631
                 URL: https://issues.apache.org/jira/browse/SPARK-32631
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Tianhan Hu


This fix prevents a NullPointerException caused by a null error message, by wrapping the message in an Option and using getOrElse to handle the null value. The error message can be null because java.lang.Throwable allows its detailMessage to be null. However, the render code assumes that the error message is not null and calls indexOf() on the string. Therefore, if some exception doesn't set an error message, it triggers a NullPointerException in the Hive ThriftServer UI.
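The described pattern is easy to show in a self-contained sketch (the helper name is hypothetical, not the actual render code): wrap the possibly-null getMessage in Option before doing any string work such as indexOf.

```scala
// Hypothetical helper mirroring the fix: take the message up to the first
// newline, treating a missing (null) message as empty.
def firstLine(errorMessage: Option[String]): String = {
  val msg = errorMessage.getOrElse("")
  val idx = msg.indexOf('\n')
  if (idx >= 0) msg.substring(0, idx) else msg
}

// Throwable permits a null detailMessage, so getMessage can return null.
val noMessage = new RuntimeException()
val withMessage = new RuntimeException("boom\ndetails")

// Option(null) is None, so the render path never calls indexOf on null.
val safe = firstLine(Option(noMessage.getMessage))
val first = firstLine(Option(withMessage.getMessage))
```

Calling indexOf directly on noMessage.getMessage would be the NullPointerException the ticket describes; the Option wrapper turns it into a harmless empty string.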
[jira] [Created] (SPARK-32627) Add showSessionLink parameter to SqlStatsPagedTable class in ThriftServerPage
Tianhan Hu created SPARK-32627:
----------------------------------

             Summary: Add showSessionLink parameter to SqlStatsPagedTable class in ThriftServerPage
                 Key: SPARK-32627
                 URL: https://issues.apache.org/jira/browse/SPARK-32627
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Tianhan Hu


Introduced a showSessionLink argument to the SqlStatsPagedTable class in ThriftServerPage. When this argument is set to true, the "Session ID" tooltip is shown to the user.