[jira] [Created] (SPARK-49252) Decouple TaskSetExcludelist creation from HealthTracker

2024-08-15 Thread Tianhan Hu (Jira)
Tianhan Hu created SPARK-49252:
--

 Summary: Decouple TaskSetExcludelist creation from HealthTracker
 Key: SPARK-49252
 URL: https://issues.apache.org/jira/browse/SPARK-49252
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0, 3.5.3
Reporter: Tianhan Hu


TaskSetManager creates a TaskSetExcludelist based on whether the TaskScheduler 
has a HealthTracker available.

This task tracks the effort to decouple task/stage-level excludeList creation 
from application-level healthTracker creation, so the two can be enabled 
independently for finer-grained control.
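For illustration, a minimal, self-contained sketch of the proposed decoupling 
(class names and the flag below are illustrative, not the actual Spark 
internals):

class HealthTracker
class TaskSetExcludelist

// Before: excludelist creation is tied to the application-level tracker.
def excludelistCoupled(
    healthTracker: Option[HealthTracker]): Option[TaskSetExcludelist] =
  healthTracker.map(_ => new TaskSetExcludelist)

// After: the task/stage level is gated by its own flag, so the two levels
// can be enabled independently.
def excludelistDecoupled(
    taskSetExcludeEnabled: Boolean): Option[TaskSetExcludelist] =
  if (taskSetExcludeEnabled) Some(new TaskSetExcludelist) else None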






[jira] [Updated] (SPARK-49252) Decouple TaskSetExcludelist creation from HealthTracker

2024-08-15 Thread Tianhan Hu (Jira)


[ https://issues.apache.org/jira/browse/SPARK-49252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianhan Hu updated SPARK-49252:
---
Description: 
TaskSetManager creates a TaskSetExcludelist based on whether the TaskScheduler 
has a HealthTracker available.

This task tracks the effort to decouple task/stage-level excludeList creation 
from application-level healthTracker creation, so the two can be enabled 
independently.

  was:
TaskSetManager creates a TaskSetExcludelist based on whether the TaskScheduler 
has a HealthTracker available.

This task tracks the effort to decouple task/stage-level excludeList creation 
from application-level healthTracker creation, so the two can be enabled 
independently for finer-grained control.


> Decouple TaskSetExcludelist creation from HealthTracker
> ---
>
> Key: SPARK-49252
> URL: https://issues.apache.org/jira/browse/SPARK-49252
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0, 3.5.3
>Reporter: Tianhan Hu
>Priority: Major
>
> TaskSetManager creates a TaskSetExcludelist based on whether the TaskScheduler 
> has a HealthTracker available.
> This task tracks the effort to decouple task/stage-level excludeList creation 
> from application-level healthTracker creation, so the two can be enabled 
> independently.






[jira] [Created] (SPARK-44919) Avro connector: convert a union of a single primitive type to a StructType

2023-08-22 Thread Tianhan Hu (Jira)
Tianhan Hu created SPARK-44919:
--

 Summary: Avro connector: convert a union of a single primitive 
type to a StructType
 Key: SPARK-44919
 URL: https://issues.apache.org/jira/browse/SPARK-44919
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.1
Reporter: Tianhan Hu


The Spark Avro data source schema converter currently converts a union with a 
single primitive type to a Spark primitive type instead of a StructType.

For more complex union types that consist of multiple primitive types, however, 
the schema converter translates them into StructTypes.

For example, 

import scala.collection.JavaConverters._
import org.apache.avro._
import org.apache.spark.sql.avro._

// ["string", "null"]
SchemaConverters.toSqlType(
  Schema.createUnion(Seq(
    Schema.create(Schema.Type.STRING),
    Schema.create(Schema.Type.NULL)).asJava)
).dataType

// ["string", "int", "null"]
SchemaConverters.toSqlType(
  Schema.createUnion(Seq(
    Schema.create(Schema.Type.STRING),
    Schema.create(Schema.Type.INT),
    Schema.create(Schema.Type.NULL)).asJava)
).dataType

The first one would return StringType, the second would return 
StructType(StringType, IntegerType).
We hope to add a new configuration to control the conversion behavior. The 
default behavior would remain the same; when the config is enabled, a union 
with a single primitive type would also be translated into a StructType.
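Usage might then look like the following sketch (the option name here is 
hypothetical, purely for illustration; it also assumes a spark-shell 
SparkSession named spark):

// Hypothetical option name, for illustration only (not an existing Spark config).
spark.read
  .format("avro")
  .option("convertSingleTypeUnionToStruct", "true")
  .load("/path/to/avro/data")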






[jira] [Created] (SPARK-43362) Special handling of JSON type for MySQL connector

2023-05-03 Thread Tianhan Hu (Jira)
Tianhan Hu created SPARK-43362:
--

 Summary: Special handling of JSON type for MySQL connector
 Key: SPARK-43362
 URL: https://issues.apache.org/jira/browse/SPARK-43362
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Tianhan Hu


The MySQL JSON type is converted into the JDBC {{VARCHAR}} type with a precision 
of -1 by some MariaDB drivers. When receiving a {{VARCHAR}} with negative 
precision, Spark throws an error.

This ticket proposes to special-case this scenario by directly converting the 
JSON type into StringType in MySQLDialect.
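A minimal sketch of the idea, using the standard JdbcDialect getCatalystType 
hook (simplified; the actual MySQLDialect change may differ):

import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types.{DataType, MetadataBuilder, StringType}

object MySQLJsonDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int,
      md: MetadataBuilder): Option[DataType] = {
    // Map MySQL JSON straight to StringType, before the negative VARCHAR
    // precision reported by some drivers can reach the generic path.
    if ("JSON".equalsIgnoreCase(typeName)) Some(StringType) else None
  }
}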






[jira] [Updated] (SPARK-43040) Improve TimestampNTZ support in JDBC data source

2023-04-05 Thread Tianhan Hu (Jira)


[ https://issues.apache.org/jira/browse/SPARK-43040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianhan Hu updated SPARK-43040:
---
Description: 
[https://github.com/apache/spark/pull/36726] supports the TimestampNTZ type in 
the JDBC data source, and [https://github.com/apache/spark/pull/37013] applies a 
fix to pass more test cases with H2.

The problem is that Java's Timestamp is a poorly defined class, and different 
JDBC drivers implement "getTimestamp" and "setTimestamp" with different expected 
behaviors in mind. The general conversion implementation works with some JDBC 
dialects and their drivers but not others. This issue was discovered when 
testing against a PostgreSQL database.

We will need dialect-specific conversions between JDBC timestamps and 
TimestampNTZ.
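As a sketch of what dialect-specific hooks could look like (method names here 
are assumptions for illustration, not a settled API):

import java.sql.Timestamp
import java.time.LocalDateTime

trait DialectNTZConversions {
  // Defaults; a dialect whose driver applies session-time-zone shifts in
  // getTimestamp/setTimestamp would override these with its own handling.
  def toTimestampNTZ(t: Timestamp): LocalDateTime = t.toLocalDateTime
  def fromTimestampNTZ(ldt: LocalDateTime): Timestamp = Timestamp.valueOf(ldt)
}

object PostgresLikeDialect extends DialectNTZConversions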

  was:
[https://github.com/apache/spark/pull/36726] supports the TimestampNTZ type in 
the JDBC data source, and [https://github.com/apache/spark/pull/37013] applies a 
fix to pass more test cases with H2.

The problem is that Java's Timestamp is a poorly defined class, and different 
JDBC drivers implement "getTimestamp" and "setTimestamp" with different expected 
behaviors in mind. The general conversion implementation works with some JDBC 
dialects and their drivers but not others.

This issue was discovered when testing against a PostgreSQL database.


> Improve TimestampNTZ support in JDBC data source
> 
>
> Key: SPARK-43040
> URL: https://issues.apache.org/jira/browse/SPARK-43040
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.0
>Reporter: Tianhan Hu
>Priority: Major
>
> [https://github.com/apache/spark/pull/36726] supports the TimestampNTZ type in 
> the JDBC data source, and [https://github.com/apache/spark/pull/37013] applies 
> a fix to pass more test cases with H2.
> The problem is that Java's Timestamp is a poorly defined class, and different 
> JDBC drivers implement "getTimestamp" and "setTimestamp" with different 
> expected behaviors in mind. The general conversion implementation works with 
> some JDBC dialects and their drivers but not others. This issue was discovered 
> when testing against a PostgreSQL database.
> We will need dialect-specific conversions between JDBC timestamps and 
> TimestampNTZ.






[jira] [Created] (SPARK-43040) Improve TimestampNTZ support in JDBC data source

2023-04-05 Thread Tianhan Hu (Jira)
Tianhan Hu created SPARK-43040:
--

 Summary: Improve TimestampNTZ support in JDBC data source
 Key: SPARK-43040
 URL: https://issues.apache.org/jira/browse/SPARK-43040
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.3, 3.4.0
Reporter: Tianhan Hu


[https://github.com/apache/spark/pull/36726] supports the TimestampNTZ type in 
the JDBC data source, and [https://github.com/apache/spark/pull/37013] applies a 
fix to pass more test cases with H2.

The problem is that Java's Timestamp is a poorly defined class, and different 
JDBC drivers implement "getTimestamp" and "setTimestamp" with different expected 
behaviors in mind. The general conversion implementation works with some JDBC 
dialects and their drivers but not others.

This issue was discovered when testing against a PostgreSQL database.






[jira] [Created] (SPARK-38574) Enrich Avro data source documentation

2022-03-16 Thread Tianhan Hu (Jira)
Tianhan Hu created SPARK-38574:
--

 Summary: Enrich Avro data source documentation
 Key: SPARK-38574
 URL: https://issues.apache.org/jira/browse/SPARK-38574
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: Tianhan Hu


Enrich the Avro data source documentation to emphasize the difference between 
*avroSchema*, which is an option, and *jsonFormatSchema*, which is a parameter 
of the function *from_avro*.

When using *from_avro*, the *avroSchema* option can be set to a compatible, 
evolved schema, while *jsonFormatSchema* has to be the actual schema; otherwise, 
the behavior is undefined.
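For illustration, a sketch of the distinction (the toy schemas and the 
DataFrame df with a binary "value" column are assumptions):

import scala.collection.JavaConverters._
import org.apache.spark.sql.avro.functions.from_avro

// jsonFormatSchema: the actual schema the Avro binary data was written with.
val actualSchema =
  """{"type":"record","name":"r","fields":[{"name":"a","type":"string"}]}"""

// avroSchema option: may be a compatible, evolved reader schema.
val evolvedSchema =
  """{"type":"record","name":"r","fields":[
    {"name":"a","type":"string"},
    {"name":"b","type":["null","int"],"default":null}]}"""

df.select(
  from_avro(df("value"), actualSchema, Map("avroSchema" -> evolvedSchema).asJava))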






[jira] [Created] (SPARK-37891) Add scalastyle check to disable scala.concurrent.ExecutionContext.Implicits.global

2022-01-12 Thread Tianhan Hu (Jira)
Tianhan Hu created SPARK-37891:
--

 Summary: Add scalastyle check to disable 
scala.concurrent.ExecutionContext.Implicits.global
 Key: SPARK-37891
 URL: https://issues.apache.org/jira/browse/SPARK-37891
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Tianhan Hu
 Fix For: 3.2.2


Add a scalastyle check to disallow internal use of 
scala.concurrent.ExecutionContext.Implicits.global. 
The reason is that user queries can also use this thread pool, causing resource 
competition and starvation. Spark-internal APIs should therefore not use the 
global thread pool.
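For context, a small sketch of the alternative the check steers code toward 
(simplified, not the exact Spark pattern):

import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// A dedicated pool, so internal work does not compete with user queries
// on the shared global pool.
val pool = Executors.newFixedThreadPool(4)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

Future {
  // internal work runs on the dedicated pool
  42
}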






[jira] [Updated] (SPARK-36919) Make BadRecordException serializable

2021-10-03 Thread Tianhan Hu (Jira)


[ https://issues.apache.org/jira/browse/SPARK-36919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianhan Hu updated SPARK-36919:
---
Description: 
Migrating a Spark application from 2.4.x to 3.1.x and finding a difference in 
the exception chaining behavior. In a case of parsing a malformed CSV, where 
the root cause exception should be {{Caused by: java.lang.RuntimeException: 
Malformed CSV record}}, only the top level exception is kept, and all lower 
level exceptions and root cause are lost. Thus, when we call 
{{ExceptionUtils.getRootCause}} on the exception, we still get itself.
The reason for the difference is that {{RuntimeException}} is wrapped in 
{{BadRecordException}}, which has unserializable fields. When we try to 
serialize the exception from tasks and deserialize from scheduler, the 
exception is lost.
This PR makes unserializable fields of {{BadRecordException}} transient, so the 
rest of the exception could be serialized and deserialized properly.
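A minimal sketch of the change (field names and types are simplified from 
Spark's actual BadRecordException):

// Marking the unserializable closures @transient lets the cause chain
// survive task-to-driver serialization.
case class BadRecordException(
    @transient record: () => String,           // not serializable: dropped on the wire
    @transient partialResult: () => Option[String],
    cause: Throwable)                          // preserved, so getRootCause works
  extends Exception(cause)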

  was:
While migrating a Spark application from 2.4.x to 3.1.x, we found a difference 
in the exception chaining behavior. When parsing a malformed CSV, where the root 
cause exception should be {{Caused by: 
org.apache.spark.sql.catalyst.csv.MalformedCSVException: Malformed CSV 
record}}, only the top-level exception is kept, and all lower-level exceptions 
and the root cause are lost. Thus, when we call {{ExceptionUtils.getRootCause}} 
on the exception, we get the exception itself back.
The reason for the difference is that {{MalformedCSVException}} is now wrapped 
in {{BadRecordException}}, which has unserializable fields. When the exception 
is serialized in a task and deserialized by the scheduler, it is lost.
This PR makes the unserializable fields of {{BadRecordException}} transient, so 
the rest of the exception can be serialized and deserialized properly.


> Make BadRecordException serializable
> 
>
> Key: SPARK-36919
> URL: https://issues.apache.org/jira/browse/SPARK-36919
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.3.0, 3.2.1
>Reporter: Tianhan Hu
>Priority: Minor
>
> While migrating a Spark application from 2.4.x to 3.1.x, we found a difference 
> in the exception chaining behavior. When parsing a malformed CSV, where the 
> root cause exception should be {{Caused by: java.lang.RuntimeException: 
> Malformed CSV record}}, only the top-level exception is kept, and all 
> lower-level exceptions and the root cause are lost. Thus, when we call 
> {{ExceptionUtils.getRootCause}} on the exception, we get the exception itself 
> back.
> The reason for the difference is that the {{RuntimeException}} is wrapped in 
> {{BadRecordException}}, which has unserializable fields. When the exception is 
> serialized in a task and deserialized by the scheduler, it is lost.
> This PR makes the unserializable fields of {{BadRecordException}} transient, 
> so the rest of the exception can be serialized and deserialized properly.






[jira] [Created] (SPARK-36919) Make BadRecordException serializable

2021-10-03 Thread Tianhan Hu (Jira)
Tianhan Hu created SPARK-36919:
--

 Summary: Make BadRecordException serializable
 Key: SPARK-36919
 URL: https://issues.apache.org/jira/browse/SPARK-36919
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0, 3.3.0, 3.2.1
Reporter: Tianhan Hu


While migrating a Spark application from 2.4.x to 3.1.x, we found a difference 
in the exception chaining behavior. When parsing a malformed CSV, where the root 
cause exception should be {{Caused by: 
org.apache.spark.sql.catalyst.csv.MalformedCSVException: Malformed CSV 
record}}, only the top-level exception is kept, and all lower-level exceptions 
and the root cause are lost. Thus, when we call {{ExceptionUtils.getRootCause}} 
on the exception, we get the exception itself back.
The reason for the difference is that {{MalformedCSVException}} is now wrapped 
in {{BadRecordException}}, which has unserializable fields. When the exception 
is serialized in a task and deserialized by the scheduler, it is lost.
This PR makes the unserializable fields of {{BadRecordException}} transient, so 
the rest of the exception can be serialized and deserialized properly.






[jira] [Created] (SPARK-32631) Handle Null error message in hive ThriftServer UI

2020-08-16 Thread Tianhan Hu (Jira)
Tianhan Hu created SPARK-32631:
--

 Summary: Handle Null error message in hive ThriftServer UI
 Key: SPARK-32631
 URL: https://issues.apache.org/jira/browse/SPARK-32631
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Tianhan Hu


This fix prevents a NullPointerException caused by a null error message by 
wrapping the message in an Option and using getOrElse to handle the null value.

The reason the error message can be null is that Java's Throwable allows the 
detailMessage to be null. However, the render code assumes the error message is 
never null and calls indexOf() on the string. Therefore, if an exception doesn't 
set an error message, it triggers a NullPointerException in the Hive 
ThriftServer UI.
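A minimal sketch of the guard (simplified; the actual ThriftServerPage render 
code differs):

// A Throwable built without a message has detailMessage == null.
val cause: Throwable = new RuntimeException()

// Before: cause.getMessage.indexOf(...) throws NullPointerException.
// After: wrap in Option and fall back to an empty string.
val errorMessage = Option(cause.getMessage).getOrElse("")
val firstLine =
  if (errorMessage.contains('\n')) errorMessage.substring(0, errorMessage.indexOf('\n'))
  else errorMessage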






[jira] [Created] (SPARK-32627) Add showSessionLink parameter to SqlStatsPagedTable class in ThriftServerPage

2020-08-15 Thread Tianhan Hu (Jira)
Tianhan Hu created SPARK-32627:
--

 Summary: Add showSessionLink parameter to SqlStatsPagedTable class 
in ThriftServerPage
 Key: SPARK-32627
 URL: https://issues.apache.org/jira/browse/SPARK-32627
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Tianhan Hu


Introduced a showSessionLink argument to the SqlStatsPagedTable class in 
ThriftServerPage. When this argument is set to true, a "Session ID" tooltip is 
shown to the user.
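Conceptually (a hypothetical simplification, not the real class), the flag 
gates rendering of the Session ID cell:

case class SqlStatsRow(sessionId: String, statement: String)

// When showSessionLink is true, render a Session ID cell linking to the
// session page; otherwise omit the column entirely.
def renderRow(row: SqlStatsRow, showSessionLink: Boolean): String = {
  val sessionCell =
    if (showSessionLink)
      s"""<td><a href="session?id=${row.sessionId}">${row.sessionId}</a></td>"""
    else ""
  s"<tr>$sessionCell<td>${row.statement}</td></tr>"
}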


