[ https://issues.apache.org/jira/browse/SPARK-40630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xsys updated SPARK-40630:
-------------------------
    Description:
h3. Describe the bug

When we construct a DataFrame with an invalid DATE/TIMESTAMP (e.g. {{1969-12-31 23:59:59 B}}) via {{spark-shell}}, or insert an invalid DATE/TIMESTAMP into a table via {{spark-sql}}, both interfaces silently evaluate the invalid value to {{NULL}} instead of throwing an exception.
h3. To Reproduce

On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}:
{code:bash}
$SPARK_HOME/bin/spark-sql{code}
Execute the following:
{code:sql}
spark-sql> create table timestamp_vals(c1 TIMESTAMP) stored as ORC;
spark-sql> insert into timestamp_vals select cast(" 1969-12-31 23:59:59 B " as timestamp);
spark-sql> select * from timestamp_vals;
NULL{code}
Using {{spark-shell}}:
{code:bash}
$SPARK_HOME/bin/spark-shell{code}
Execute the following:
{code:scala}
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> val rdd = sc.parallelize(Seq(Row(Seq(" 1969-12-31 23:59:59 B ").toDF("time").select(to_timestamp(col("time")).as("to_timestamp")).first().getAs[java.sql.Timestamp](0))))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[721] at parallelize at <console>:28

scala> val schema = new StructType().add(StructField("c1", TimestampType, true))
schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,TimestampType,true))

scala> val df = spark.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [c1: timestamp]

scala> df.show(false)
+----+
|c1  |
+----+
|null|
+----+
{code}
h3. Expected behavior

We expect both {{spark-sql}} and {{spark-shell}} to throw an exception for an invalid DATE/TIMESTAMP, as they do for most other data types (e.g. the invalid value {{"foo"}} for the {{INT}} data type).

> Both SparkSQL and DataFrame insert invalid DATE/TIMESTAMP as NULL
> -----------------------------------------------------------------
>
>                 Key: SPARK-40630
>                 URL: https://issues.apache.org/jira/browse/SPARK-40630
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 3.2.1
>            Reporter: xsys
>            Priority: Major
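To make the expected behavior concrete, here is a minimal sketch in plain Scala (JDK {{java.time}} only, no Spark dependency) contrasting two ways of handling the reported value {{" 1969-12-31 23:59:59 B "}}: a lenient parser that turns unparseable input into an empty result, analogous to the {{NULL}} observed in this issue, and a strict parser that throws. The object {{TimestampParsing}}, the helpers {{parseLenient}} and {{parseStrict}}, and the {{yyyy-MM-dd HH:mm:ss}} pattern are hypothetical and chosen only for illustration; this is not how Spark parses timestamps internally.

```scala
import java.time.LocalDateTime
import java.time.format.{DateTimeFormatter, DateTimeParseException}

// Hypothetical helpers for illustration only; NOT Spark's timestamp parser.
object TimestampParsing {
  // Assumed pattern covering the well-formed part of the reported value.
  val timestampPattern: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Lenient: unparseable input silently becomes None,
  // analogous to the NULL described in this issue.
  def parseLenient(input: String): Option[LocalDateTime] =
    try Some(LocalDateTime.parse(input.trim, timestampPattern))
    catch { case _: DateTimeParseException => None }

  // Strict: unparseable input raises an error,
  // analogous to the behavior the issue expects.
  def parseStrict(input: String): LocalDateTime =
    try LocalDateTime.parse(input.trim, timestampPattern)
    catch {
      case e: DateTimeParseException =>
        throw new IllegalArgumentException(s"Invalid TIMESTAMP value: '$input'", e)
    }

  def main(args: Array[String]): Unit = {
    // The value from the reproduction, including its surrounding spaces.
    val reported = " 1969-12-31 23:59:59 B "
    println(parseLenient(reported))                          // None
    println(scala.util.Try(parseStrict(reported)).isFailure) // true
  }
}
```

The trailing {{B}} leaves unparsed text after the pattern, so {{LocalDateTime.parse}} rejects the whole value; the two helpers differ only in whether that rejection is swallowed or surfaced. For reference, Spark 3.x exposes the {{spark.sql.ansi.enabled}} configuration, under which invalid casts are expected to fail instead of returning {{NULL}}; whether it covers every path in the reproduction (for example {{to_timestamp}}) may depend on the version.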
--
This message was sent by Atlassian Jira
(v8.20.10#820010)