Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15072#discussion_r82730295

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---

```diff
@@ -53,7 +53,15 @@ import org.apache.spark.util.Utils
 private[sql] object Dataset {
   def apply[T: Encoder](sparkSession: SparkSession, logicalPlan: LogicalPlan): Dataset[T] = {
-    new Dataset(sparkSession, logicalPlan, implicitly[Encoder[T]])
+    val encoder = implicitly[Encoder[T]]
+    if (encoder.clsTag.runtimeClass == classOf[Row]) {
+      // We should use the encoder generated from the executed plan rather than the existing
+      // encoder for DataFrame because the types of columns can be varied due to widening types.
+      // See SPARK-17123. This is a bit hacky. Maybe we should find a better way to do this.
+      ofRows(sparkSession, logicalPlan).asInstanceOf[Dataset[T]]
+    } else {
+      new Dataset(sparkSession, logicalPlan, encoder)
+    }
```

--- End diff ---

Ah, here is the code I ran:

```scala
val dates = Seq(
  (new Date(0), BigDecimal.valueOf(1), new Timestamp(2)),
  (new Date(3), BigDecimal.valueOf(4), new Timestamp(5))
).toDF("date", "decimal", "timestamp")

val widenTypedRows = Seq(
  (new Timestamp(2), 10.5D, "string")
).toDF("timestamp", "double", "string")

dates.except(widenTypedRows).collect()
```

and the error message:

```java
23:10:05.331 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 30, Column 107: No applicable constructor/method found for actual parameters "long"; candidates are: "public static java.sql.Date org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"
/* 001 */ public java.lang.Object generate(Object[] references) {
/* 002 */   return new SpecificSafeProjection(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificSafeProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
/* 006 */
/* 007 */   private Object[] references;
/* 008 */   private InternalRow mutableRow;
/* 009 */   private Object[] values;
/* 010 */   private org.apache.spark.sql.types.StructType schema;
/* 011 */
/* 012 */   public SpecificSafeProjection(Object[] references) {
/* 013 */     this.references = references;
/* 014 */     mutableRow = (InternalRow) references[references.length - 1];
/* 015 */
/* 016 */     this.schema = (org.apache.spark.sql.types.StructType) references[0];
/* 017 */
/* 018 */   }
/* 019 */
/* 020 */
/* 021 */
/* 022 */   public java.lang.Object apply(java.lang.Object _i) {
/* 023 */     InternalRow i = (InternalRow) _i;
/* 024 */
/* 025 */     values = new Object[3];
/* 026 */
/* 027 */     boolean isNull2 = i.isNullAt(0);
/* 028 */     long value2 = isNull2 ? -1L : (i.getLong(0));
/* 029 */     boolean isNull1 = isNull2;
/* 030 */     final java.sql.Date value1 = isNull1 ? null : org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(value2);
/* 031 */     isNull1 = value1 == null;
/* 032 */     if (isNull1) {
/* 033 */       values[0] = null;
/* 034 */     } else {
/* 035 */       values[0] = value1;
/* 036 */     }
/* 037 */
/* 038 */     boolean isNull4 = i.isNullAt(1);
/* 039 */     double value4 = isNull4 ? -1.0 : (i.getDouble(1));
/* 040 */
/* 041 */     boolean isNull3 = isNull4;
/* 042 */     java.math.BigDecimal value3 = null;
/* 043 */     if (!isNull3) {
/* 044 */
/* 045 */       Object funcResult = null;
/* 046 */       funcResult = value4.toJavaBigDecimal();
/* 047 */       if (funcResult == null) {
/* 048 */         isNull3 = true;
/* 049 */       } else {
/* 050 */         value3 = (java.math.BigDecimal) funcResult;
/* 051 */       }
/* 052 */
/* 053 */     }
/* 054 */     isNull3 = value3 == null;
/* 055 */     if (isNull3) {
/* 056 */       values[1] = null;
/* 057 */     } else {
/* 058 */       values[1] = value3;
/* 059 */     }
/* 060 */
/* 061 */     boolean isNull6 = i.isNullAt(2);
/* 062 */     UTF8String value6 = isNull6 ? null : (i.getUTF8String(2));
/* 063 */     boolean isNull5 = isNull6;
/* 064 */     final java.sql.Timestamp value5 = isNull5 ? null : org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaTimestamp(value6);
/* 065 */     isNull5 = value5 == null;
/* 066 */     if (isNull5) {
/* 067 */       values[2] = null;
/* 068 */     } else {
/* 069 */       values[2] = value5;
/* 070 */     }
/* 071 */
/* 072 */     final org.apache.spark.sql.Row value = new org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema(values, schema);
/* 073 */     if (false) {
/* 074 */       mutableRow.setNullAt(0);
/* 075 */     } else {
/* 076 */
/* 077 */       mutableRow.update(0, value);
/* 078 */     }
/* 079 */
/* 080 */     return mutableRow;
/* 081 */   }
/* 082 */ }
```

```
/* 028 */     long value2 = isNull2 ? -1L : (i.getLong(0));
/* 029 */     boolean isNull1 = isNull2;
/* 030 */     final java.sql.Date value1 = isNull1 ? null : org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(value2);
```

Here, `toJavaDate` takes an `Int` (`DateType` is stored internally as days since the epoch), but it is given a `long` because the column was widened to `Timestamp` (stored internally as microseconds in a `long`). Apparently the schemas need to be widened before the two sides can be compared. I will look into this more deeply, but do you have any idea about it?
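
For reference, here is a minimal sketch of why those parameter types clash. It only assumes the internal representations (`DateType` as an `Int` of days, `TimestampType` as a `Long` of microseconds), which the compile error above already hints at:

```scala
import org.apache.spark.sql.catalyst.util.DateTimeUtils

// DateType is stored internally as an Int of days since the epoch.
val d: java.sql.Date = DateTimeUtils.toJavaDate(0) // 1970-01-01

// TimestampType is stored internally as a Long of microseconds since the epoch.
val t: java.sql.Timestamp = DateTimeUtils.toJavaTimestamp(0L)

// The generated projection above reads the widened (timestamp) column with
// i.getLong(0), then passes that Long to toJavaDate(Int) -- which is exactly
// the "No applicable constructor/method found for actual parameters long"
// compile error in the log.
```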
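
And a quick way to see the mismatch without triggering codegen; this is just a sketch, assuming `queryExecution.analyzed` is a reasonable inspection point:

```scala
val result = dates.except(widenTypedRows)

// The analyzed plan's output schema, widened by set-operation type coercion
// (judging from the generated code: date -> timestamp, decimal -> double,
// timestamp -> string):
println(result.queryExecution.analyzed.schema.simpleString)

// The original schema of `dates`, which the stale RowEncoder was built from:
println(dates.schema.simpleString)
```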