[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1143#issuecomment-46650243

Ok, merging this in master & branch-1.0.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1143
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1143#issuecomment-46650189 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15941/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1143#issuecomment-46650188 Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1143#issuecomment-46646550 Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1143#issuecomment-46646557 Merged build started.
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1143#discussion_r14007471

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -24,72 +24,89 @@
 import org.apache.spark.sql.catalyst.types._

 /** Cast the child expression to the target data type. */
 case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
   override def foldable = child.foldable
-  def nullable = (child.dataType, dataType) match {
+
+  override def nullable = (child.dataType, dataType) match {
     case (StringType, _: NumericType) => true
     case (StringType, TimestampType) => true
     case _ => child.nullable
   }
+
   override def toString = s"CAST($child, $dataType)"

   type EvaluatedType = Any

-  def nullOrCast[T](a: Any, func: T => Any): Any = if(a == null) {
-    null
-  } else {
-    func(a.asInstanceOf[T])
-  }
+  // [[func]] assumes the input is no longer null because eval already does the null check.
+  @inline private[this] def buildCast[T](a: Any, func: T => Any): Any = func(a.asInstanceOf[T])

   // UDFToString
-  def castToString: Any => Any = child.dataType match {
-    case BinaryType => nullOrCast[Array[Byte]](_, new String(_, "UTF-8"))
-    case _ => nullOrCast[Any](_, _.toString)
+  private[this] def castToString: Any => Any = child.dataType match {
+    case BinaryType => buildCast[Array[Byte]](_, new String(_, "UTF-8"))
+    case _ => buildCast[Any](_, _.toString)
   }

   // BinaryConverter
-  def castToBinary: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, _.getBytes("UTF-8"))
+  private[this] def castToBinary: Any => Any = child.dataType match {
+    case StringType => buildCast[String](_, _.getBytes("UTF-8"))
   }

   // UDFToBoolean
-  def castToBoolean: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, _.length() != 0)
-    case TimestampType => nullOrCast[Timestamp](_, b => {(b.getTime() != 0 || b.getNanos() != 0)})
-    case LongType => nullOrCast[Long](_, _ != 0)
-    case IntegerType => nullOrCast[Int](_, _ != 0)
-    case ShortType => nullOrCast[Short](_, _ != 0)
-    case ByteType => nullOrCast[Byte](_, _ != 0)
-    case DecimalType => nullOrCast[BigDecimal](_, _ != 0)
-    case DoubleType => nullOrCast[Double](_, _ != 0)
-    case FloatType => nullOrCast[Float](_, _ != 0)
+  private[this] def castToBoolean: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, _.length() != 0)
+    case TimestampType =>
+      buildCast[Timestamp](_, b => b.getTime() != 0 || b.getNanos() != 0)
+    case LongType =>
+      buildCast[Long](_, _ != 0)
+    case IntegerType =>
+      buildCast[Int](_, _ != 0)
+    case ShortType =>
+      buildCast[Short](_, _ != 0)
+    case ByteType =>
+      buildCast[Byte](_, _ != 0)
+    case DecimalType =>
+      buildCast[BigDecimal](_, _ != 0)
+    case DoubleType =>
+      buildCast[Double](_, _ != 0)
+    case FloatType =>
+      buildCast[Float](_, _ != 0)
   }

   // TimestampConverter
-  def castToTimestamp: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => {
-      // Throw away extra if more than 9 decimal places
-      val periodIdx = s.indexOf(".");
-      var n = s
-      if (periodIdx != -1) {
-        if (n.length() - periodIdx > 9) {
-          n = n.substring(0, periodIdx + 10)
-        }
-      }
-      try Timestamp.valueOf(n) catch { case _: java.lang.IllegalArgumentException => null}
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => new Timestamp((if(b) 1 else 0) * 1000))
-    case LongType => nullOrCast[Long](_, l => new Timestamp(l * 1000))
-    case IntegerType => nullOrCast[Int](_, i => new Timestamp(i * 1000))
-    case ShortType => nullOrCast[Short](_, s => new Timestamp(s * 1000))
-    case ByteType => nullOrCast[Byte](_, b => new Timestamp(b * 1000))
+  private[this] def castToTimestamp: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, s => {
+        // Throw away extra if more than 9 decimal places
+        val periodIdx = s.indexOf(".")
+        var n = s
+        if (periodIdx != -1) {
+          if (n.length() - periodIdx > 9) {
+            n = n.substring(0, periodIdx + 10)
+          }
+        }
+        try Timestamp.valueOf(n) catch { case _: java.lang.IllegalArgumentException => null }
+      })
+    case BooleanType =>
+      buildCast[Boolean](_, b => new Timestamp((if(b) 1 else 0) * 1000))
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1143#discussion_r14007468

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -104,85 +121,118 @@
   }

   // Timestamp to long, converting milliseconds to seconds
-  private def timestampToLong(ts: Timestamp) = ts.getTime / 1000
+  private[this] def timestampToLong(ts: Timestamp) = ts.getTime / 1000

-  private def timestampToDouble(ts: Timestamp) = {
+  private[this] def timestampToDouble(ts: Timestamp) = {
     // First part is the seconds since the beginning of time, followed by nanosecs.
     ts.getTime / 1000 + ts.getNanos.toDouble / 10
   }

-  def castToLong: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toLong catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1L else 0L)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t))
-    case DecimalType => nullOrCast[BigDecimal](_, _.toLong)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toLong(b)
-  }
-
-  def castToInt: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toInt catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1 else 0)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t).toInt)
-    case DecimalType => nullOrCast[BigDecimal](_, _.toInt)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b)
-  }
-
-  def castToShort: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toShort catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1.toShort else 0.toShort)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t).toShort)
-    case DecimalType => nullOrCast[BigDecimal](_, _.toShort)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toShort
-  }
-
-  def castToByte: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toByte catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1.toByte else 0.toByte)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t).toByte)
-    case DecimalType => nullOrCast[BigDecimal](_, _.toByte)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toByte
-  }
-
-  def castToDecimal: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try BigDecimal(s.toDouble) catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) BigDecimal(1) else BigDecimal(0))
+  private[this] def castToLong: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, s => try s.toLong catch {
+        case _: NumberFormatException => null
+      })
--- End diff --

Try is really slow though.
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1143#issuecomment-46646277

LGTM. Making everything other than `eval` `private[this]` makes sense, so that `eval` is guaranteed to be the only entry point for type casting and the null check can't be skipped.
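The single-entry-point pattern discussed above can be sketched in a few lines. This is a minimal standalone illustration, not the actual Spark code: the object name `SingleEntranceCast` and the single `castToString` helper are stand-ins for the per-type cast functions in `Cast.scala`.

```scala
// Sketch of the pattern: `eval` performs the null check exactly once,
// and the private cast helpers may assume their input is non-null.
object SingleEntranceCast {
  // Hypothetical stand-in for buildCast in Cast.scala: no null check here,
  // because callers are only ever reached through eval.
  private def buildCast[T](a: Any, func: T => Any): Any = func(a.asInstanceOf[T])

  private def castToString(a: Any): Any = buildCast[Any](a, _.toString)

  // The only public entry point; null is handled once, here.
  def eval(input: Any): Any =
    if (input == null) null else castToString(input)
}
```

Because the helpers are private, there is no code path that reaches a cast function with a null input, which is exactly why the per-function null checks could be dropped.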
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1143#discussion_r14007430

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
+  private[this] def castToLong: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, s => try s.toLong catch {
+        case _: NumberFormatException => null
+      })
--- End diff --

Maybe we can simplify this to:

```scala
Try(s.toLong).getOrElse(null)
```

(`s.toLong` throws `NumberFormatException` only.)
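The two idioms under discussion can be compared side by side. This is a standalone sketch, not the PR code; the function names are mine. Both map an unparseable string to null, but `Try` allocates a `Success`/`Failure` wrapper per call, which is the overhead behind the "Try is really slow" objection.

```scala
import scala.util.Try

// The PR's form: plain try/catch, no wrapper allocation.
def toLongOrNullTryCatch(s: String): Any =
  try s.toLong catch { case _: NumberFormatException => null }

// The suggested form: shorter, but allocates a Try on every call.
def toLongOrNullTry(s: String): Any =
  Try(s.toLong).getOrElse(null)
```

Both return the parsed `Long` on success and null on failure; the trade-off is purely brevity versus per-row allocation cost in a hot path.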
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1143#discussion_r14007203

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1143#discussion_r14006923

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1143#discussion_r14006911

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
+        // Throw away extra if more than 9 decimal places
+        val periodIdx = s.indexOf(".")
+        var n = s
+        if (periodIdx != -1) {
+          if (n.length() - periodIdx > 9) {
+            n = n.substring(0, periodIdx + 10)
+          }
+        }
--- End diff --

How about merging these two `if` statements into 1 with `&&`?
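The suggestion above is a small mechanical rewrite. As a standalone sketch (the function name `truncateNanos` is mine, not the PR's), the two nested `if`s guarding the fractional-second truncation collapse into a single condition:

```scala
// Keep at most 9 digits after the decimal point of a timestamp string;
// the two nested guards become one condition joined with &&.
def truncateNanos(s: String): String = {
  val periodIdx = s.indexOf(".")
  if (periodIdx != -1 && s.length - periodIdx > 9) s.substring(0, periodIdx + 10)
  else s
}
```

The `&&` short-circuits, so the length check is only evaluated when a period was actually found, preserving the original behavior.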
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1143#issuecomment-46643095 Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1143#issuecomment-46643096 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15937/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1143#issuecomment-46640182 Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1143#issuecomment-46640185 Merged build started.
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1143

[SPARK-2209][SQL] Cast shouldn't do null check twice.

Also took the chance to clean up Cast a little bit. Too many arrows on each line before!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark cast

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1143.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1143

commit c2b88aee347edab3d36475ef75b30a1d2f15b1c1
Author: Reynold Xin
Date: 2014-06-20T02:43:06Z

    [SPARK-2209][SQL] Cast shouldn't do null check twice.