[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46650243
  
Ok merging this in master & branch-1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1143




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46650189
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15941/




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46650188
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46646550
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46646557
  
Merged build started. 




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1143#discussion_r14007471
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -24,72 +24,89 @@ import org.apache.spark.sql.catalyst.types._
 /** Cast the child expression to the target data type. */
 case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
   override def foldable = child.foldable
-  def nullable = (child.dataType, dataType) match {
+
+  override def nullable = (child.dataType, dataType) match {
     case (StringType, _: NumericType) => true
     case (StringType, TimestampType)  => true
     case _                            => child.nullable
   }
+
   override def toString = s"CAST($child, $dataType)"
 
   type EvaluatedType = Any
 
-  def nullOrCast[T](a: Any, func: T => Any): Any = if(a == null) {
-    null
-  } else {
-    func(a.asInstanceOf[T])
-  }
+  // [[func]] assumes the input is no longer null because eval already does the null check.
+  @inline private[this] def buildCast[T](a: Any, func: T => Any): Any = func(a.asInstanceOf[T])
 
   // UDFToString
-  def castToString: Any => Any = child.dataType match {
-    case BinaryType => nullOrCast[Array[Byte]](_, new String(_, "UTF-8"))
-    case _ => nullOrCast[Any](_, _.toString)
+  private[this] def castToString: Any => Any = child.dataType match {
+    case BinaryType => buildCast[Array[Byte]](_, new String(_, "UTF-8"))
+    case _ => buildCast[Any](_, _.toString)
   }
 
   // BinaryConverter
-  def castToBinary: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, _.getBytes("UTF-8"))
+  private[this] def castToBinary: Any => Any = child.dataType match {
+    case StringType => buildCast[String](_, _.getBytes("UTF-8"))
   }
 
   // UDFToBoolean
-  def castToBoolean: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, _.length() != 0)
-    case TimestampType => nullOrCast[Timestamp](_, b => {(b.getTime() != 0 || b.getNanos() != 0)})
-    case LongType => nullOrCast[Long](_, _ != 0)
-    case IntegerType => nullOrCast[Int](_, _ != 0)
-    case ShortType => nullOrCast[Short](_, _ != 0)
-    case ByteType => nullOrCast[Byte](_, _ != 0)
-    case DecimalType => nullOrCast[BigDecimal](_, _ != 0)
-    case DoubleType => nullOrCast[Double](_, _ != 0)
-    case FloatType => nullOrCast[Float](_, _ != 0)
+  private[this] def castToBoolean: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, _.length() != 0)
+    case TimestampType =>
+      buildCast[Timestamp](_, b => b.getTime() != 0 || b.getNanos() != 0)
+    case LongType =>
+      buildCast[Long](_, _ != 0)
+    case IntegerType =>
+      buildCast[Int](_, _ != 0)
+    case ShortType =>
+      buildCast[Short](_, _ != 0)
+    case ByteType =>
+      buildCast[Byte](_, _ != 0)
+    case DecimalType =>
+      buildCast[BigDecimal](_, _ != 0)
+    case DoubleType =>
+      buildCast[Double](_, _ != 0)
+    case FloatType =>
+      buildCast[Float](_, _ != 0)
   }
 
   // TimestampConverter
-  def castToTimestamp: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => {
-      // Throw away extra if more than 9 decimal places
-      val periodIdx = s.indexOf(".");
-      var n = s
-      if (periodIdx != -1) {
-        if (n.length() - periodIdx > 9) {
-          n = n.substring(0, periodIdx + 10)
+  private[this] def castToTimestamp: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, s => {
+        // Throw away extra if more than 9 decimal places
+        val periodIdx = s.indexOf(".")
+        var n = s
+        if (periodIdx != -1) {
+          if (n.length() - periodIdx > 9) {
+            n = n.substring(0, periodIdx + 10)
+          }
         }
-      }
-      try Timestamp.valueOf(n) catch { case _: java.lang.IllegalArgumentException => null}
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => new Timestamp((if(b) 1 else 0) * 1000))
-    case LongType => nullOrCast[Long](_, l => new Timestamp(l * 1000))
-    case IntegerType => nullOrCast[Int](_, i => new Timestamp(i * 1000))
-    case ShortType => nullOrCast[Short](_, s => new Timestamp(s * 1000))
-    case ByteType => nullOrCast[Byte](_, b => new Timestamp(b * 1000))
+        try Timestamp.valueOf(n) catch { case _: java.lang.IllegalArgumentException => null }
+      })
+    case BooleanType =>
+      buildCast[Boolean](_, b => new Timestamp((if(b) 1 else 0) * 1000))

[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1143#discussion_r14007468
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -104,85 +121,118 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
   }
 
   // Timestamp to long, converting milliseconds to seconds
-  private def timestampToLong(ts: Timestamp) = ts.getTime / 1000
+  private[this] def timestampToLong(ts: Timestamp) = ts.getTime / 1000
 
-  private def timestampToDouble(ts: Timestamp) = {
+  private[this] def timestampToDouble(ts: Timestamp) = {
     // First part is the seconds since the beginning of time, followed by nanosecs.
     ts.getTime / 1000 + ts.getNanos.toDouble / 1e9
   }
 
-  def castToLong: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toLong catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1L else 0L)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t))
-    case DecimalType => nullOrCast[BigDecimal](_, _.toLong)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toLong(b)
-  }
-
-  def castToInt: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toInt catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1 else 0)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t).toInt)
-    case DecimalType => nullOrCast[BigDecimal](_, _.toInt)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b)
-  }
-
-  def castToShort: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toShort catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1.toShort else 0.toShort)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t).toShort)
-    case DecimalType => nullOrCast[BigDecimal](_, _.toShort)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toShort
-  }
-
-  def castToByte: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toByte catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1.toByte else 0.toByte)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t).toByte)
-    case DecimalType => nullOrCast[BigDecimal](_, _.toByte)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toByte
-  }
-
-  def castToDecimal: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try BigDecimal(s.toDouble) catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) BigDecimal(1) else BigDecimal(0))
+  private[this] def castToLong: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, s => try s.toLong catch {
+        case _: NumberFormatException => null
+      })
--- End diff ---

Try is really slow though.
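
A rough, unscientific sketch of the cost rxin is pointing at (hypothetical benchmark, not from the PR): `scala.util.Try` allocates a `Success`/`Failure` wrapper on every call and routes exceptions through `NonFatal`, whereas a plain `try`/`catch` adds no allocation on the success path.

```scala
import scala.util.Try

object TryOverhead {
  def main(args: Array[String]): Unit = {
    val n = 1000000
    var sink = 0L

    var t0 = System.nanoTime()
    var i = 0
    while (i < n) {
      // Plain try/catch: nothing allocated when parsing succeeds.
      sink += (try "12345".toLong catch { case _: NumberFormatException => 0L })
      i += 1
    }
    val plainNs = System.nanoTime() - t0

    t0 = System.nanoTime()
    i = 0
    while (i < n) {
      // Try: every iteration allocates a Success wrapper, then unwraps it.
      sink += Try("12345".toLong).getOrElse(0L)
      i += 1
    }
    val tryNs = System.nanoTime() - t0

    // No assertion on which is faster: naive timings like this are
    // unreliable before JIT warmup; this only illustrates the shape.
    println(s"try/catch: ${plainNs / 1000000} ms, Try: ${tryNs / 1000000} ms (sink=$sink)")
  }
}
```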




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46646277
  
LGTM. Making everything other than `eval` `private[this]` makes sense: `eval` is then guaranteed to be the only entry point for type casting, so the null check can never be skipped.
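
The approved pattern can be sketched like this (a simplified illustration with hypothetical names, not the actual `Cast` code): the sole public entry point does the null check exactly once, and the private builders assume a non-null input.

```scala
object CastSketch {
  // Only public entry point: performs the null check exactly once.
  def eval(input: Any, cast: Any => Any): Any =
    if (input == null) null else cast(input)

  // Private helpers may assume the input is non-null,
  // because eval is the only way to reach them.
  @inline private def buildCast[T](a: Any, func: T => Any): Any =
    func(a.asInstanceOf[T])

  val stringToInt: Any => Any =
    buildCast[String](_, s => try s.toInt catch { case _: NumberFormatException => null })

  def main(args: Array[String]): Unit = {
    assert(eval(null, stringToInt) == null)   // null handled once, in eval
    assert(eval("42", stringToInt) == 42)     // happy path, no redundant check
    assert(eval("abc", stringToInt) == null)  // parse failure, not an NPE
  }
}
```

If the cast builders were public, a caller could bypass `eval` and hit `asInstanceOf` on a null input, which is exactly what `private[this]` rules out.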




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1143#discussion_r14007430
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -104,85 +121,118 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
   }
 
   // Timestamp to long, converting milliseconds to seconds
-  private def timestampToLong(ts: Timestamp) = ts.getTime / 1000
+  private[this] def timestampToLong(ts: Timestamp) = ts.getTime / 1000
 
-  private def timestampToDouble(ts: Timestamp) = {
+  private[this] def timestampToDouble(ts: Timestamp) = {
     // First part is the seconds since the beginning of time, followed by nanosecs.
     ts.getTime / 1000 + ts.getNanos.toDouble / 1e9
   }
 
-  def castToLong: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toLong catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1L else 0L)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t))
-    case DecimalType => nullOrCast[BigDecimal](_, _.toLong)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toLong(b)
-  }
-
-  def castToInt: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toInt catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1 else 0)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t).toInt)
-    case DecimalType => nullOrCast[BigDecimal](_, _.toInt)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b)
-  }
-
-  def castToShort: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toShort catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1.toShort else 0.toShort)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t).toShort)
-    case DecimalType => nullOrCast[BigDecimal](_, _.toShort)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toShort
-  }
-
-  def castToByte: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try s.toByte catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) 1.toByte else 0.toByte)
-    case TimestampType => nullOrCast[Timestamp](_, t => timestampToLong(t).toByte)
-    case DecimalType => nullOrCast[BigDecimal](_, _.toByte)
-    case x: NumericType => b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toByte
-  }
-
-  def castToDecimal: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => try BigDecimal(s.toDouble) catch {
-      case _: NumberFormatException => null
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => if(b) BigDecimal(1) else BigDecimal(0))
+  private[this] def castToLong: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, s => try s.toLong catch {
+        case _: NumberFormatException => null
+      })
--- End diff ---

Maybe we can simplify this to:

```scala
Try(s.toLong).getOrElse(null)
```

(`s.toLong` throws `NumberFormatException` only.)
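
The suggested one-liner would behave like this (a sketch; note that `getOrElse(null)` widens the result type to `Any`, which happens to match the `Any => Any` signature here, and that `Try` catches any non-fatal exception, not just `NumberFormatException`):

```scala
import scala.util.Try

object TrySuggestion {
  // Hypothetical standalone version of the suggested replacement
  // for the try/catch block in castToLong.
  def castStringToLong(s: String): Any =
    Try(s.toLong).getOrElse(null)

  def main(args: Array[String]): Unit = {
    assert(castStringToLong("123") == 123L)          // parses normally
    assert(castStringToLong("not a number") == null) // failure becomes null
  }
}
```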




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1143#discussion_r14007203
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -24,72 +24,89 @@ import org.apache.spark.sql.catalyst.types._
 /** Cast the child expression to the target data type. */
 case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
   override def foldable = child.foldable
-  def nullable = (child.dataType, dataType) match {
+
+  override def nullable = (child.dataType, dataType) match {
     case (StringType, _: NumericType) => true
     case (StringType, TimestampType)  => true
     case _                            => child.nullable
   }
+
   override def toString = s"CAST($child, $dataType)"
 
   type EvaluatedType = Any
 
-  def nullOrCast[T](a: Any, func: T => Any): Any = if(a == null) {
-    null
-  } else {
-    func(a.asInstanceOf[T])
-  }
+  // [[func]] assumes the input is no longer null because eval already does the null check.
+  @inline private[this] def buildCast[T](a: Any, func: T => Any): Any = func(a.asInstanceOf[T])
 
   // UDFToString
-  def castToString: Any => Any = child.dataType match {
-    case BinaryType => nullOrCast[Array[Byte]](_, new String(_, "UTF-8"))
-    case _ => nullOrCast[Any](_, _.toString)
+  private[this] def castToString: Any => Any = child.dataType match {
+    case BinaryType => buildCast[Array[Byte]](_, new String(_, "UTF-8"))
+    case _ => buildCast[Any](_, _.toString)
   }
 
   // BinaryConverter
-  def castToBinary: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, _.getBytes("UTF-8"))
+  private[this] def castToBinary: Any => Any = child.dataType match {
+    case StringType => buildCast[String](_, _.getBytes("UTF-8"))
   }
 
   // UDFToBoolean
-  def castToBoolean: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, _.length() != 0)
-    case TimestampType => nullOrCast[Timestamp](_, b => {(b.getTime() != 0 || b.getNanos() != 0)})
-    case LongType => nullOrCast[Long](_, _ != 0)
-    case IntegerType => nullOrCast[Int](_, _ != 0)
-    case ShortType => nullOrCast[Short](_, _ != 0)
-    case ByteType => nullOrCast[Byte](_, _ != 0)
-    case DecimalType => nullOrCast[BigDecimal](_, _ != 0)
-    case DoubleType => nullOrCast[Double](_, _ != 0)
-    case FloatType => nullOrCast[Float](_, _ != 0)
+  private[this] def castToBoolean: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, _.length() != 0)
+    case TimestampType =>
+      buildCast[Timestamp](_, b => b.getTime() != 0 || b.getNanos() != 0)
+    case LongType =>
+      buildCast[Long](_, _ != 0)
+    case IntegerType =>
+      buildCast[Int](_, _ != 0)
+    case ShortType =>
+      buildCast[Short](_, _ != 0)
+    case ByteType =>
+      buildCast[Byte](_, _ != 0)
+    case DecimalType =>
+      buildCast[BigDecimal](_, _ != 0)
+    case DoubleType =>
+      buildCast[Double](_, _ != 0)
+    case FloatType =>
+      buildCast[Float](_, _ != 0)
   }
 
   // TimestampConverter
-  def castToTimestamp: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => {
-      // Throw away extra if more than 9 decimal places
-      val periodIdx = s.indexOf(".");
-      var n = s
-      if (periodIdx != -1) {
-        if (n.length() - periodIdx > 9) {
-          n = n.substring(0, periodIdx + 10)
+  private[this] def castToTimestamp: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, s => {
+        // Throw away extra if more than 9 decimal places
+        val periodIdx = s.indexOf(".")
+        var n = s
+        if (periodIdx != -1) {
+          if (n.length() - periodIdx > 9) {
+            n = n.substring(0, periodIdx + 10)
+          }
         }
-      }
-      try Timestamp.valueOf(n) catch { case _: java.lang.IllegalArgumentException => null}
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => new Timestamp((if(b) 1 else 0) * 1000))
-    case LongType => nullOrCast[Long](_, l => new Timestamp(l * 1000))
-    case IntegerType => nullOrCast[Int](_, i => new Timestamp(i * 1000))
-    case ShortType => nullOrCast[Short](_, s => new Timestamp(s * 1000))
-    case ByteType => nullOrCast[Byte](_, b => new Timestamp(b * 1000))
+        try Timestamp.valueOf(n) catch { case _: java.lang.IllegalArgumentException => null }
+      })
+    case BooleanType =>
+      buildCast[Boolean](_, b => new Timestamp((if(b) 1 else 0) * 1000))

[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1143#discussion_r14006923
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -24,72 +24,89 @@ import org.apache.spark.sql.catalyst.types._
 /** Cast the child expression to the target data type. */
 case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
   override def foldable = child.foldable
-  def nullable = (child.dataType, dataType) match {
+
+  override def nullable = (child.dataType, dataType) match {
     case (StringType, _: NumericType) => true
     case (StringType, TimestampType)  => true
    case _                            => child.nullable
   }
+
   override def toString = s"CAST($child, $dataType)"
 
   type EvaluatedType = Any
 
-  def nullOrCast[T](a: Any, func: T => Any): Any = if(a == null) {
-    null
-  } else {
-    func(a.asInstanceOf[T])
-  }
+  // [[func]] assumes the input is no longer null because eval already does the null check.
+  @inline private[this] def buildCast[T](a: Any, func: T => Any): Any = func(a.asInstanceOf[T])
 
   // UDFToString
-  def castToString: Any => Any = child.dataType match {
-    case BinaryType => nullOrCast[Array[Byte]](_, new String(_, "UTF-8"))
-    case _ => nullOrCast[Any](_, _.toString)
+  private[this] def castToString: Any => Any = child.dataType match {
+    case BinaryType => buildCast[Array[Byte]](_, new String(_, "UTF-8"))
+    case _ => buildCast[Any](_, _.toString)
   }
 
   // BinaryConverter
-  def castToBinary: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, _.getBytes("UTF-8"))
+  private[this] def castToBinary: Any => Any = child.dataType match {
+    case StringType => buildCast[String](_, _.getBytes("UTF-8"))
   }
 
   // UDFToBoolean
-  def castToBoolean: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, _.length() != 0)
-    case TimestampType => nullOrCast[Timestamp](_, b => {(b.getTime() != 0 || b.getNanos() != 0)})
-    case LongType => nullOrCast[Long](_, _ != 0)
-    case IntegerType => nullOrCast[Int](_, _ != 0)
-    case ShortType => nullOrCast[Short](_, _ != 0)
-    case ByteType => nullOrCast[Byte](_, _ != 0)
-    case DecimalType => nullOrCast[BigDecimal](_, _ != 0)
-    case DoubleType => nullOrCast[Double](_, _ != 0)
-    case FloatType => nullOrCast[Float](_, _ != 0)
+  private[this] def castToBoolean: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, _.length() != 0)
+    case TimestampType =>
+      buildCast[Timestamp](_, b => b.getTime() != 0 || b.getNanos() != 0)
+    case LongType =>
+      buildCast[Long](_, _ != 0)
+    case IntegerType =>
+      buildCast[Int](_, _ != 0)
+    case ShortType =>
+      buildCast[Short](_, _ != 0)
+    case ByteType =>
+      buildCast[Byte](_, _ != 0)
+    case DecimalType =>
+      buildCast[BigDecimal](_, _ != 0)
+    case DoubleType =>
+      buildCast[Double](_, _ != 0)
+    case FloatType =>
+      buildCast[Float](_, _ != 0)
   }
 
   // TimestampConverter
-  def castToTimestamp: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => {
-      // Throw away extra if more than 9 decimal places
-      val periodIdx = s.indexOf(".");
-      var n = s
-      if (periodIdx != -1) {
-        if (n.length() - periodIdx > 9) {
-          n = n.substring(0, periodIdx + 10)
+  private[this] def castToTimestamp: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, s => {
+        // Throw away extra if more than 9 decimal places
+        val periodIdx = s.indexOf(".")
+        var n = s
+        if (periodIdx != -1) {
+          if (n.length() - periodIdx > 9) {
+            n = n.substring(0, periodIdx + 10)
+          }
        }
-      }
-      try Timestamp.valueOf(n) catch { case _: java.lang.IllegalArgumentException => null}
-    })
-    case BooleanType => nullOrCast[Boolean](_, b => new Timestamp((if(b) 1 else 0) * 1000))
-    case LongType => nullOrCast[Long](_, l => new Timestamp(l * 1000))
-    case IntegerType => nullOrCast[Int](_, i => new Timestamp(i * 1000))
-    case ShortType => nullOrCast[Short](_, s => new Timestamp(s * 1000))
-    case ByteType => nullOrCast[Byte](_, b => new Timestamp(b * 1000))
+        try Timestamp.valueOf(n) catch { case _: java.lang.IllegalArgumentException => null }
+      })
+    case BooleanType =>
+      buildCast[Boolean](_, b => new Timestamp((if(b) 1 else 0) * 1000))

[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1143#discussion_r14006911
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -24,72 +24,89 @@ import org.apache.spark.sql.catalyst.types._
 /** Cast the child expression to the target data type. */
 case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
   override def foldable = child.foldable
-  def nullable = (child.dataType, dataType) match {
+
+  override def nullable = (child.dataType, dataType) match {
     case (StringType, _: NumericType) => true
     case (StringType, TimestampType)  => true
     case _                            => child.nullable
   }
+
   override def toString = s"CAST($child, $dataType)"
 
   type EvaluatedType = Any
 
-  def nullOrCast[T](a: Any, func: T => Any): Any = if(a == null) {
-    null
-  } else {
-    func(a.asInstanceOf[T])
-  }
+  // [[func]] assumes the input is no longer null because eval already does the null check.
+  @inline private[this] def buildCast[T](a: Any, func: T => Any): Any = func(a.asInstanceOf[T])
 
   // UDFToString
-  def castToString: Any => Any = child.dataType match {
-    case BinaryType => nullOrCast[Array[Byte]](_, new String(_, "UTF-8"))
-    case _ => nullOrCast[Any](_, _.toString)
+  private[this] def castToString: Any => Any = child.dataType match {
+    case BinaryType => buildCast[Array[Byte]](_, new String(_, "UTF-8"))
+    case _ => buildCast[Any](_, _.toString)
   }
 
   // BinaryConverter
-  def castToBinary: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, _.getBytes("UTF-8"))
+  private[this] def castToBinary: Any => Any = child.dataType match {
+    case StringType => buildCast[String](_, _.getBytes("UTF-8"))
   }
 
   // UDFToBoolean
-  def castToBoolean: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, _.length() != 0)
-    case TimestampType => nullOrCast[Timestamp](_, b => {(b.getTime() != 0 || b.getNanos() != 0)})
-    case LongType => nullOrCast[Long](_, _ != 0)
-    case IntegerType => nullOrCast[Int](_, _ != 0)
-    case ShortType => nullOrCast[Short](_, _ != 0)
-    case ByteType => nullOrCast[Byte](_, _ != 0)
-    case DecimalType => nullOrCast[BigDecimal](_, _ != 0)
-    case DoubleType => nullOrCast[Double](_, _ != 0)
-    case FloatType => nullOrCast[Float](_, _ != 0)
+  private[this] def castToBoolean: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, _.length() != 0)
+    case TimestampType =>
+      buildCast[Timestamp](_, b => b.getTime() != 0 || b.getNanos() != 0)
+    case LongType =>
+      buildCast[Long](_, _ != 0)
+    case IntegerType =>
+      buildCast[Int](_, _ != 0)
+    case ShortType =>
+      buildCast[Short](_, _ != 0)
+    case ByteType =>
+      buildCast[Byte](_, _ != 0)
+    case DecimalType =>
+      buildCast[BigDecimal](_, _ != 0)
+    case DoubleType =>
+      buildCast[Double](_, _ != 0)
+    case FloatType =>
+      buildCast[Float](_, _ != 0)
   }
 
   // TimestampConverter
-  def castToTimestamp: Any => Any = child.dataType match {
-    case StringType => nullOrCast[String](_, s => {
-      // Throw away extra if more than 9 decimal places
-      val periodIdx = s.indexOf(".");
-      var n = s
-      if (periodIdx != -1) {
-        if (n.length() - periodIdx > 9) {
-          n = n.substring(0, periodIdx + 10)
+  private[this] def castToTimestamp: Any => Any = child.dataType match {
+    case StringType =>
+      buildCast[String](_, s => {
+        // Throw away extra if more than 9 decimal places
+        val periodIdx = s.indexOf(".")
+        var n = s
+        if (periodIdx != -1) {
+          if (n.length() - periodIdx > 9) {
--- End diff ---

How about merging these two `if` statements into one with `&&`?
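
The suggested merge would look like this (a standalone sketch wrapping the diff's truncation logic in a hypothetical helper so it can run on its own):

```scala
object MergedIf {
  // Truncate a timestamp string to at most 9 fractional digits,
  // with the two nested ifs combined into one condition via &&.
  def truncateNanos(s: String): String = {
    val periodIdx = s.indexOf(".")
    var n = s
    if (periodIdx != -1 && n.length() - periodIdx > 9) {
      n = n.substring(0, periodIdx + 10)
    }
    n
  }

  def main(args: Array[String]): Unit = {
    // 12 fractional digits are trimmed down to 9.
    assert(truncateNanos("2014-06-19 12:00:00.123456789999") == "2014-06-19 12:00:00.123456789")
    // No period: the string passes through unchanged.
    assert(truncateNanos("2014-06-19 12:00:00") == "2014-06-19 12:00:00")
  }
}
```

Since `&&` short-circuits, the length check is only evaluated when a period was found, so the behavior is identical to the nested form.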




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46643095
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46643096
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15937/




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46640182
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1143#issuecomment-46640185
  
Merged build started. 




[GitHub] spark pull request: [SPARK-2209][SQL] Cast shouldn't do null check...

2014-06-19 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1143

[SPARK-2209][SQL] Cast shouldn't do null check twice.

Also took the chance to clean up cast a little bit. Too many arrows on each 
line before!


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark cast

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1143.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1143


commit c2b88aee347edab3d36475ef75b30a1d2f15b1c1
Author: Reynold Xin 
Date:   2014-06-20T02:43:06Z

[SPARK-2209][SQL] Cast shouldn't do null check twice.



