Github user anabranch commented on the issue:

    https://github.com/apache/spark/pull/16138
  
    I now understand why my previous implementation did not work.
    
    My implementation originally looked like this:
    
    ```scala
    case class ParseToTimestamp(left: Expression, format: Expression, child: Expression)
      extends RuntimeReplaceable {
    
      def this(left: Expression, format: Expression) = {
        this(left, format, Cast(UnixTimestamp(left, format), TimestampType))
      }
    
      override def checkInputDataTypes(): TypeCheckResult = {
        if (left.dataType != StringType) {
          TypeCheckResult.TypeCheckFailure("TO_TIMESTAMP requires both inputs to be strings")
        } else {
          TypeCheckResult.TypeCheckSuccess
        }
      }
    
      override def flatArguments: Iterator[Any] = Iterator(left, format)
      override def sql: String = s"$prettyName(${left.sql}, ${format.sql})"
    
      override def prettyName: String = "to_timestamp"
      override def dataType: DataType = TimestampType
    }
    ```
    
    Even a simple example fails with this implementation:
    
    ```scala
    import org.apache.spark.sql.functions._
    import spark.implicits._  // needed for toDF outside spark-shell
    
    val ss1 = "2015-07-24 10:00:00"
    val ss2 = "2015-07-25 02:02:02"
    val df2 = Seq(ss1, ss2).toDF("ss")
    
    df2.select(to_timestamp(col("ss"))).show
    ```
    This throws:
    
    ```
    org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'ss
    ```
    
    Logging at the `TRACE` level shows that the columns are in fact resolved; the error originates after analysis, during `checkInputDataTypes`. That check calls `dataType` on the `left` input, but because this column feeds into a `RuntimeReplaceable` expression, the tree that actually gets analyzed and resolved is the `child` argument. `left` is never resolved, so calling `dataType` on it throws the error above.
    
    I believe this is the root cause, and it has also shown me that I do not need to perform input validation for this function in the first place. Since the expression only wraps other functions, those functions already perform exactly the input validation I would be duplicating. With no new logic of my own, there is no point in redundantly validating something that will be validated again anyway, especially when the system will not let me.
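    
    Under that conclusion, the wrapper reduces to roughly the following (a sketch against Spark's internal Catalyst API at the time of this PR, with `checkInputDataTypes` simply removed; not compiled here):
    
    ```scala
    import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, RuntimeReplaceable, UnixTimestamp}
    import org.apache.spark.sql.types.{DataType, TimestampType}
    
    // Same wrapper with checkInputDataTypes removed: input validation
    // is delegated to the wrapped UnixTimestamp/Cast expressions.
    case class ParseToTimestamp(left: Expression, format: Expression, child: Expression)
      extends RuntimeReplaceable {
    
      def this(left: Expression, format: Expression) = {
        this(left, format, Cast(UnixTimestamp(left, format), TimestampType))
      }
    
      override def flatArguments: Iterator[Any] = Iterator(left, format)
      override def sql: String = s"$prettyName(${left.sql}, ${format.sql})"
      override def prettyName: String = "to_timestamp"
      override def dataType: DataType = TimestampType
    }
    ```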


