[ https://issues.apache.org/jira/browse/SPARK-19342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felix Cheung resolved SPARK-19342.
----------------------------------
    Resolution: Fixed

> Datatype tImestamp is converted to numeric in collect method
> -------------------------------------------------------------
>
>                 Key: SPARK-19342
>                 URL: https://issues.apache.org/jira/browse/SPARK-19342
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.1.0
>            Reporter: Fangzhou Yang
>            Assignee: Fangzhou Yang
>             Fix For: 2.1.1, 2.2.0
>
>
> collect() returns a double instead of POSIXct for a timestamp column
> when NA is at the top of the column.
> The following code and output show how the bug can be reproduced:
> {code}
> > sparkR.session(master = "local")
> Spark package found in SPARK_HOME: /home/titicaca/spark-2.1
> Launching java with spark-submit command
> /home/titicaca/spark-2.1/bin/spark-submit sparkr-shell
> /tmp/RtmpqmpZUg/backend_port363a898be92
> Java ref type org.apache.spark.sql.SparkSession id 1
> > df <- data.frame(col1 = c(0, 1, 2),
> + col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, as.POSIXct("2017-01-01 12:00:01")))
> > sdf1 <- createDataFrame(df)
> > print(dtypes(sdf1))
> [[1]]
> [1] "col1" "double"
> [[2]]
> [1] "col2" "timestamp"
> > df1 <- collect(sdf1)
> > print(lapply(df1, class))
> $col1
> [1] "numeric"
> $col2
> [1] "POSIXct" "POSIXt"
> > sdf2 <- filter(sdf1, "col1 > 0")
> > print(dtypes(sdf2))
> [[1]]
> [1] "col1" "double"
> [[2]]
> [1] "col2" "timestamp"
> > df2 <- collect(sdf2)
> > print(lapply(df2, class))
> $col1
> [1] "numeric"
> $col2
> [1] "numeric"
> {code}
> As we can see, the data type of col2 is unexpectedly converted to numeric
> in the collected local data frame df2, even though the Spark schema still
> reports it as timestamp.
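
For anyone still on an affected version (2.1.0), one possible workaround is to re-apply the POSIXct class after collect(), using the Spark schema as the source of truth. The following is only a minimal sketch, not the fix that was merged: restore_timestamps is a hypothetical helper that is not part of SparkR, and it assumes the numeric values that come back are seconds since the Unix epoch. The dtypes list and the collected data frame are replaced by local stand-ins so the snippet runs without a Spark session.

```r
# Hypothetical helper (not part of SparkR): re-apply the POSIXct class to
# columns that the Spark schema reports as "timestamp" but that arrived as
# plain numerics after collect().
restore_timestamps <- function(local_df, spark_dtypes) {
  for (dt in spark_dtypes) {
    name <- dt[1]
    type <- dt[2]
    if (type == "timestamp" && !inherits(local_df[[name]], "POSIXct")) {
      # Assumption: the numeric values are seconds since the Unix epoch.
      local_df[[name]] <- as.POSIXct(as.numeric(local_df[[name]]),
                                     origin = "1970-01-01")
    }
  }
  local_df
}

# Stand-ins for dtypes(sdf2) and collect(sdf2) from the report, so the
# sketch runs without a Spark session.
spark_dtypes <- list(c("col1", "double"), c("col2", "timestamp"))
df2 <- data.frame(col1 = c(1, 2),
                  col2 = c(NA, as.numeric(as.POSIXct("2017-01-01 12:00:01"))))

fixed <- restore_timestamps(df2, spark_dtypes)
print(lapply(fixed, class))
```

In a real session the second argument would be dtypes(sdf2) and the first would be the result of collect(sdf2). Columns that already carry the POSIXct class are left untouched, so the helper should be harmless on versions that include the fix.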