Travis Crawford created SPARK-14081:
---------------------------------------

             Summary: DataFrameNaFunctions fill should not convert float fields to double
                 Key: SPARK-14081
                 URL: https://issues.apache.org/jira/browse/SPARK-14081
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.1
            Reporter: Travis Crawford


[DataFrameNaFunctions|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala]
 provides useful functions for dealing with null values in a DataFrame. 
Currently, zero filling changes FloatType columns to DoubleType. Spark 
should preserve the column data type.

In the following example, notice how `zeroFilledDF` has its `floatField` 
converted from float to double.

{code}
scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("intField", IntegerType),
  StructField("longField", LongType),
  StructField("floatField", FloatType),
  StructField("doubleField", DoubleType)))

val rdd = sc.parallelize(Seq(Row(1,1L,1f,1d), Row(null,null,null,null)))

val df = sqlContext.createDataFrame(rdd, schema)

val zeroFilledDF = df.na.fill(0)

// Exiting paste mode, now interpreting.

import org.apache.spark.sql._
import org.apache.spark.sql.types._
schema: org.apache.spark.sql.types.StructType = 
StructType(StructField(intField,IntegerType,true), 
StructField(longField,LongType,true), StructField(floatField,FloatType,true), 
StructField(doubleField,DoubleType,true))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = 
ParallelCollectionRDD[2] at parallelize at <console>:48
df: org.apache.spark.sql.DataFrame = [intField: int, longField: bigint, 
floatField: float, doubleField: double]
zeroFilledDF: org.apache.spark.sql.DataFrame = [intField: int, longField: 
bigint, floatField: double, doubleField: double]
{code}
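
Until this is fixed, one possible workaround is to build the fill expression 
by hand and cast the replacement literal to each column's declared type, so 
that the coalesce does not widen the column. The following is only a sketch 
under that assumption: `fillZeroPreservingTypes` is a hypothetical helper, not 
part of Spark, and it only handles the primitive numeric types listed in the 
match.

{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{coalesce, col, lit}
import org.apache.spark.sql.types._

// Hypothetical helper (not a Spark API): replace nulls with zero in the
// primitive numeric columns while keeping each column's declared type.
def fillZeroPreservingTypes(df: DataFrame): DataFrame = {
  val projected = df.schema.fields.map { f =>
    f.dataType match {
      case ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType =>
        // Cast the literal to the column's own type before coalescing, so
        // the result type matches the column instead of being promoted.
        coalesce(col(s"`${f.name}`"), lit(0).cast(f.dataType)).as(f.name)
      case _ =>
        // Leave all other columns untouched.
        col(s"`${f.name}`")
    }
  }
  df.select(projected: _*)
}
{code}

With the `df` from the example above, `fillZeroPreservingTypes(df)` would be 
expected to keep `floatField` as float rather than double.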



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

