[ https://issues.apache.org/jira/browse/SPARK-13730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Franklyn Dsouza updated SPARK-13730:
------------------------------------

    Description:

I'm putting nulls into a LongType column (declared nullable in the schema) and applying a UDF to that column; in the result, the nulls have been converted to 0.

I haven't tested this on 1.6.1 or in Scala.

Here's an example:

{code}
from pyspark.sql import types, functions as F

sql_schema = types.StructType([
    types.StructField("a", types.LongType(), True),
    types.StructField("b", types.StringType(), True),
])

df = sqlCtx.createDataFrame([
    (1, "one"),
    (None, "two"),
], sql_schema)

# Everything is fine here
df.collect()  # [Row(a=1, b=u'one'), Row(a=None, b=u'two')]

# Identity UDF: it should pass every value, including None, through unchanged
def assert_not_null(val):
    return val

udf = F.udf(assert_not_null, types.LongType())

df = df.withColumnRenamed('a', "_tmp_col")
df = df.withColumn('a', udf(df._tmp_col))
df = df.drop("_tmp_col")

# None gets converted to 0
df.collect()  # [Row(b=u'one', a=1), Row(b=u'two', a=0)]
{code}


> Nulls in dataframes getting converted to 0 with spark 2.0 SNAPSHOT
> ------------------------------------------------------------------
>
>                 Key: SPARK-13730
>                 URL: https://issues.apache.org/jira/browse/SPARK-13730
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.0.0
>            Reporter: Franklyn Dsouza
>            Priority: Critical
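A minimal sketch to help narrow this down (it assumes the same sqlCtx session as the repro above; the lambda-based identity UDF is only an illustration, not part of the original report): copying the column without a UDF preserves the nulls, which points at the Python UDF evaluation path rather than the declared schema.

{code}
from pyspark.sql import types, functions as F

schema = types.StructType([types.StructField("a", types.LongType(), True)])
df = sqlCtx.createDataFrame([(1,), (None,)], schema)

# Plain column copy, no UDF: None survives the round trip.
df.withColumn("a_copy", df.a).collect()
# [Row(a=1, a_copy=1), Row(a=None, a_copy=None)]

# Identity UDF: this is the path where None comes back as 0 on 2.0.0-SNAPSHOT.
identity = F.udf(lambda v: v, types.LongType())
df.withColumn("a_udf", identity(df.a)).collect()
# expected [Row(a=1, a_udf=1), Row(a=None, a_udf=None)];
# observed a_udf=0 for the null row

# UDF result columns are declared nullable, so this looks like a wrong value
# produced at runtime rather than a schema-driven null-to-default conversion.
df.withColumn("a_udf", identity(df.a)).printSchema()
{code}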