Just wanted to confirm that this is the expected behaviour.
Basically I'm putting nulls into a non-nullable LongType column and doing a
transformation operation on that column; the result is a column with the nulls
converted to 0.
Here's an example:
from pyspark.sql import types, functions as F
sql_schema = types.StructType([
    types.StructField("a", types.LongType(), True),
    types.StructField("b", types.StringType(), True),
])

df = sqlCtx.createDataFrame([
    (1, "one"),
    (None, "two"),
], sql_schema)
# Everything is fine here
df.collect()  # [Row(a=1, b=u'one'), Row(a=None, b=u'two')]
def assert_not_null(val):
    return val
udf = F.udf(assert_not_null, types.LongType())
df = df.withColumnRenamed('a', "_tmp_col")
df = df.withColumn('a', udf(df._tmp_col))
df = df.drop("_tmp_col")
# None gets converted to 0
df.collect()  # [Row(b=u'one', a=1), Row(b=u'two', a=0)]
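In case it helps narrow things down, here's a minimal sketch of the kind of check I'd run next (df2 is just a fresh copy of the same data, reusing the sqlCtx, sql_schema, and udf from above; the idea is to apply the pass-through UDF directly, without the rename/drop steps, to see whether the UDF alone triggers the conversion):

# Debugging sketch: rebuild the same DataFrame and apply the UDF directly.
df2 = sqlCtx.createDataFrame([(1, "one"), (None, "two")], sql_schema)

# Nullability flags Spark actually recorded for each field.
print([(f.name, f.nullable) for f in df2.schema.fields])

# Apply the pass-through UDF without renaming/dropping columns.
print(df2.select(udf(df2['a']).alias('a'), df2['b']).collect())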
Thanks,
Franklyn