Just wanted to confirm that this is the expected behaviour. I'm putting nulls into a non-nullable LongType column and applying a UDF to that column, and in the result the nulls have been converted to 0.
Here's an example:

from pyspark.sql import types, functions as F

sql_schema = types.StructType([
    types.StructField("a", types.LongType(), True),
    types.StructField("b", types.StringType(), True),
])

df = sqlCtx.createDataFrame([
    (1, "one"),
    (None, "two"),
], sql_schema)

# Everything is fine here
df.collect()  # [Row(a=1, b=u'one'), Row(a=None, b=u'two')]

def assert_not_null(val):
    return val

udf = F.udf(assert_not_null, types.LongType())

df = df.withColumnRenamed('a', "_tmp_col")
df = df.withColumn('a', udf(df._tmp_col))
df = df.drop("_tmp_col")

# None gets converted to 0
df.collect()  # [Row(b=u'one', a=1), Row(b=u'two', a=0)]
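For what it's worth, two quick sanity checks suggest the None only disappears on the UDF path. This is just a sketch run against the same sqlCtx session and schema as above; the a_plus_one column name is mine for illustration:

# The plain Python function itself preserves None.
assert assert_not_null(None) is None

# A native (non-UDF) expression on the nullable column keeps the null,
# since SQL arithmetic on a null yields null.
df2 = sqlCtx.createDataFrame([(1, "one"), (None, "two")], sql_schema)
df2 = df2.withColumn('a_plus_one', F.col('a') + 1)
df2.collect()
# expected: [Row(a=1, b=u'one', a_plus_one=2), Row(a=None, b=u'two', a_plus_one=None)]

Thanks,
Franklyn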