That looks like a bug to me. Open a JIRA?

On Mon, Mar 7, 2016 at 11:30 AM, Franklyn D'souza <franklyn.dso...@shopify.com> wrote:
> Just wanted to confirm that this is the expected behaviour.
>
> Basically I'm putting nulls into a non-nullable LongType column and doing
> a transformation operation on that column, and the result is a column with
> the nulls converted to 0.
>
> Here's an example:
>
> from pyspark.sql import DataFrame, types, functions as F
>
> sql_schema = types.StructType([
>     types.StructField("a", types.LongType(), True),
>     types.StructField("b", types.StringType(), True),
> ])
>
> df = sqlCtx.createDataFrame([
>     (1, "one"),
>     (None, "two"),
> ], sql_schema)
>
> *# Everything is fine here*
> *df.collect()  # [Row(a=1, b=u'one'), Row(a=None, b=u'two')]*
>
> def assert_not_null(val):
>     return val
>
> udf = F.udf(assert_not_null, types.LongType())
>
> df = df.withColumnRenamed('a', "_tmp_col")
> df = df.withColumn('a', udf(df._tmp_col))
> df = df.drop("_tmp_col")
>
> *# None gets converted to 0*
> *df.collect()  # [Row(b=u'one', a=1), Row(b=u'two', a=0)]*
>
> Thanks,
>
> Franklyn
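The symptom above is consistent with a column writer that fills a default value when it gets None in a slot it believes cannot be null. Here is a minimal, purely illustrative Python sketch of that kind of coercion; `write_long` is a hypothetical helper, not Spark's actual serializer code, and only mimics the behaviour reported above:

```python
# Hypothetical sketch -- NOT Spark's real code path. It illustrates how a
# writer for a long column could silently substitute 0 for None when the
# slot is treated as non-nullable, which matches the report above.

def write_long(value, nullable):
    """Hypothetical writer for one LongType cell."""
    if value is None:
        if nullable:
            return None  # nullable slot: the null survives
        return 0         # non-nullable slot: default fill, null is silently lost
    return int(value)

# Null survives in a nullable column...
print([write_long(v, nullable=True) for v in (1, None)])   # [1, None]
# ...but is coerced to 0 in a non-nullable one.
print([write_long(v, nullable=False) for v in (1, None)])  # [1, 0]
```

If that is what is happening, one would expect either an error or a preserved null rather than a silent 0, which is why this reads like a bug worth a JIRA.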