That looks like a bug to me.  Open a JIRA?
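
In the meantime, one thing that might sidestep it (an untested sketch,
reusing the df, udf, and _tmp_col names from your example below) is to
guard the UDF with when/otherwise so null input rows never round-trip
through the Python worker:

from pyspark.sql import functions as F, types

df = df.withColumn(
    'a',
    # Pass nulls through as a typed null literal instead of calling the UDF
    F.when(df._tmp_col.isNull(), F.lit(None).cast(types.LongType()))
     .otherwise(udf(df._tmp_col)),
)

No idea whether that holds up on your Spark version, though.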

On Mon, Mar 7, 2016 at 11:30 AM, Franklyn D'souza <
franklyn.dso...@shopify.com> wrote:

> Just wanted to confirm that this is the expected behaviour.
>
> Basically I'm putting nulls into a nullable LongType column and applying a
> UDF transformation to that column; the result is a column with the nulls
> converted to 0.
>
> Here's an example:
>
> from pyspark.sql import types, functions as F
>
> sql_schema = types.StructType([
>   types.StructField("a", types.LongType(), True),
>   types.StructField("b", types.StringType(),  True),
> ])
>
> df = sqlCtx.createDataFrame([
>     (1, "one"),
>     (None, "two"),
> ], sql_schema)
>
> # Everything is fine here
> df.collect()  # [Row(a=1, b=u'one'), Row(a=None, b=u'two')]
>
> def assert_not_null(val):
>     # Identity function: any change in the output comes from the UDF
>     # machinery, not from this code.
>     return val
>
> udf = F.udf(assert_not_null, types.LongType())
>
> df = df.withColumnRenamed('a', "_tmp_col")
> df = df.withColumn('a', udf(df._tmp_col))
> df = df.drop("_tmp_col")
>
> # None gets converted to 0
> df.collect()  # [Row(b=u'one', a=1), Row(b=u'two', a=0)]
>
> Thanks,
>
> Franklyn
>
>
