[ https://issues.apache.org/jira/browse/SPARK-29432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanna Saraswathi Krishnan updated SPARK-29432:
-------------------------------------------------
Description:
When I add a new column to a dataframe with the {{withColumn}} function, the column is added with {{nullable = false}} by default. But when I save the dataframe, the flag changes to {{nullable = true}}. Is this the expected behavior? Why?

{code:python}
>>> l = [('Alice', 1)]
>>> df = spark.createDataFrame(l)
>>> df.printSchema()
root
 |-- _1: string (nullable = true)
 |-- _2: long (nullable = true)

>>> from pyspark.sql.functions import lit
>>> df = df.withColumn('newCol', lit('newVal'))
>>> df.printSchema()
root
 |-- _1: string (nullable = true)
 |-- _2: long (nullable = true)
 |-- newCol: string (nullable = false)

>>> # after saving the dataframe as table default.withcolTest (save step not shown)
>>> spark.sql("select * from default.withcolTest").printSchema()
root
 |-- _1: string (nullable = true)
 |-- _2: long (nullable = true)
 |-- newCol: string (nullable = true)
{code}

was: (previous revision of the same description; formatting changes only)

> nullable flag of new column changes when persisting a pyspark dataframe
> -----------------------------------------------------------------------
>
>                 Key: SPARK-29432
>                 URL: https://issues.apache.org/jira/browse/SPARK-29432
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.4.0
>         Environment: Spark 2.4.0-cdh6.1.1 (Cloudera distribution)
>                      Python 3.7.3
>            Reporter: Prasanna Saraswathi Krishnan
>            Priority: Minor
>
> When I add a new column to a dataframe with the {{withColumn}} function, the
> column is added with {{nullable = false}} by default.
> But when I save the dataframe, the flag changes to {{nullable = true}}. Is
> this the expected behavior? Why?
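[Editor's note, not part of the original report: this is generally expected behavior. Persisted formats such as Parquet and Hive tables cannot guarantee that a column stays non-null after the data is written, so Spark reads persisted columns back as nullable. If specific nullability flags are needed after a round trip, one workaround is to rebuild the schema explicitly and reapply it. The helper name {{with_nullability}} below is hypothetical, and the sketch only touches top-level fields, not nested structs.]

{code:python}
from pyspark.sql.types import StructType, StructField

def with_nullability(schema, nullable=True):
    """Return a copy of a StructType with every top-level field's
    nullable flag set to the given value."""
    return StructType([
        StructField(f.name, f.dataType, nullable, f.metadata)
        for f in schema.fields
    ])

# Hypothetical usage after reading the table back:
#   df = spark.table("default.withcolTest")
#   df = spark.createDataFrame(df.rdd, with_nullability(df.schema, nullable=False))
{code}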
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org