[ https://issues.apache.org/jira/browse/SPARK-29432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-29432:
---------------------------------
Description:
When I add a new column to a dataframe with the {{withColumn}} function, by default the column is added with {{nullable=false}}. But when I save the dataframe, the flag changes to {{nullable=true}}. Is this the expected behavior? Why?

{code}
>>> l = [('Alice', 1)]
>>> df = spark.createDataFrame(l)
>>> df.printSchema()
root
 |-- _1: string (nullable = true)
 |-- _2: long (nullable = true)
{code}
{code}
>>> from pyspark.sql.functions import lit
>>> df = df.withColumn('newCol', lit('newVal'))
>>> df.printSchema()
root
 |-- _1: string (nullable = true)
 |-- _2: long (nullable = true)
 |-- newCol: string (nullable = false)
>>> spark.sql("select * from default.withcolTest").printSchema()
root
 |-- _1: string (nullable = true)
 |-- _2: long (nullable = true)
 |-- newCol: string (nullable = true)
{code}
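The snippet jumps from {{printSchema}} straight to reading {{default.withcolTest}}; the save step itself is not shown. It was presumably something along these lines (an assumption, reconstructed only from the table name used in the later query):

{code}
>>> # Assumed save step, not shown in the reporter's snippet. Writing to a
>>> # file-backed table (e.g. a Parquet-backed Hive table) is the point at
>>> # which the column comes back as nullable = true on read.
>>> df.write.saveAsTable('default.withcolTest')
{code}

For what it's worth, Spark's Parquet reader treats all columns as nullable for compatibility reasons, which would explain the flipped flag on read-back.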
> nullable flag of new column changes when persisting a pyspark dataframe
> -----------------------------------------------------------------------
>
>                 Key: SPARK-29432
>                 URL: https://issues.apache.org/jira/browse/SPARK-29432
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.4.0
>        Environment: Spark 2.4.0-cdh6.1.1 (Cloudera distribution)
> Python 3.7.3
>            Reporter: Prasanna Saraswathi Krishnan
>            Priority: Minor
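Not part of the report, but if the non-nullable flag matters downstream, one common workaround is to re-apply an explicit schema after reading the table back. A sketch, assuming the table written above exists:

{code}
>>> from pyspark.sql.types import StructType, StructField, StringType, LongType
>>> # Rebuild the schema by hand, forcing nullable = False on the new column.
>>> schema = StructType([
...     StructField('_1', StringType(), True),
...     StructField('_2', LongType(), True),
...     StructField('newCol', StringType(), False)])
>>> df2 = spark.createDataFrame(spark.table('default.withcolTest').rdd, schema)
{code}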
--
This message was sent by Atlassian Jira
(v8.3.4#803005)