[ https://issues.apache.org/jira/browse/SPARK-29432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-29432.
----------------------------------
    Resolution: Cannot Reproduce

Can't find the {{withcolTest}} table. Also, please ask questions on the mailing list; you are likely to get a better answer there.

> nullable flag of new column changes when persisting a pyspark dataframe
> -----------------------------------------------------------------------
>
>                 Key: SPARK-29432
>                 URL: https://issues.apache.org/jira/browse/SPARK-29432
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.4.0
>        Environment: Spark 2.4.0-cdh6.1.1 (Cloudera distribution)
>                     Python 3.7.3
>            Reporter: Prasanna Saraswathi Krishnan
>            Priority: Minor
>
> When I add a new column to a dataframe with the {{withColumn}} function, by default the column is added with {{nullable = false}}.
> But when I save the dataframe, the flag changes to {{nullable = true}}. Is this the expected behavior? Why?
>
> {code}
> >>> l = [('Alice', 1)]
> >>> df = spark.createDataFrame(l)
> >>> df.printSchema()
> root
>  |-- _1: string (nullable = true)
>  |-- _2: long (nullable = true)
> {code}
> {code}
> >>> from pyspark.sql.functions import lit
> >>> df = df.withColumn('newCol', lit('newVal'))
> >>> df.printSchema()
> root
>  |-- _1: string (nullable = true)
>  |-- _2: long (nullable = true)
>  |-- newCol: string (nullable = false)
> >>> spark.sql("select * from default.withcolTest").printSchema()
> root
>  |-- _1: string (nullable = true)
>  |-- _2: long (nullable = true)
>  |-- newCol: string (nullable = true)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org