Re: How to change a DataFrame column from nullable to not nullable in PySpark

2021-10-15 Thread ashok34...@yahoo.com.INVALID
Many thanks all, especially to Mich. That is what I was looking for.

Re: How to change a DataFrame column from nullable to not nullable in PySpark

2021-10-15 Thread Mich Talebzadeh
Spark allows one to define the column format as StructType or list. By default Spark assumes that all fields are nullable when creating a dataframe. To change nullability you need to provide the structure of the columns. Assume that I have created an RDD in the form rdd = sc.parallelize(Range).

Re: How to change a DataFrame column from nullable to not nullable in PySpark

2021-10-14 Thread Sonal Goyal
I see some nice answers at https://stackoverflow.com/questions/46072411/can-i-change-the-nullability-of-a-column-in-my-spark-dataframe

How to change a DataFrame column from nullable to not nullable in PySpark

2021-10-14 Thread ashok34...@yahoo.com.INVALID
Gurus, I have an RDD in PySpark that I can convert to DF through df = rdd.toDF() However, when I do df.printSchema() I see the columns as nullable = true by default: root |-- COL-1: long (nullable = true) |-- COl-2: double (nullable = true) |-- COl-3: string (nullable = true) What would be the