Check if a nested column exists in DataFrame

Arun Patel Mon, 12 Sep 2016 14:29:37 -0700

I'm trying to analyze XML documents using the spark-xml package. Since all XML
columns are optional, some columns may or may not exist. When I register
the DataFrame as a table, how do I check whether a nested column exists?
My column "emp" is already exploded, and I am trying to check whether
the nested column "emp.mgr.col" exists. If it exists, I need to use it.
If it does not exist, I should set it to null. Is there a way to achieve
this?


Please note that I am not able to use the .columns method because it lists
only top-level columns, not nested ones.

Also note that I cannot specify the schema manually because of my
requirements.

I'm trying this in PySpark.

Thank you.
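
One possible approach, offered only as a sketch: since .columns stops at the
top level, the check can walk the DataFrame's schema instead. The example
below assumes the schema has been converted to a plain dict with
df.schema.jsonValue(), and that field names do not themselves contain dots.
The helper name has_nested_field and the sample schema are hypothetical,
made up for illustration; the sample mirrors the "emp.mgr.col" layout
described above.

```python
def has_nested_field(schema, path):
    """Return True if the dotted `path` exists in a Spark schema dict.

    `schema` is assumed to be the dict returned by df.schema.jsonValue().
    """
    node = schema
    for name in path.split("."):
        # Arrays wrap their element type; step inside so the path can
        # cross array-of-struct columns that were not exploded.
        while isinstance(node, dict) and node.get("type") == "array":
            node = node["elementType"]
        # Only structs have named children; primitives are plain strings.
        if not isinstance(node, dict) or node.get("type") != "struct":
            return False
        for field in node["fields"]:
            if field["name"] == name:
                node = field["type"]
                break
        else:
            return False  # no field with this name at this level
    return True


# Hypothetical schema dict shaped like the output of df.schema.jsonValue().
sample_schema = {
    "type": "struct",
    "fields": [
        {
            "name": "emp",
            "nullable": True,
            "metadata": {},
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "name", "type": "string",
                     "nullable": True, "metadata": {}},
                    {
                        "name": "mgr",
                        "nullable": True,
                        "metadata": {},
                        "type": {
                            "type": "struct",
                            "fields": [
                                {"name": "col", "type": "string",
                                 "nullable": True, "metadata": {}},
                            ],
                        },
                    },
                ],
            },
        },
    ],
}

found = has_nested_field(sample_schema, "emp.mgr.col")
missing = has_nested_field(sample_schema, "emp.mgr.other")

# With a real DataFrame `df`, the result could drive the column choice
# (the "string" cast is an assumption about the desired type):
#   from pyspark.sql import functions as F
#   schema = df.schema.jsonValue()
#   value = (F.col("emp.mgr.col")
#            if has_nested_field(schema, "emp.mgr.col")
#            else F.lit(None).cast("string"))
#   df = df.withColumn("mgr_col", value)
```

Walking the schema avoids triggering an analysis error for a missing column
and needs no hand-written schema, since it only inspects what spark-xml
inferred.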
