Is there a way to check whether a nested column exists in a DataFrame's schema in PySpark? http://stackoverflow.com/questions/37471346/automatically-and-elegantly-flatten-dataframe-in-spark-sql shows how to get the list of nested columns in Scala, but can this be done in PySpark?
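To make the question concrete, here is roughly what I am after. This is only a minimal sketch: `has_nested_column` is a name I made up, and `df` stands for the DataFrame loaded by spark-xml:

    from pyspark.sql.types import ArrayType, StructType

    def has_nested_column(schema, path):
        """Return True if a dotted path like 'emp.mgr.col' exists in the schema."""
        current = schema
        for part in path.split("."):
            # Step into array elements so the check also works for
            # array-of-struct fields before they are exploded.
            if isinstance(current, ArrayType):
                current = current.elementType
            if not isinstance(current, StructType) or part not in current.fieldNames():
                return False
            current = current[part].dataType
        return True

Something like has_nested_column(df.schema, "emp.mgr.col") is the check I need.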
Please help.

On Mon, Sep 12, 2016 at 5:28 PM, Arun Patel <arunp.bigd...@gmail.com> wrote:
> I'm trying to analyze XML documents using the spark-xml package. Since all
> XML columns are optional, some columns may or may not exist. When I
> register the DataFrame as a table, how do I check whether a nested column
> exists? My column "emp" is already exploded, and I am trying to check
> whether the nested column "emp.mgr.col" exists. If it exists, I need to
> use it; if it does not exist, I should set it to null. Is there a way to
> achieve this?
>
> Please note that I am not able to use the .columns method, because it does
> not show nested columns.
>
> Also note that I cannot manually specify the schema because of my
> requirement.
>
> I'm trying this in PySpark.
>
> Thank you.
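For reference, the fallback described above would look something like this, again assuming the `has_nested_column` sketch from my first message; `mgr_col` is just an example output name:

    from pyspark.sql import functions as F

    if has_nested_column(df.schema, "emp.mgr.col"):
        result = df.withColumn("mgr_col", F.col("emp.mgr.col"))
    else:
        # The column is absent from these documents, so substitute a null.
        result = df.withColumn("mgr_col", F.lit(None).cast("string"))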