Hi,
I save Parquet files into a partitioned table, i.e. under /path/to/table/myfield=a/ , but I also kept the field "myfield" in the Parquet data itself. As a result, when I query that field:

    df.select("myfield").show(10)

I get this error:

    Exception in thread "main" org.apache.spark.sql.AnalysisException: Ambiguous references to myfield (myfield#2,List()),(myfield#47,List());

Looking at the code, I could not find a way to explicitly specify which of the two columns I want: DataFrame#columns only returns strings, and even when loading the data with an explicit schema (StructType), I don't see how to disambiguate.

Do I have to make sure that the partition field does not exist in the data before saving? Or is there a way to declare which column in the schema I want to query?

Thanks.