Hi,

I save Parquet files into a partitioned table, i.e. under /path/to/table/myfield=a/.
But I also kept the field "myfield" in the Parquet data itself. Thus, when I query
the field, I get this error:


df.select("myfield").show(10)
"Exception in thread "main" org.apache.spark.sql.AnalysisException: Ambiguous 
references to myfield  (myfield#2,List()),(myfield#47,List());"
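

For context, this is roughly how the table gets written (just a sketch, not my
exact code; "data" is a made-up name for the DataFrame being saved, the path is
a placeholder, and I am using the DataFrameWriter API):

// "data" still contains the "myfield" column. One write per partition value,
// each under its own myfield=... directory, so the column ends up both in the
// directory name and inside the Parquet files.
data.filter(data("myfield") === "a")
  .write
  .parquet("/path/to/table/myfield=a")

// "df" above is then obtained by reading the root path, where partition
// discovery adds "myfield" a second time from the directory names.
val df = sqlContext.read.parquet("/path/to/table")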


Looking at the code, I could not find a way to explicitly specify which of the
two columns I want: DataFrame#columns only returns strings. Even when loading the
data with an explicit schema (StructType), I'm not sure I can do it.
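For example, I imagine something like this (a sketch; the second column name is
made up), but I don't see how it would let me pick which "myfield" I mean when
selecting:

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Read with an explicit schema; "othercol" is just an example of another
// column that exists in the data.
val schema = StructType(Seq(
  StructField("myfield", StringType),
  StructField("othercol", IntegerType)))
val withSchema = sqlContext.read.schema(schema).parquet("/path/to/table")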


Do I have to make sure that my partition field does not exist in the data before
saving? Or is there a way to declare which column in the schema I want to query?
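
In other words, is the expected fix to drop the duplicate column before each
write, something like this (sketch, same made-up names as above)?

// Drop "myfield" from the data so that only the directory name carries it.
data.filter(data("myfield") === "a")
  .drop("myfield")
  .write
  .parquet("/path/to/table/myfield=a")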


Thanks.



