You can try the code below:

val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")
df.select("f1", "f2").show()
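Since some of the files lack the f2 column, a minimal sketch of another workaround is to read without a forced schema and add the missing column with a default value yourself. This assumes Spark 2.x and a running SparkSession; the empty-string default mirrors the AVRO "default" behaviour described below, and the column check is an illustration, not a built-in Spark feature.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

val spark = SparkSession.builder().appName("orc-default-column").getOrCreate()

// Read the ORC files without forcing a schema, so the load does not
// fail when a file is missing one of the expected columns.
val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")

// If "f2" is absent from the inferred schema, add it with a default
// value, mimicking an AVRO-style "default": "" for the column.
val withDefaults: DataFrame =
  if (df.columns.contains("f2")) df
  else df.withColumn("f2", lit("").cast(StringType))

withDefaults.select("f1", "f2").show()
```

Note that this only helps when all files in the directory share one schema per load; if the directory mixes files with and without f2 in a single read, whether the inferred schema includes f2 depends on which files Spark samples, so schema handling for mixed ORC directories may still need care.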
On 2017/2/14 at 6:54 AM, "vbegar" <user-return-67879-zjp_jdev=163....@spark.apache.org on behalf of veena.be...@hpe.com> wrote:

>Hello,
>
>I specified a StructType like this:
>
>val mySchema = StructType(Array(StructField("f1", StringType, true), StructField("f2", StringType, true)))
>
>I have many ORC files stored in the HDFS location /user/hos/orc_files_test_together.
>
>These files use different schemas: some of them have only the f1 column and
>others have both f1 and f2 columns.
>
>I read the data from these files into a DataFrame:
>
>val df = spark.read.format("orc").schema(mySchema).load("/user/hos/orc_files_test_together")
>
>But now, when I give the following command to see the data, it fails:
>
>df.show
>
>The error message says the "f2" column doesn't exist.
>
>Since I specified the nullable attribute as true for the f2 column, why does
>it fail?
>
>Or is there any way to specify a default value for StructField?
>
>In an AVRO schema, we can specify a default value this way and can then read
>AVRO files in a folder that have two different schemas (either only the f1
>column or both f1 and f2 columns):
>
>{
>  "type": "record",
>  "name": "myrecord",
>  "fields":
>  [
>    {
>      "name": "f1",
>      "type": "string",
>      "default": ""
>    },
>    {
>      "name": "f2",
>      "type": "string",
>      "default": ""
>    }
>  ]
>}
>
>Wondering why it doesn't work with ORC files.
>
>Thanks.
>
>
>--
>View this message in context:
>http://apache-spark-user-list.1001560.n3.nabble.com/How-to-specify-default-value-for-StructField-tp28386.html
>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>---------------------------------------------------------------------
>To unsubscribe e-mail: user-unsubscr...@spark.apache.org