Hello,

I specified a StructType like this:

    val mySchema = StructType(Array(
      StructField("f1", StringType, true),
      StructField("f2", StringType, true)))

I have many ORC files stored in an HDFS location:

    /user/hos/orc_files_test_together

These files use different schemas: some of them have only the f1 column and
others have both the f1 and f2 columns.

I read the data from these files into a DataFrame:

    val df = spark.read.format("orc")
      .schema(mySchema)
      .load("/user/hos/orc_files_test_together")

But now, when I run the following command to view the data, it fails:

    df.show

The error message says the "f2" column doesn't exist.

Since I have specified the nullable attribute as true for the f2 column, why
does it fail?
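For context, the only workaround I have found so far is to read the two groups of files separately and fill in the missing column with a null literal before the union. This is just a sketch: the "old"/"new" subdirectory split is hypothetical (in reality the files would have to be separated by some listing logic), and unionByName needs Spark 2.3+:

```scala
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types._

// Schema for the older files, which have only f1
val oldSchema = StructType(Array(StructField("f1", StringType, true)))

// Hypothetical layout: files with only f1 live under .../old
val dfOld = spark.read.format("orc").schema(oldSchema)
  .load("/user/hos/orc_files_test_together/old")
  .withColumn("f2", lit(null).cast(StringType)) // supply the missing column as null

// Files that already have both f1 and f2
val dfNew = spark.read.format("orc").schema(mySchema)
  .load("/user/hos/orc_files_test_together/new")

// unionByName (Spark 2.3+) matches columns by name rather than position
val df = dfOld.unionByName(dfNew)
```

But this forces me to know in advance which files have which schema, which is exactly what I was hoping the nullable flag (or a default value) would avoid.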

Or, is there any way to specify a default value for a StructField?

I ask because in an Avro schema we can specify a default value this way, and
then read Avro files in a folder that holds two different schemas (either only
the f1 column, or both the f1 and f2 columns):

    {
       "type": "record",
       "name": "myrecord",
       "fields":
       [
          {
             "name": "f1",
             "type": "string",
             "default": ""
          },
          {
             "name": "f2",
             "type": "string",
             "default": ""
          }
       ]
    }

I'm wondering why this doesn't work with ORC files.
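(As an aside: I've read that newer Spark releases, 3.0 onward if I recall correctly, added a mergeSchema option for the ORC data source that reconciles files with different column sets. I haven't verified this myself, but the sketch would be:

```scala
// Assumes Spark 3.0+ and the ORC mergeSchema support added there;
// rows from files lacking f2 should come back with f2 = null.
val merged = spark.read
  .option("mergeSchema", "true")
  .orc("/user/hos/orc_files_test_together")
```

That would not help on the Spark version I'm running now, though.)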

Thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-specify-default-value-for-StructField-tp28386.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
