Hi Sivakumar,

I have run into this issue in the past, and we were able to fix it by using an explicit schema when saving the DataFrame to the Avro file. This schema was an exact match to the one associated with the metadata on the Hive database table, which allowed the Hive queries to work even after updating the underlying Avro file via Spark.
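A minimal sketch of that approach, assuming Spark 1.3 with the databricks spark-avro package. The field names and types here are hypothetical; in practice they must mirror the Avro schema on the Hive table exactly (including `nullable = true` for any field that is a union with null):

```scala
import org.apache.spark.sql.{Row, SaveMode}
import org.apache.spark.sql.types._

// Explicit schema matching the Hive table's Avro schema.
// Fields below are placeholders -- substitute your own.
val explicitSchema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = true),
  StructField("address", StructType(Seq(        // optional nested record
    StructField("city", StringType, nullable = true),
    StructField("zip", StringType, nullable = true)
  )), nullable = true)
))

// Re-apply the schema so the Avro writer emits the same field names,
// order, and null unions that the Hive table declares, regardless of
// which optional fields happen to be present in this batch of data.
val fixed = sqlContext.createDataFrame(df.rdd, explicitSchema)
fixed.save("com.databricks.spark.avro", SaveMode.Append, Map("path" -> path))
```

The key point is that the schema is pinned up front rather than inferred per batch, so every write produces an Avro schema Hive can resolve against the table metadata.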
We are using Spark 1.3.0, and I am hoping to find a better solution to this problem once we upgrade to Spark 1.5.0 (we manage versions via CDH). This approach works, but the coding involved can be tedious depending on the complexity of your data. If memory serves, the explicit schema was necessary because our data structure contained optional nested properties. The DataFrame writer will generate a schema for you automatically, but ours varied depending on the data being saved (i.e. whether or not it contained a nested element).

- Kevin

On Wed, Jan 13, 2016 at 7:20 PM, Siva <sbhavan...@gmail.com> wrote:
> Hi Everyone,
>
> Avro data written by a DataFrame to HDFS is not readable by Hive. We are
> saving the data in Avro format with the statement below:
>
> df.save("com.databricks.spark.avro", SaveMode.Append, Map("path" -> path))
>
> I created a Hive Avro external table, and while reading it I see all nulls.
> Did anyone face a similar issue? What is the best way to write data in Avro
> format from Spark so that it is also readable by Hive?
>
> Thanks,
> Sivakumar Bhavanari.