Hi, I am trying to write data from Spark to a Hive partitioned table:
    DataFrame dataFrame = sqlContext.createDataFrame(rdd, schema);
    dataFrame.write().partitionBy("YEAR", "MONTH", "DAY").saveAsTable(tableName);

The data is not being written to the Hive table (HDFS location: /user/hive/warehouse/<table_name>/). Below are the logs from a Spark executor. As the logs show, the data is written to /tmp/spark-a3c7ed0f-76c6-4c3c-b80c-0734e33390a2/metastore/case_logs, but I could not find this directory in HDFS.

    16/01/23 02:15:03 INFO datasources.DynamicPartitionWriterContainer: Sorting complete. Writing out partition files one at a time.
    16/01/23 02:15:03 INFO compress.CodecPool: Got brand-new compressor [.gz]
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/parquet-pig-bundle-1.5.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/parquet-hadoop-bundle-1.5.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/parquet-format-2.1.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/hive-exec-1.1.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/hive-jdbc-1.1.0-cdh5.5.1-standalone.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
    16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
    (the line above repeats many times between 02:15:05 and 02:15:06; repeats omitted)
    16/01/23 02:15:06 INFO output.FileOutputCommitter: Saved output of task 'attempt_201601230214_0023_m_000000_0' to file:/tmp/spark-a3c7ed0f-76c6-4c3c-b80c-0734e33390a2/metastore/case_logs
    16/01/23 02:15:06 INFO mapred.SparkHadoopMapRedUtil: attempt_201601230214_0023_m_000000_0: Committed
    16/01/23 02:15:06 INFO executor.Executor: Finished task 0.0 in stage 23.0 (TID 23). 2013 bytes result sent to driver

I am using CDH 5.5.1 and Spark 1.5.0. Does anybody have an idea what is happening here?

Thanks,
Akhilesh
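One thing I noticed while digging: the output lands under file:/tmp/spark-.../metastore/, which looks like the local warehouse of an embedded (Derby) metastore rather than the cluster's Hive metastore. My guess (unconfirmed) is that this happens when the job either uses a plain SQLContext, or uses a HiveContext that cannot find hive-site.xml on the classpath and so falls back to a local metastore. Below is a minimal sketch of the HiveContext variant I plan to try; the app name, table name, columns, and sample row are all placeholders, not my real data:

```java
// Sketch only. My guess (unconfirmed): saveAsTable writes to a local embedded
// metastore when a plain SQLContext is used, or when hive-site.xml is not on
// the classpath, which would explain the file:/tmp/spark-.../metastore path.
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class PartitionedHiveWrite {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("partitioned-hive-write");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // HiveContext reads hive-site.xml from the classpath and talks to the
        // cluster metastore; without it Spark falls back to a local Derby
        // metastore under the working directory.
        HiveContext hiveContext = new HiveContext(sc.sc());

        // Placeholder schema containing the three partition columns.
        StructType schema = DataTypes.createStructType(Arrays.asList(
            DataTypes.createStructField("message", DataTypes.StringType, false),
            DataTypes.createStructField("YEAR", DataTypes.IntegerType, false),
            DataTypes.createStructField("MONTH", DataTypes.IntegerType, false),
            DataTypes.createStructField("DAY", DataTypes.IntegerType, false)));

        // Placeholder data standing in for my real rdd.
        List<Row> rows = Arrays.asList(RowFactory.create("hello", 2016, 1, 23));
        JavaRDD<Row> rdd = sc.parallelize(rows);

        DataFrame dataFrame = hiveContext.createDataFrame(rdd, schema);
        dataFrame.write()
                 .partitionBy("YEAR", "MONTH", "DAY")
                 .saveAsTable("case_logs"); // placeholder table name
    }
}
```

If this guess is right, the partitions should then appear under /user/hive/warehouse/case_logs/YEAR=.../MONTH=.../DAY=... instead of /tmp.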