Hi,

I am trying to write data from Spark to a Hive partitioned table. The job
runs without any error, but it is not writing the data to the correct
location.
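
For reference, here is a simplified sketch of the kind of write the job is doing; the schema, partition column and sample rows below are placeholders, only the table name CASE_LOGS is real:

// Simplified sketch of the write per batch; schema, partition column and
// sample rows are placeholders, only the table name CASE_LOGS is real.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

object CaseLogsWriter {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("case-logs-writer"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // Placeholder data standing in for one batch of case log events.
    val df = sc.parallelize(Seq(
      ("c1", "case opened", "2016-01-23"),
      ("c2", "case closed", "2016-01-23")
    )).toDF("case_id", "message", "event_date")

    // Writing the partitioned DataFrame with saveAsTable is the step that
    // produces the "Persisting partitioned data source relation ... in Spark
    // SQL specific format" warning shown in the log below.
    df.write
      .mode(SaveMode.Append)
      .format("parquet")
      .partitionBy("event_date")
      .saveAsTable("CASE_LOGS")
  }
}

The driver log for the batch shows: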

[streaming-job-executor-0] parquet.ParquetRelation (Logging.scala:logInfo(59)) - Listing file:/yarn/nm/usercache/root/appcache/application_1453561680059_0005/container_e89_1453561680059_0005_01_000001/tmp/spark-f252468d-61f0-44f2-8819-34e2c27c80c7/metastore/case_logs on driver
2016-01-23 07:58:53,223 INFO  [streaming-job-executor-0] parquet.ParquetRelation (Logging.scala:logInfo(59)) - Listing file:/yarn/nm/usercache/root/appcache/application_1453561680059_0005/container_e89_1453561680059_0005_01_000001/tmp/spark-f252468d-61f0-44f2-8819-34e2c27c80c7/metastore/case_logs on driver
2016-01-23 07:58:53,276 WARN  [streaming-job-executor-0] hive.HiveContext$$anon$1 (Logging.scala:logWarning(71)) - Persisting partitioned data source relation `CASE_LOGS` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. Input path(s): file:/yarn/nm/usercache/root/appcache/application_1453561680059_0005/container_e89_1453561680059_0005_01_000001/tmp/spark-f252468d-61f0-44f2-8819-34e2c27c80c7/metastore/case_logs
2016-01-23 07:58:53,454 INFO  [streaming-job-executor-0] log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=create_table_with_environment_context from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
...654 INFO  [JobScheduler] scheduler.JobScheduler (Logging.scala:logInfo(59)) - Finished job streaming job 1453564710000 ms.0 from job set of time 1453564710000 ms


It is writing the data in the Spark SQL specific format instead of the Hive
format. Can anybody tell me how to get rid of this issue?
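
One direction I have been looking at, but have not verified, is to create the partitioned table through Hive DDL first and then write into it with insertInto instead of saveAsTable, so that Spark writes into the table's own location and format. Continuing from the sketch above, with the same placeholder schema:

// Untested idea, continuing from the sketch above (same placeholder schema):
// create the table through Hive DDL first, then insert into it, so the data
// should land in the table's warehouse location in a Hive-readable layout.
hiveContext.sql("SET hive.exec.dynamic.partition = true")
hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

hiveContext.sql(
  """CREATE TABLE IF NOT EXISTS CASE_LOGS (case_id STRING, message STRING)
    |PARTITIONED BY (event_date STRING)
    |STORED AS PARQUET""".stripMargin)

// insertInto expects the partition column to be the last column of the DataFrame.
df.select("case_id", "message", "event_date")
  .write
  .mode(SaveMode.Append)
  .insertInto("CASE_LOGS")

Is something along these lines the right way to handle this, or is there a better option?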

Spark version - 1.5.0
CDH 5.5.1

Thanks,
Akhilesh Pathodia
