Hi,

Hive table creation needs an extra step from 1.3 onwards. You can follow this
template:

df.registerTempTable(tableName)
hc.sql(s"create table $tableName as select * from $tableName")

This will save the table in Hive with the given tableName.
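For context, a minimal end-to-end sketch of the same idea, assuming Spark 1.3
with a HiveContext `hc` already created and a Parquet file at `path` (the
table names here are only illustrative):

// load the data as a DataFrame
val df = hc.parquetFile(path)
// register it under a temporary name that is visible to Spark SQL only
df.registerTempTable("my_data_tmp")
// CREATE TABLE AS SELECT materialises a table that Hive itself can read
hc.sql("create table my_data as select * from my_data_tmp")
// sanity check from Spark
hc.sql("select count(*) from my_data").collect()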
Regards,
Madhukara Phatak
http://datamantra.io/

On Thu, Apr 23, 2015 at 4:00 AM, Michael Armbrust <mich...@databricks.com> wrote:

> Sorry for the confusion. We should be clearer about the semantics in the
> documentation. (PRs welcome :) )
>
> .saveAsTable does not create a Hive table, but instead creates a Spark
> Data Source table. Here the metadata is persisted into Hive, but Hive
> cannot read the tables (as this API supports MLlib vectors, schema
> discovery, and other things that Hive does not). If you want to create a
> Hive table, use HiveQL and run a CREATE TABLE AS SELECT ...
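>
> To make the distinction concrete, a short sketch (assuming a HiveContext
> `hc` and a DataFrame `df`; the table names are only illustrative):
>
> // persists a Spark data source table: the metadata lands in the Hive
> // metastore, but the Hive CLI cannot query it back
> df.saveAsTable("spark_only_table")
>
> // creates a plain Hive table that the Hive CLI can read
> df.registerTempTable("df_tmp")
> hc.sql("CREATE TABLE hive_readable_table AS SELECT * FROM df_tmp")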
> On Wed, Apr 22, 2015 at 12:50 AM, Ophir Cohen <oph...@gmail.com> wrote:
>
>> I wrote a few mails here regarding this issue.
>> After further investigation I think there is a bug in Spark 1.3 in saving
>> Hive tables.
>>
>> (hc is a HiveContext)
>>
>> 1. Verify the needed configuration exists:
>>
>> scala> hc.sql("set hive.exec.compress.output").collect
>> res4: Array[org.apache.spark.sql.Row] = Array([hive.exec.compress.output=true])
>>
>> scala> hc.sql("set mapreduce.output.fileoutputformat.compress.codec").collect
>> res5: Array[org.apache.spark.sql.Row] = Array([mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec])
>>
>> scala> hc.sql("set mapreduce.output.fileoutputformat.compress.type").collect
>> res6: Array[org.apache.spark.sql.Row] = Array([mapreduce.output.fileoutputformat.compress.type=BLOCK])
>>
>> 2. Load the DataFrame and save it as a table (path points to an existing file):
>>
>> val saDF = hc.parquetFile(path)
>> saDF.count
>>
>> (count yields 229764, i.e. the RDD exists)
>>
>> saDF.saveAsTable("test_hive_ms")
>>
>> Now for a few interesting observations:
>>
>> 1. Trying to query from the Hive CLI, the table exists but with the wrong
>> output format:
>> Failed with exception java.io.IOException:java.io.IOException:
>> hdfs://10.166.157.97:9000/user/hive/warehouse/test_hive_ms/part-r-00001.parquet
>> not a SequenceFile
>>
>> 2. Looking at the output files, the files are '.parquet' and not '.snappy'.
>>
>> 3. Looking at the saveAsTable output shows that it actually stores the
>> table with both the wrong output format and no compression:
>>
>> 15/04/22 07:16:54 INFO metastore.HiveMetaStore: 0: create_table:
>> Table(tableName:test_hive_ms, dbName:default, owner:hadoop,
>> createTime:1429687014, lastAccessTime:0, retention:0,
>> sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>,
>> comment:from deserializer)], location:null,
>> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
>> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
>> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
>> serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,
>> parameters:{serialization.format=1,
>> path=hdfs://10.166.157.97:9000/user/hive/warehouse/test_hive_ms}),
>> bucketCols:[], sortCols:[], parameters:{},
>> skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
>> skewedColValueLocationMaps:{})), partitionKeys:[],
>> parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"ADJDATE","type":"long","nullable":true,"metadata":{}},{"name":"sid","type":"integer","nullable":true,"metadata":{}},{"name":"ADJTYPE","type":"integer","nullable":true,"metadata":{}},{"name":"ENDADJDATE","type":"long","nullable":true,"metadata":{}},{"name":"ADJFACTOR","type":"double","nullable":true,"metadata":{}},{"name":"CUMADJFACTOR","type":"double","nullable":true,"metadata":{}}]},
>> EXTERNAL=FALSE, spark.sql.sources.schema.numParts=1,
>> spark.sql.sources.provider=org.apache.spark.sql.parquet},
>> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
>>
>> So, the question is: am I missing some configuration here, or should I
>> open a bug?
>>
>> Thanks,
>> Ophir
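One way to cross-check the two sides of this (an illustrative check, assuming
the table created above): Spark itself can still resolve the saved data source
table even though the Hive CLI cannot, and the metastore entry can be inspected
directly.

// Spark resolves the saved data source table just fine
hc.table("test_hive_ms").count()
// inspect the storage descriptor the metastore recorded (assuming the
// DESCRIBE FORMATTED command is passed through to Hive by HiveContext)
hc.sql("DESCRIBE FORMATTED test_hive_ms").collect().foreach(println)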