Just do:

val df = sqlContext.read.load("/path/to/parquets/*")

If you do df.explain it’ll show the multiple input paths.
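
For instance, a minimal sketch (assumptions: a Spark 1.x shell with sqlContext in
scope, and the directories under /path/to/parquets/ all holding Parquet data with
the same schema):

val df = sqlContext.read.load("/path/to/parquets/*")  // load defaults to Parquet unless spark.sql.sources.default was changed
df.explain()          // the physical plan lists every input path that was matched
println(df.count())   // one DataFrame over all four directories, no unionAll needed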

From: "andres.fernan...@wellsfargo.com<mailto:andres.fernan...@wellsfargo.com>" 
<andres.fernan...@wellsfargo.com<mailto:andres.fernan...@wellsfargo.com>>
Date: Tuesday, March 1, 2016 at 12:00 PM
To: "user@spark.apache.org<mailto:user@spark.apache.org>" 
<user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: RE: Save DataFrame to Hive Table

Good day, colleagues. Quick question on Parquet and DataFrames. Right now I have
4 Parquet files stored in HDFS under the same path:
/path/to/parquets/parquet1, /path/to/parquets/parquet2,
/path/to/parquets/parquet3, /path/to/parquets/parquet4…
I want to perform a union on all these Parquet files, roughly as sketched below.
Is there any way of doing this other than DataFrame's unionAll?
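
A sketch of the current approach (assuming the four directories share one schema):

val paths = Seq("/path/to/parquets/parquet1", "/path/to/parquets/parquet2",
                "/path/to/parquets/parquet3", "/path/to/parquets/parquet4")
// Read each directory and fold the DataFrames together with unionAll
val combined = paths.map(p => sqlContext.read.parquet(p)).reduce(_ unionAll _)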

Thank you very much in advance.

Andres Fernandez

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Tuesday, March 01, 2016 1:50 PM
To: Jeff Zhang
Cc: Yogesh Vyas; user@spark.apache.org
Subject: Re: Save DataFrame to Hive Table

Hi

It seems that your code does not specify in which database your table is created.

Try this:

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> // Choose a database
scala> HiveContext.sql("show databases").show

scala> HiveContext.sql("use test")  // I chose test database
scala> HiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value 
STRING)")
scala> HiveContext.sql("desc TableName").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|     key|      int|   null|
|   value|   string|   null|
+--------+---------+-------+

// Create a simple DataFrame

val a = Seq((1, "Mich"), (2, "James"))
val b = a.toDF

// Let me keep it simple: create a temporary table and do a simple
// insert/select. No need to overcomplicate it.

b.registerTempTable("tmp")

// Remember this temporary table is created in the SQLContext, NOT the
// HiveContext, so the HiveContext will NOT see that table:
HiveContext.sql("INSERT INTO TableName SELECT * FROM tmp")
org.apache.spark.sql.AnalysisException: no such table tmp; line 1 pos 36

// This will work

sql("INSERT INTO TableName SELECT * FROM tmp")

sql("select count(1) from TableName").show
+---+
|_c0|
+---+
|  2|
+---+
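
Alternatively (a sketch, assuming the same shell session as above): register the
temp table on the HiveContext itself, and then the HiveContext insert resolves it
directly:

// A temp table registered on the HiveContext IS visible to HiveContext.sql
val c = HiveContext.createDataFrame(Seq((3, "Anna"), (4, "John"))).toDF("key", "value")
c.registerTempTable("tmp2")
HiveContext.sql("INSERT INTO TableName SELECT * FROM tmp2")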

HTH



Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
On 1 March 2016 at 06:33, Jeff Zhang <zjf...@gmail.com> wrote:

The following line does not execute the SQL, so the table is not created. Add
.show() at the end to execute it.

hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)")

On Tue, Mar 1, 2016 at 2:22 PM, Yogesh Vyas <informy...@gmail.com> wrote:
Hi,

I have created a DataFrame in Spark, and now I want to save it directly
into a Hive table. How do I do it?

I have created the Hive table using the following hiveContext:

HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());
hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)");

I am using the following to save it into Hive:
DataFrame.write().mode(SaveMode.Append).insertInto("TableName");

But it gives the error:
Exception in thread "main" java.lang.RuntimeException: Table Not
Found: TableName
        at scala.sys.package$.error(package.scala:27)
        at 
org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:139)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:257)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:266)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:264)
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
        at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:264)
        at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:254)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
        at 
scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:916)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:916)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:914)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:918)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:917)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:921)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:921)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:926)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:924)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:930)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:930)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
        at 
org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:176)
        at 
org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:164)
        at com.honeywell.Track.combine.App.main(App.java:451)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
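
For reference, the SimpleCatalog in the trace suggests the DataFrame was built on
a plain SQLContext, which cannot see Hive tables. A sketch of one way to make the
insert resolve (assumptions: the DataFrame can be rebuilt on the same HiveContext
that owns the table; Scala shown for brevity):

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql("CREATE TABLE IF NOT EXISTS TableName (key INT, value STRING)")
// Build the DataFrame with the SAME context that created the table...
val df = hiveContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("key", "value")
// ...so insertInto can find TableName in the Hive catalog
df.write.mode(SaveMode.Append).insertInto("TableName")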




--
Best Regards

Jeff Zhang
