Upon reviewing your other thread, could you confirm that the Hive metastore you connect to via Hive is a MySQL database? And to confirm: when you run spark-shell and issue a "show tables" statement, do you get the same error?
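For example, from spark-shell you can check which metastore Spark SQL actually resolved. This is a minimal sketch, assuming the HiveContext that spark-shell exposes as sqlContext in a Hive-enabled build:

    // Each SET query echoes back the value Spark SQL is using. A ConnectionURL
    // of the form jdbc:derby:...metastore_db means Spark spun up its own local
    // Derby metastore instead of using the MySQL one behind Hive.
    sqlContext.sql("SET javax.jdo.option.ConnectionURL").collect().foreach(println)
    sqlContext.sql("SET hive.metastore.uris").collect().foreach(println)
    sqlContext.sql("show tables").collect().foreach(println)

If "show tables" comes back empty here while the same statement in the Hive CLI lists dw_bid, the two are almost certainly reading different metastores.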
On Fri, Mar 27, 2015 at 6:08 AM ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

> I tried the following:
>
> 1)
>
> ./bin/spark-submit -v --master yarn-cluster \
>   --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar:*$SPARK_HOME/conf/hive-site.xml* \
>   --jars /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar \
>   --num-executors 1 --driver-memory 4g --driver-java-options "-XX:MaxPermSize=2G" \
>   --executor-memory 2g --executor-cores 1 --queue hdmi-express \
>   --class com.ebay.ep.poc.spark.reporting.SparkApp spark_reporting-1.0-SNAPSHOT.jar \
>   startDate=2015-02-16 endDate=2015-02-16 \
>   input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro \
>   subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2
>
> This throws "dw_bid not found". It looks like Spark SQL is unable to read my
> existing Hive metastore, creates its own instead, and hence complains that
> the table is not found.
>
> 2)
>
> ./bin/spark-submit -v --master yarn-cluster \
>   --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar \
>   --jars /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:*$SPARK_HOME/conf/hive-site.xml* \
>   --num-executors 1 --driver-memory 4g --driver-java-options "-XX:MaxPermSize=2G" \
>   --executor-memory 2g --executor-cores 1 --queue hdmi-express \
>   --class com.ebay.ep.poc.spark.reporting.SparkApp spark_reporting-1.0-SNAPSHOT.jar \
>   startDate=2015-02-16 endDate=2015-02-16 \
>   input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro \
>   subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2
>
> This time I do not get the above error; instead I get a "MySQL driver not
> found" exception. It looks like this fails even before it is able to
> communicate with Hive.
>
> Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke
> the "BONECP" plugin to create a ConnectionPool gave an error : The
> specified datastore driver ("com.mysql.jdbc.Driver") was not found in the
> CLASSPATH. Please check your CLASSPATH specification, and the name of the
> driver.
>
> In both of the above cases, I do have hive-site.xml in Spark's conf folder.
>
> 3)
>
> ./bin/spark-submit -v --master yarn-cluster \
>   --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar \
>   --jars /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar \
>   --num-executors 1 --driver-memory 4g --driver-java-options "-XX:MaxPermSize=2G" \
>   --executor-memory 2g --executor-cores 1 --queue hdmi-express \
>   --class com.ebay.ep.poc.spark.reporting.SparkApp spark_reporting-1.0-SNAPSHOT.jar \
>   startDate=2015-02-16 endDate=2015-02-16 \
>   input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro \
>   subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2
>
> Here I do not specify hive-site.xml in --jars or --driver-class-path. It is
> present in the spark/conf folder as per
> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables .
>
> In this case I get the same error as in #1: "dw_bid table not found".
>
> I want Spark SQL to know that there are tables in Hive and to read that
> data. As per the guide, it looks like Spark SQL has that support.
>
> Please suggest.
>
> Regards,
> Deepak
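One observation on the attempts above: hive-site.xml is a configuration file, not a jar, so listing it on --driver-class-path (attempt 1) or --jars (attempt 2) has no effect. And in yarn-cluster mode the driver runs inside the cluster rather than on the gateway host, so local paths on --driver-class-path may not resolve there. Below is a sketch of an invocation that ships both the config and the JDBC driver to the cluster; using --files for hive-site.xml and moving the MySQL connector into --jars are the assumptions here, while the remaining arguments are unchanged from the commands above:

    ./bin/spark-submit -v --master yarn-cluster \
      --files $SPARK_HOME/conf/hive-site.xml \
      --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar,/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar \
      --num-executors 1 --driver-memory 4g --driver-java-options "-XX:MaxPermSize=2G" \
      --executor-memory 2g --executor-cores 1 --queue hdmi-express \
      --class com.ebay.ep.poc.spark.reporting.SparkApp spark_reporting-1.0-SNAPSHOT.jar \
      startDate=2015-02-16 endDate=2015-02-16 \
      input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro \
      subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2

Files passed with --files are localized into each container's working directory, which is on the classpath, so the driver-side HiveContext can find hive-site.xml; jars passed with --jars are distributed to the cluster as well. If DataNucleus still cannot see the MySQL driver after that, adding the connector jar to spark.driver.extraClassPath is a common follow-up.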
On Thu, Mar 26, 2015 at 9:01 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

>> Stack Trace:
>>
>> 15/03/26 08:25:42 INFO ql.Driver: OK
>> 15/03/26 08:25:42 INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
>> 15/03/26 08:25:42 INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1427383542966 end=1427383542966 duration=0 from=org.apache.hadoop.hive.ql.Driver>
>> 15/03/26 08:25:42 INFO log.PerfLogger: </PERFLOG method=Driver.run start=1427383535203 end=1427383542966 duration=7763 from=org.apache.hadoop.hive.ql.Driver>
>> 15/03/26 08:25:42 INFO metastore.HiveMetaStore: 0: get_tables: db=default pat=.*
>> 15/03/26 08:25:42 INFO HiveMetaStore.audit: ugi=dvasthimal ip=unknown-ip-addr cmd=get_tables: db=default pat=.*
>> 15/03/26 08:25:43 INFO parse.ParseDriver: Parsing command: insert overwrite table sojsuccessevents2_spark select ... (the full query is quoted later in this thread)
>> 15/03/26 08:25:43 INFO parse.ParseDriver: Parse Completed
>> 15/03/26 08:25:43 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=sojsuccessevents2_spark
>> 15/03/26 08:25:43 INFO HiveMetaStore.audit: ugi=dvasthimal ip=unknown-ip-addr cmd=get_table : db=default tbl=sojsuccessevents2_spark
>> 15/03/26 08:25:44 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=dw_bid
>> 15/03/26 08:25:44 INFO HiveMetaStore.audit: ugi=dvasthimal ip=unknown-ip-addr cmd=get_table : db=default tbl=dw_bid
>> 15/03/26 08:25:44 ERROR metadata.Hive: NoSuchObjectException(message:default.dw_bid table not found)
>>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560)
>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
>>         at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:976)
>>         at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:180)
>>         at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:175)
>>         ... (reflection, proxy, and repeated Catalyst tree-transformation frames elided) ...
>>         at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:92)
>>         at com.ebay.ep.poc.spark.reporting.process.service.HadoopSuccessEvents2Service.execute(HadoopSuccessEvents2Service.scala:32)
>>         at com.ebay.ep.poc.spark.reporting.SparkApp$.main(SparkApp.scala:30)
>>         at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)
>>
>> 15/03/26 08:25:44 ERROR yarn.ApplicationMaster: User class threw exception: no such table List(dw_bid); line 1 pos 843
>> org.apache.spark.sql.AnalysisException: no such table List(dw_bid); line 1 pos 843
>>         at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>         at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:178)
>>         ... (the same Catalyst tree-transformation frames elided) ...
>>         at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:92)
>>         at com.ebay.ep.poc.spark.reporting.process.service.HadoopSuccessEvents2Service.execute(HadoopSuccessEvents2Service.scala:32)
>>         at com.ebay.ep.poc.spark.reporting.SparkApp$.main(SparkApp.scala:30)
>>         at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)
>> 15/03/26 08:25:44 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: no such table List(dw_bid); line 1 pos 843)
>> 15/03/26 08:25:44 INFO yarn.ApplicationMaster: Invoking sc stop from shutdown hook
>>
>> On Thu, Mar 26, 2015 at 8:58 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>
>>> Hello Michael,
>>> Thanks for your time.
>>>
>>> 1. "show tables" from the Spark program returns nothing.
>>> 2. Which entries are you talking about? (I am actually new to Hive as well.)
>>>
>>> On Thu, Mar 26, 2015 at 8:35 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>
>>>> What does "show tables" return? You can also run "SET <optionName>" to
>>>> make sure that entries from your hive-site.xml are being read correctly.
>>>>
>>>> On Thu, Mar 26, 2015 at 4:02 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>
>>>>> I have a table dw_bid that was created in Hive and has nothing to do
>>>>> with Spark. I have data in Avro that I want to join with the dw_bid
>>>>> table, and this join needs to be done using Spark SQL. However, for some
>>>>> reason Spark says the dw_bid table does not exist. How do I tell Spark
>>>>> that dw_bid is a table created in Hive, and have it read that table?
>>>>>
>>>>> Query that is run from Spark SQL
>>>>> ==============================
>>>>> insert overwrite table sojsuccessevents2_spark select
>>>>> guid,sessionKey,sessionStartDate,sojDataDate,seqNum,eventTimestamp,siteId,successEventType,sourceType,itemId,
>>>>> shopCartId,b.transaction_Id as transactionId,offerId,b.bdr_id as
>>>>> userId,priorPage1SeqNum,priorPage1PageId,exclWMSearchAttemptSeqNum,exclPriorSearchPageId,
>>>>> exclPriorSearchSeqNum,exclPriorSearchCategory,exclPriorSearchL1,exclPriorSearchL2,currentImpressionId,sourceImpressionId,exclPriorSearchSqr,exclPriorSearchSort,
>>>>> isDuplicate,b.bid_date as transactionDate,auctionTypeCode,isBin,leafCategoryId,itemSiteId,
>>>>> b.qty_bid as bidQuantity, b.bid_amt_unit_lstg_curncy * b.bid_exchng_rate as bidAmtUsd,
>>>>> offerQuantity,offerAmountUsd,offerCreateDate,buyerSegment,buyerCountryId,sellerId,sellerCountryId,
>>>>> sellerStdLevel,cssSellerLevel,a.experimentChannel
>>>>> from sojsuccessevents1 a join dw_bid b
>>>>> on a.itemId = b.item_id and a.transactionId = b.transaction_id
>>>>> where b.auct_end_dt >= '2015-02-16' AND b.bid_dt >= '2015-02-16'
>>>>> AND b.bid_type_code IN (1,9) AND b.bdr_id > 0 AND (b.bid_flags & 32) = 0
>>>>> and lower(a.successEventType) IN ('bid','bin')
>>>>>
>>>>> If I create sojsuccessevents2_spark from the Hive command line and run
>>>>> the above command from the Spark SQL program, I get the error
>>>>> "sojsuccessevents2_spark table not found".
>>>>>
>>>>> Hence I dropped the table from Hive and ran "create table
>>>>> sojsuccessevents2_spark" from Spark SQL before running the above
>>>>> command, and it works until it hits the next roadblock: "dw_bid table
>>>>> not found".
>>>>>
>>>>> This makes me believe that Spark for some reason is not able to
>>>>> read/understand the tables created outside Spark. I did copy
>>>>> /apache/hive/conf/hive-site.xml into the Spark conf directory.
>>>>>
>>>>> Please suggest.
>>>>>
>>>>> Logs
>>>>> ———
>>>>> 15/03/26 03:50:40 INFO HiveMetaStore.audit: ugi=dvasthimal ip=unknown-ip-addr cmd=get_table : db=default tbl=dw_bid
>>>>> 15/03/26 03:50:40 ERROR metadata.Hive: NoSuchObjectException(message:default.dw_bid table not found)
>>>>>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560)
>>>>>
>>>>> 15/03/26 03:50:40 ERROR yarn.ApplicationMaster: User class threw exception: no such table List(dw_bid); line 1 pos 843
>>>>> org.apache.spark.sql.AnalysisException: no such table List(dw_bid); line 1 pos 843
>>>>>         at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>>>>         at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:178)
>>>>>         at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:187)
>>>>>
>>>>> Regards,
>>>>> Deepak
>>>>>
>>>>> On Thu, Mar 26, 2015 at 4:27 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>>
>>>>>> I have this query:
>>>>>>
>>>>>> insert overwrite table sojsuccessevents2_spark select ...
>>>>>> from sojsuccessevents1 a *join dw_bid b* on a.itemId = b.item_id
>>>>>> and a.transactionId = b.transaction_id where ...
>>>>>> (the same query, quoted in full above)
>>>>>>
>>>>>> If I create sojsuccessevents2_spark from the Hive command line and run
>>>>>> the above command from the Spark SQL program, I get the error
>>>>>> "sojsuccessevents2_spark table not found".
>>>>>>
>>>>>> Hence I dropped the table from Hive and ran "create table
>>>>>> sojsuccessevents2_spark" from Spark SQL before running the above
>>>>>> command, and it works until it hits the next roadblock: *"dw_bid table
>>>>>> not found"*.
>>>>>>
>>>>>> This makes me believe that Spark for some reason is not able to
>>>>>> read/understand the tables created outside Spark. I did copy
>>>>>> /apache/hive/conf/hive-site.xml into the Spark conf directory.
>>>>>>
>>>>>> Please suggest.
>>>>>>
>>>>>> Regards,
>>>>>> Deepak
>>>>>>
>>>>>> On Thu, Mar 26, 2015 at 1:26 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>>>
>>>>>>> I have a Hive table named dw_bid; when I run hive from the command
>>>>>>> prompt and run "describe dw_bid", it works.
>>>>>>>
>>>>>>> I want to join an Avro file (table) in HDFS with this Hive dw_bid
>>>>>>> table, and I refer to it as dw_bid from the Spark SQL program; however,
>>>>>>> I see:
>>>>>>>
>>>>>>> 15/03/26 00:31:01 INFO HiveMetaStore.audit: ugi=dvasthimal ip=unknown-ip-addr cmd=get_table : db=default tbl=dw_bid
>>>>>>> 15/03/26 00:31:01 ERROR metadata.Hive: NoSuchObjectException(message:default.dw_bid table not found)
>>>>>>>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560)
>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>
>>>>>>> Code:
>>>>>>>
>>>>>>> val successDetail_S1 = sqlContext.avroFile(input)
>>>>>>> successDetail_S1.registerTempTable("sojsuccessevents1")
>>>>>>> val countS1 = sqlContext.sql("select guid,sessionKey,sessionStartDate,sojDataDate,seqNum,eventTimestamp,siteId,successEventType,sourceType,itemId," +
>>>>>>>   " shopCartId,b.transaction_Id as transactionId,offerId,b.bdr_id as userId,priorPage1SeqNum,priorPage1PageId,exclWMSearchAttemptSeqNum,exclPriorSearchPageId," +
>>>>>>>   " exclPriorSearchSeqNum,exclPriorSearchCategory,exclPriorSearchL1,exclPriorSearchL2,currentImpressionId,sourceImpressionId,exclPriorSearchSqr,exclPriorSearchSort," +
>>>>>>>   " isDuplicate,b.bid_date as transactionDate,auctionTypeCode,isBin,leafCategoryId,itemSiteId,b.qty_bid as bidQuantity," +
>>>>>>>   " b.bid_amt_unit_lstg_curncy * b.bid_exchng_rate as bidAmtUsd,offerQuantity,offerAmountUsd,offerCreateDate,buyerSegment,buyerCountryId,sellerId,sellerCountryId," +
>>>>>>>   " sellerStdLevel,cssSellerLevel,a.experimentChannel" +
>>>>>>>   " from sojsuccessevents1 a join dw_bid b " +
>>>>>>>   " on a.itemId = b.item_id and a.transactionId = b.transaction_id " +
>>>>>>>   " where b.bid_type_code IN (1,9) AND b.bdr_id > 0 AND (b.bid_flags & 32) = 0 and lower(a.successEventType) IN ('bid','bin')")
>>>>>>> println("countS1.first:" + countS1.first)
>>>>>>>
>>>>>>> Any suggestions on how to refer to a Hive table from Spark SQL?
>>>>>>>
>>>>>>> --
>>>>>>> Deepak
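For anyone hitting the same wall: the pieces above fit together roughly as follows. This is a minimal sketch, not the poster's actual application, assuming Spark 1.3 with Hive support, spark-avro 1.0.0 as used in the thread, a hive-site.xml visible to the driver, and a hypothetical app name:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import com.databricks.spark.avro._  // spark-avro 1.0.0, provides sqlContext.avroFile

    object HiveJoinSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HiveJoinSketch"))

        // HiveContext (not plain SQLContext) reads hive-site.xml from the
        // classpath and resolves tables through the existing metastore. With no
        // hive-site.xml visible, it silently creates a fresh local Derby
        // metastore, which is one way to end up with "dw_bid table not found".
        val sqlContext = new HiveContext(sc)

        // The Avro data lives outside Hive; register it as a temp table so it
        // can be joined against Hive-managed tables in the same SQL statement.
        val successDetail = sqlContext.avroFile(
          "/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro")
        successDetail.registerTempTable("sojsuccessevents1")

        // If the metastore wiring is right, dw_bid shows up here.
        sqlContext.sql("show tables").collect().foreach(println)

        // Hive table and temp table joined in one query.
        val joined = sqlContext.sql(
          "select a.itemId, b.transaction_id from sojsuccessevents1 a " +
          "join dw_bid b on a.itemId = b.item_id and a.transactionId = b.transaction_id")
        println("first row: " + joined.first)
      }
    }

The key design point is that a single HiveContext serves both worlds: temp tables registered from DataFrames and permanent tables in the shared metastore resolve through the same catalog, so no export or copy step is needed before the join.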