spark-sql not coming up with Hive 0.10.0/CDH 4.6
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:272)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
    at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:38)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:103)
    at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:98)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:58)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
14/10/15 17:45:25 ERROR RetryingRawStore: JDO datastore error.
Retrying metastore command after 1000 ms (attempt 1 of 1)
14/10/15 17:45:26 WARN Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MVersionTable and subclasses resulted in no possible candidates
Required table missing : `VERSION` in Catalog Schema . DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable datanucleus.autoCreateTables
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : `VERSION` in Catalog Schema .

Can somebody tell me what I am missing? The same statements work via the hive shell.

Thanks,
Anurag Tangri
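The DataNucleus message itself names one possible workaround: letting DataNucleus create the missing metastore tables automatically. A minimal hive-site.xml fragment for that, sketched only as an illustration of the property the error mentions (auto-creating schema objects is often discouraged on a shared production metastore):

```xml
<!-- Sketch only: lets DataNucleus create missing tables such as VERSION,
     as suggested by the error above. Use with care on a shared metastore. -->
<property>
  <name>datanucleus.autoCreateTables</name>
  <value>true</value>
</property>
```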
Re: spark-sql not coming up with Hive 0.10.0/CDH 4.6
I see the Hive 0.10.0 metastore SQL does not have a VERSION table, but Spark is looking for one.

Has anyone else faced this issue, or does anyone have ideas on how to fix it?

Thanks,
Anurag Tangri

On Wed, Oct 15, 2014 at 10:51 AM, Anurag Tangri atan...@groupon.com wrote:

Hi,

I compiled Spark 1.1.0 with CDH 4.6, but when I try to bring the spark-sql CLI up, it gives an error:

==
[atangri@pit-uat-hdputil1 bin]$ ./spark-sql
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Unable to initialize logging using hive-log4j.properties, not found on CLASSPATH!
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/10/15 17:45:17 INFO SecurityManager: Changing view acls to: atangri,
14/10/15 17:45:17 INFO SecurityManager: Changing modify acls to: atangri,
14/10/15 17:45:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(atangri, ); users with modify permissions: Set(atangri, )
14/10/15 17:45:17 INFO Slf4jLogger: Slf4jLogger started
14/10/15 17:45:17 INFO Remoting: Starting remoting
14/10/15 17:45:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@pit-uat-hdputil1.snc1:54506]
14/10/15 17:45:17 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@pit-uat-hdputil1.snc1:54506]
14/10/15 17:45:17 INFO Utils: Successfully started service 'sparkDriver' on port 54506.
14/10/15 17:45:17 INFO SparkEnv: Registering MapOutputTracker
14/10/15 17:45:17 INFO SparkEnv: Registering BlockManagerMaster
14/10/15 17:45:17 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141015174517-bdfa
14/10/15 17:45:17 INFO Utils: Successfully started service 'Connection manager for block manager' on port 58400.
14/10/15 17:45:17 INFO ConnectionManager: Bound socket to port 58400 with id = ConnectionManagerId(pit-uat-hdputil1.snc1,58400)
14/10/15 17:45:17 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
14/10/15 17:45:17 INFO BlockManagerMaster: Trying to register BlockManager
14/10/15 17:45:17 INFO BlockManagerMasterActor: Registering block manager pit-uat-hdputil1.snc1:58400 with 265.1 MB RAM
14/10/15 17:45:17 INFO BlockManagerMaster: Registered BlockManager
14/10/15 17:45:17 INFO HttpFileServer: HTTP File server directory is /tmp/spark-c7f28004-6189-424f-a214-379d5dcc72b7
14/10/15 17:45:17 INFO HttpServer: Starting HTTP Server
14/10/15 17:45:17 INFO Utils: Successfully started service 'HTTP file server' on port 33666.
14/10/15 17:45:18 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/10/15 17:45:18 INFO SparkUI: Started SparkUI at http://pit-uat-hdputil1.snc1:4040
14/10/15 17:45:18 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@pit-uat-hdputil1.snc1:54506/user/HeartbeatReceiver

spark-sql> show tables;
14/10/15 17:45:22 INFO ParseDriver: Parsing command: show tables
14/10/15 17:45:22 INFO ParseDriver: Parse Completed
14/10/15 17:45:23 INFO Driver: <PERFLOG method=Driver.run>
14/10/15 17:45:23 INFO Driver: <PERFLOG method=TimeToSubmit>
14/10/15 17:45:23 INFO Driver: <PERFLOG method=compile>
14/10/15 17:45:23 INFO Driver: <PERFLOG method=parse>
14/10/15 17:45:23 INFO ParseDriver: Parsing command: show tables
14/10/15 17:45:23 INFO ParseDriver: Parse Completed
14/10/15 17:45:23 INFO Driver: </PERFLOG method=parse start=1413395123538 end=1413395123539 duration=1>
14/10/15 17:45:23 INFO Driver: <PERFLOG method=semanticAnalyze>
14/10/15 17:45:23 INFO Driver: Semantic Analysis Completed
14/10/15 17:45:23 INFO Driver: </PERFLOG method=semanticAnalyze start=1413395123539 end=1413395123641 duration=102>
14/10/15 17:45:23 INFO ListSinkOperator: Initializing Self 0 OP
14/10/15 17:45:23 INFO ListSinkOperator: Operator 0 OP initialized
14/10/15 17:45:23 INFO ListSinkOperator: Initialization Done 0 OP
14/10/15 17:45:23 INFO Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
14/10/15 17:45:23 INFO Driver: </PERFLOG method=compile start=1413395123517 end=1413395123696 duration=179>
14/10/15 17:45:23 INFO Driver: <PERFLOG method=Driver.execute>
14/10/15 17:45:23 INFO Driver: Starting command: show tables
14/10/15 17:45:23 INFO Driver: </PERFLOG method=TimeToSubmit start=1413395123517 end=1413395123698 duration=181>
14/10/15 17:45:23 INFO Driver: <PERFLOG method=runTasks>
14/10/15 17:45:23 INFO Driver: <PERFLOG method=task.DDL.Stage-0>
14/10/15 17:45:23 INFO HiveMetaStore: 0: Opening raw store with implemenation
Re: spark-sql not coming up with Hive 0.10.0/CDH 4.6
Hi Marcelo,

Exactly. I found it a few minutes ago: I ran the MySQL Hive 0.12 metastore SQL against my Hive 0.10 metastore, which created the missing tables, and it seems to be working now. I am not sure whether everything else in CDH 4.6/Hive 0.10 still works against the modified schema, though.

It looks like we cannot use Spark SQL in a clean way with CDH4 unless we upgrade to CDH5.

Thanks for your response!

Thanks,
Anurag Tangri

On Wed, Oct 15, 2014 at 12:02 PM, Marcelo Vanzin van...@cloudera.com wrote:

Hi Anurag,

Spark SQL (from the Spark standard distribution / sources) currently requires Hive 0.12; as you mention, CDH4 has Hive 0.10, so that's not going to work.

CDH 5.2 ships with Spark 1.1.0 and is modified so that Spark SQL can talk to the Hive 0.13.1 that is also bundled with CDH, so if that's an option for you, you could try it out.

On Wed, Oct 15, 2014 at 11:23 AM, Anurag Tangri atan...@groupon.com wrote:

I see Hive 0.10.0 metastore sql does not have a VERSION table but spark is looking for it. [...]
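The workaround described above amounts to adding the VERSION table (and other missing tables) that the Hive 0.12 schema expects. The check-then-create step can be sketched in a standalone way; this uses SQLite purely as a stand-in for the real MySQL metastore, and the column layout follows the Hive 0.12 VERSION table (VER_ID, SCHEMA_VERSION, VERSION_COMMENT), with illustrative types:

```python
import sqlite3

# Stand-in for the metastore database (the real one would be MySQL, not SQLite).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

def has_version_table(cur):
    """Check whether the metastore already has the VERSION table
    that Hive 0.12+ (and hence Spark SQL 1.1) expects."""
    cur.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='VERSION'"
    )
    return cur.fetchone() is not None

if not has_version_table(cur):
    # Column names follow the Hive 0.12 metastore schema for VERSION.
    cur.execute(
        """CREATE TABLE VERSION (
               VER_ID          INTEGER PRIMARY KEY,
               SCHEMA_VERSION  VARCHAR(127) NOT NULL,
               VERSION_COMMENT VARCHAR(255)
           )"""
    )
    cur.execute(
        "INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) "
        "VALUES (1, '0.12.0', 'Created by hand after upgrading the metastore')"
    )
    conn.commit()

print(has_version_table(cur))  # prints: True
```

In practice the cleaner route is to apply the upgrade SQL that ships with the Hive distribution rather than creating tables one by one, since VERSION is not the only table that changed between 0.10 and 0.12.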
Hive 11 / CDH 4.6 / Spark 0.9.1 dilemma
I posted this on the cdh-user mailing list yesterday, but I think this is the right audience for it:

=

Hi All,

Not sure if anyone else has faced this same issue. We installed CDH 4.6, which uses Hive 0.10, and we have Spark 0.9.1, which comes with Hive 0.11. Now our Hive jobs that work on CDH fail in Shark.

Is anyone else facing the same issues, and are there any work-arounds? Can we re-compile Shark 0.9.1 with Hive 0.10, or compile Hive 0.11 on CDH 4.6?

Thanks,
Anurag Tangri
Re: 1.0.0 Release Date?
Hi All,

We are also waiting for this. Does anyone know a tentative date for this release?

We are on Spark 0.8.0 right now. Should we wait for Spark 1.0 or upgrade to Spark 0.9.1?

Thanks,
Anurag Tangri

On Tue, May 13, 2014 at 9:40 AM, bhusted brian.hus...@gmail.com wrote:

Can anyone comment on the anticipated date, or worst-case timeframe, for when Spark 1.0.0 will be released?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-0-0-Release-Date-tp5664.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.