spark-sql not coming up with Hive 0.10.0/CDH 4.6

2014-10-15 Anurag Tangri
at org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:103)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:98)
at org.apache.hadoop.hive.cli.CliDriver.processLine(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.lang.reflect.Method.invoke(
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
14/10/15 17:45:25 ERROR RetryingRawStore: JDO datastore error. Retrying
metastore command after 1000 ms (attempt 1 of 1)
14/10/15 17:45:26 WARN Query: Query for candidates of
org.apache.hadoop.hive.metastore.model.MVersionTable and subclasses
resulted in no possible candidates
Required table missing : `VERSION` in Catalog  Schema . DataNucleus
requires this table to perform its persistence operations. Either your
MetaData is incorrect, or you need to enable datanucleus.autoCreateTables Required
table missing : `VERSION` in

Can somebody tell me what I am missing?

The same thing works via the Hive shell.

Anurag Tangri

Re: spark-sql not coming up with Hive 0.10.0/CDH 4.6

2014-10-15 Anurag Tangri
I see that the Hive 0.10.0 metastore SQL schema does not have a VERSION table,
but Spark is looking for it.
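One way to confirm this diagnosis is to check whether the metastore database actually contains a `VERSION` table before starting spark-sql. The sketch below is a minimal illustration under stated assumptions: it uses Python's built-in `sqlite3` purely as a stand-in for the real MySQL metastore (on MySQL you would query `information_schema.tables` instead of `sqlite_master`), and the `VERSION` columns (`VER_ID`, `SCHEMA_VERSION`, `VERSION_COMMENT`) are assumed to match the Hive 0.12 schema. The helper name `metastore_schema_version` is hypothetical.

```python
import sqlite3
from typing import Optional


def metastore_schema_version(conn: sqlite3.Connection) -> Optional[str]:
    """Return SCHEMA_VERSION from the VERSION table, or None if the table is missing.

    Assumption: sqlite3 stands in for the MySQL metastore; table and column
    names are assumed to follow the Hive 0.12 metastore schema.
    """
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'VERSION'"
    ).fetchone()
    if row is None:
        # Hive 0.10-style schema: no VERSION table, which is what DataNucleus reports.
        return None
    version = conn.execute("SELECT SCHEMA_VERSION FROM VERSION").fetchone()
    return version[0] if version else None


# Simulated Hive 0.10-style metastore: a couple of real metastore tables, no VERSION.
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT)")
old.execute("CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT)")
print(metastore_schema_version(old))  # None

# Simulated metastore after a Hive 0.12 schema script has added VERSION.
new = sqlite3.connect(":memory:")
new.execute(
    "CREATE TABLE VERSION (VER_ID INTEGER PRIMARY KEY, SCHEMA_VERSION TEXT, VERSION_COMMENT TEXT)"
)
new.execute("INSERT INTO VERSION VALUES (1, '0.12.0', 'simulated row')")
print(metastore_schema_version(new))  # 0.12.0
```

Returning `None` rather than raising keeps the "table missing" case, which is exactly the failure in the log above, distinct from a real database error.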

Has anyone else faced this issue, or does anyone have ideas on how to fix it?

Anurag Tangri

On Wed, Oct 15, 2014 at 10:51 AM, Anurag Tangri wrote:

 I compiled Spark 1.1.0 against CDH 4.6, but when I try to bring up the
 spark-sql CLI, it gives an error:


 [atangri@pit-uat-hdputil1 bin]$ ./spark-sql
 Spark assembly has been built with Hive, including Datanucleus jars on
 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
 MaxPermSize=128m; support was removed in 8.0
 log4j:WARN No appenders could be found for logger
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See for
 more info.
 Unable to initialize logging using, not found on
 Using Spark's default log4j profile:
 14/10/15 17:45:17 INFO SecurityManager: Changing view acls to: atangri,
 14/10/15 17:45:17 INFO SecurityManager: Changing modify acls to: atangri,
 14/10/15 17:45:17 INFO SecurityManager: SecurityManager: authentication
 disabled; ui acls disabled; users with view permissions: Set(atangri, );
 users with modify permissions: Set(atangri, )
 14/10/15 17:45:17 INFO Slf4jLogger: Slf4jLogger started
 14/10/15 17:45:17 INFO Remoting: Starting remoting
 14/10/15 17:45:17 INFO Remoting: Remoting started; listening on addresses
 14/10/15 17:45:17 INFO Remoting: Remoting now listens on addresses:
 14/10/15 17:45:17 INFO Utils: Successfully started service 'sparkDriver'
 on port 54506.
 14/10/15 17:45:17 INFO SparkEnv: Registering MapOutputTracker
 14/10/15 17:45:17 INFO SparkEnv: Registering BlockManagerMaster
 14/10/15 17:45:17 INFO DiskBlockManager: Created local directory at
 14/10/15 17:45:17 INFO Utils: Successfully started service 'Connection
 manager for block manager' on port 58400.
 14/10/15 17:45:17 INFO ConnectionManager: Bound socket to port 58400 with
 id = ConnectionManagerId(pit-uat-hdputil1.snc1,58400)
 14/10/15 17:45:17 INFO MemoryStore: MemoryStore started with capacity
 265.1 MB
 14/10/15 17:45:17 INFO BlockManagerMaster: Trying to register BlockManager
 14/10/15 17:45:17 INFO BlockManagerMasterActor: Registering block manager
 pit-uat-hdputil1.snc1:58400 with 265.1 MB RAM
 14/10/15 17:45:17 INFO BlockManagerMaster: Registered BlockManager
 14/10/15 17:45:17 INFO HttpFileServer: HTTP File server directory is
 14/10/15 17:45:17 INFO HttpServer: Starting HTTP Server
 14/10/15 17:45:17 INFO Utils: Successfully started service 'HTTP file
 server' on port 33666.
 14/10/15 17:45:18 INFO Utils: Successfully started service 'SparkUI' on
 port 4040.
 14/10/15 17:45:18 INFO SparkUI: Started SparkUI at
 14/10/15 17:45:18 INFO AkkaUtils: Connecting to HeartbeatReceiver:
 spark-sql> show tables;
 14/10/15 17:45:22 INFO ParseDriver: Parsing command: show tables
 14/10/15 17:45:22 INFO ParseDriver: Parse Completed
 14/10/15 17:45:23 INFO Driver: PERFLOG
 14/10/15 17:45:23 INFO Driver: PERFLOG method=TimeToSubmit
 14/10/15 17:45:23 INFO Driver: PERFLOG method=compile
 14/10/15 17:45:23 INFO Driver: PERFLOG method=parse
 14/10/15 17:45:23 INFO ParseDriver: Parsing command: show tables
 14/10/15 17:45:23 INFO ParseDriver: Parse Completed
 14/10/15 17:45:23 INFO Driver: /PERFLOG method=parse start=1413395123538
 end=1413395123539 duration=1
 14/10/15 17:45:23 INFO Driver: PERFLOG method=semanticAnalyze
 14/10/15 17:45:23 INFO Driver: Semantic Analysis Completed
 14/10/15 17:45:23 INFO Driver: /PERFLOG method=semanticAnalyze
 start=1413395123539 end=1413395123641 duration=102
 14/10/15 17:45:23 INFO ListSinkOperator: Initializing Self 0 OP
 14/10/15 17:45:23 INFO ListSinkOperator: Operator 0 OP initialized
 14/10/15 17:45:23 INFO ListSinkOperator: Initialization Done 0 OP
 14/10/15 17:45:23 INFO Driver: Returning Hive schema:
 Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from
 deserializer)], properties:null)
 14/10/15 17:45:23 INFO Driver: /PERFLOG method=compile
 start=1413395123517 end=1413395123696 duration=179
 14/10/15 17:45:23 INFO Driver: PERFLOG method=Driver.execute
 14/10/15 17:45:23 INFO Driver: Starting command: show tables
 14/10/15 17:45:23 INFO Driver: /PERFLOG method=TimeToSubmit
 start=1413395123517 end=1413395123698 duration=181
 14/10/15 17:45:23 INFO Driver: PERFLOG method=runTasks
 14/10/15 17:45:23 INFO Driver: PERFLOG method=task.DDL.Stage-0
 14/10/15 17:45:23 INFO HiveMetaStore: 0: Opening raw store with

Re: spark-sql not coming up with Hive 0.10.0/CDH 4.6

2014-10-15 Anurag Tangri
Hi Marcelo,
Exactly. I found it a few minutes ago.

I ran the Hive 0.12 MySQL schema SQL script on my Hive 0.10 metastore, which
created the missing tables, and it seems to be working now.

I am not sure whether everything else in CDH 4.6/Hive 0.10 will still work,
though.

It looks like we cannot use Spark SQL cleanly with CDH 4 unless we
upgrade to CDH 5.

Thanks for your response!

Anurag Tangri

On Wed, Oct 15, 2014 at 12:02 PM, Marcelo Vanzin

 Hi Anurag,

 Spark SQL (from the Spark standard distribution / sources) currently
 requires Hive 0.12; as you mention, CDH 4 has Hive 0.10, so that is not
 going to work.

 CDH 5.2 ships with Spark 1.1.0 and is modified so that Spark SQL can
 talk to the Hive 0.13.1 that is also bundled with CDH, so if that's an
 option for you, you could try it out.
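The compatibility constraint described here boils down to a version match: stock Spark 1.1.0's Spark SQL expects a Hive 0.12 metastore, so a CDH 4 metastore at 0.10 does not match, and Hive 0.13.1 only works through CDH 5.2's modified build. A minimal sketch of that check, assuming that compatibility means the major.minor Hive versions agree; the function names are hypothetical and nothing here is part of Spark's API.

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '0.13.1' into (0, 13, 1)."""
    return tuple(int(part) for part in v.split("."))


def hive_compatible(metastore: str, required: str = "0.12") -> bool:
    # Assumption: only major.minor must match (any 0.12.x works with a 0.12 client).
    return parse_version(metastore)[:2] == parse_version(required)[:2]


for v in ["0.10.0", "0.12.0", "0.13.1"]:
    print(v, hive_compatible(v))
# 0.10.0 False
# 0.12.0 True
# 0.13.1 False
```

Comparing integer tuples rather than raw strings avoids lexical surprises such as "0.9" sorting after "0.12".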

Hive 11 / CDH 4.6 / Spark 0.9.1 dilemma

2014-08-06 Anurag Tangri
I posted this to the cdh-user mailing list yesterday, but I think this list
should have been the right audience for it:


Hi All,
I'm not sure whether anyone else has faced this same issue.

We installed CDH 4.6, which uses Hive 0.10.

We also have Spark 0.9.1, which comes with Hive 0.11.

Now our Hive jobs that work on CDH fail in Shark.

Is anyone else facing the same issue, and are there any workarounds?

Can we recompile Shark 0.9.1 against Hive 0.10, or compile Hive 0.11 on CDH 4.6?

Anurag Tangri

Re: 1.0.0 Release Date?

2014-05-13 Anurag Tangri
Hi All,
We are also waiting for this. Does anyone know the tentative date for this
release?

We are on Spark 0.8.0 right now. Should we wait for Spark 1.0 or upgrade
to Spark 0.9.1?

Anurag Tangri

On Tue, May 13, 2014 at 9:40 AM, bhusted wrote:

 Can anyone comment on the anticipated date or worst-case timeframe for when
 Spark 1.0.0 will be released?
