Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-11 Thread alexandria1101
Thank you!! I can do this using saveAsTable with the schemaRDD, right? 
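For reference, that flow in Spark 1.1 would look roughly like the sketch below, using a HiveContext; the case class, data, and table name are illustrative only and are not taken from this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    case class Record(id: Int, value: String)

    val sc = new SparkContext(new SparkConf().setAppName("saveAsTableSketch"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.createSchemaRDD   // implicit RDD[Product] -> SchemaRDD

    // saveAsTable persists the data as a table in the Hive metastore, so a
    // separate process such as the Thrift server can find it by name.
    sc.parallelize(Seq(Record(1, "a"), Record(2, "b"))).saveAsTable("records")

As far as I know, saveAsTable needs a HiveContext backed by a metastore; a plain SQLContext cannot persist the table.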






Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-11 Thread Denny Lee
It sort of depends on the definition of efficiently. From a workflow perspective I would agree, but from an I/O perspective, wouldn't there be the same multiple passes, since the Hive context still needs to push the data into HDFS? That said, if you're pushing the data into HDFS and then creating Hive tables via LOAD (vs. a reference point a la external tables), I would agree with you.

And thanks for correcting me: registerTempTable is in the SQLContext.
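For the external-table route, a rough sketch of how data already sitting in HDFS could be exposed to Hive without a second copy; the path, table, and column names below are assumptions for illustration:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)   // assumes an existing SparkContext `sc`

    // Only metadata is written: the table points at files already in HDFS,
    // unlike LOAD DATA into a managed table, which moves/copies the files.
    hiveContext.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS events (id INT, value STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION 'hdfs:///user/example/events'
    """)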





Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-11 Thread Du Li

SchemaRDD has a method insertInto(table). When the table is partitioned, it 
would be more sensible and convenient to extend it with a list of partition keys 
and values.
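For the non-partitioned case, a rough sketch of insertInto as it exists in 1.1, assuming the target table is already defined in the metastore (the case class and table name are illustrative); the partition-key variant suggested above does not exist yet:

    import org.apache.spark.sql.hive.HiveContext

    case class Event(id: Int, value: String)

    val hiveContext = new HiveContext(sc)   // assumes an existing SparkContext `sc`
    import hiveContext.createSchemaRDD      // implicit RDD[Product] -> SchemaRDD

    val batch = sc.parallelize(Seq(Event(1, "a"), Event(2, "b")))

    // Appends the rows to an existing Hive table. There is no parameter for a
    // partition spec, which is exactly the extension being suggested here.
    batch.insertInto("events")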





Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-10 Thread alexandria1101
I used the hiveContext to register the tables and the tables are still not
being found by the thrift server.  Do I have to pass the hiveContext to JDBC
somehow?






Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-10 Thread Denny Lee
Actually, when registering the table, it is only available within the sc context 
you are running it in. For Spark 1.1, the method name was changed to 
registerTempTable to better reflect that.

The Thrift server runs as a separate process, meaning that it cannot see any of 
the tables registered within the sc context. You would need to save the sc table 
into Hive, and then the Thrift process would be able to see them.

HTH!
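To make the process boundary concrete, a hedged sketch: registerTempTable keeps the table inside the driver's own context, while saveAsTable writes it into the Hive metastore, which is what the separately running Thrift server reads. The host, port, table, and class names below are assumptions, and the JDBC part presumes the Hive JDBC driver is on the client classpath:

    // Inside the Spark application (assumes an existing SparkContext `sc`):
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import hiveContext.createSchemaRDD

    case class Person(name: String, age: Int)
    val people = sc.parallelize(Seq(Person("alice", 30)))

    people.registerTempTable("people_tmp")   // visible only to this hiveContext
    people.saveAsTable("people")             // persisted in the Hive metastore

    // From a JDBC client, against the Thrift server (default HiveServer2 port):
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = java.sql.DriverManager.getConnection(
      "jdbc:hive2://localhost:10000/default", "", "")
    val rs = conn.createStatement().executeQuery("SELECT * FROM people")
    // Querying people_tmp here would fail with "Table not found", because the
    // temp table never reaches the metastore.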




Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-10 Thread Du Li
Hi Denny,

There is a related question by the way.

I have a program that reads in a stream of RDDs, each of which is to be
loaded into a Hive table as one partition. Currently I do this by first
writing the RDDs to HDFS and then loading them into Hive, which requires
multiple passes of HDFS I/O and serialization/deserialization.

I wonder if it is possible to do this more efficiently with Spark 1.1
streaming + SQL, e.g., by registering the RDDs into a Hive context so that
the data is loaded directly into the Hive table in cache and is meanwhile
visible to JDBC/ODBC clients. In the Spark source code, the method
registerTempTable you mentioned works on SQLContext instead of HiveContext.

Thanks,
Du
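If it helps to see the shape of it: one way this could look in 1.1, assuming the target Hive table already exists and that appending each micro-batch with insertInto is acceptable (whether the data then stays cached and is immediately visible to JDBC/ODBC clients is exactly the open question here); the source, case class, and table name are placeholders:

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.sql.hive.HiveContext

    case class LogLine(ts: Long, msg: String)

    val ssc = new StreamingContext(sc, Seconds(60))   // assumes an existing SparkContext `sc`
    val hiveContext = new HiveContext(sc)
    import hiveContext.createSchemaRDD                // implicit RDD[Product] -> SchemaRDD

    ssc.socketTextStream("localhost", 9999)           // placeholder source
      .map(line => LogLine(System.currentTimeMillis(), line))
      .foreachRDD { rdd =>
        // Convert the micro-batch to a SchemaRDD and append it straight to an
        // existing Hive table, skipping the separate write-to-HDFS-then-LOAD pass.
        rdd.insertInto("logs")
      }

    ssc.start()
    ssc.awaitTermination()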






Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-09 Thread alexandria1101
)
   
org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81)
   
org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:413)
   
org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:468)
   
com.illumina.phoenix.genomedb.jdbc.MutationDAOJdbc.getMutationEntriesBetween(MutationDAOJdbc.java:143)
   
com.illumina.phoenix.etl.ClassificationService.assignMutationClassIndel(ClassificationService.java:342)
   
com.illumina.phoenix.etl.ClassificationService.call(ClassificationService.java:663)
com.illumina.phoenix.etl.Classifier.call(Classifier.java:72)
com.illumina.phoenix.etl.Classifier.call(Classifier.java:19)
   
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:923)
   
org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
   
org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   
org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:236)
   
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
   
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
   
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)









