Re: YARN deployment of Spark and Thrift JDBC server
On 10/16/14 12:44 PM, neeraj wrote:
> I would like to reiterate that I don't have Hive installed on the Hadoop cluster. I have some queries on the following comment from Cheng Lian-2:
>
> "The Thrift server is used to interact with existing Hive data, and thus needs the Hive Metastore to access the Hive catalog. In your case, you need to build Spark with sbt/sbt -Phive,hadoop-2.4 clean package. But since you've already started the Thrift server successfully, this step should already have been done properly."
>
> 1. Even though I don't have Hive installed, how can I connect my application (Microsoft Excel etc.) to Spark SQL? Must I have Hive installed?

Are you trying to use Excel as a data source for Spark SQL, or Spark SQL as a data source for Excel? You can use Spark SQL in your own Spark applications without involving Hive, but the Thrift server is designed to interact with existing Hive data. It is essentially a HiveServer2 port for Spark SQL.

> 2. Where can I download the Spark SQL JDBC/ODBC drivers? I could not find them on the Databricks site.
>
> 3. Could somebody point me to the steps to connect Excel to Spark SQL and fetch some data? Is this possible at all?

This article by Denny Lee may be helpful, although it covers Tableau rather than Excel: https://www.concur.com/blog/en-us/connect-tableau-to-sparksql

> 4. Which applications can be used to connect to Spark SQL?

In theory, any application that supports ODBC/JDBC can connect to Spark SQL.

> Regards,
> Neeraj

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/YARN-deployment-of-Spark-and-Thrift-JDBC-server-tp16374p16537.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
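For reference, a minimal sketch of the build step Cheng mentions, run from the root of the Spark 1.1.0 source tree (internet access is needed so sbt can fetch dependencies):

```shell
# Build Spark 1.1.0 with Hive/Thrift-server support against Hadoop 2.4.
# -Phive pulls in the Hive and Thrift server modules; hadoop-2.4 selects
# the matching Hadoop profile. Run from the Spark source root.
sbt/sbt -Phive,hadoop-2.4 clean package
```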
Re: YARN deployment of Spark and Thrift JDBC server
1. I'm trying to use Spark SQL as a data source. Is it possible?

2. Please share the link to the ODBC/JDBC drivers at Databricks; I'm not able to find them.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/YARN-deployment-of-Spark-and-Thrift-JDBC-server-tp16374p16571.html
Re: YARN deployment of Spark and Thrift JDBC server
On 10/16/14 10:48 PM, neeraj wrote:
> 1. I'm trying to use Spark SQL as a data source. Is it possible?

Unfortunately, Spark SQL's ODBC/JDBC support is based on the Thrift server, so at minimum you need HDFS and a working Hive Metastore instance (used to persist the catalog) to make things work.

> 2. Please share the link to the ODBC/JDBC drivers at Databricks; I'm not able to find them.

Sorry, I forgot to mention that Denny's article includes the ODBC driver link: http://www.datastax.com/download#dl-datastax-drivers

For JDBC access, you can just use the Hive 0.12.0 JDBC driver; the Thrift server is compatible with it.

P.S. The ODBC driver is not from Databricks; it is provided by third-party companies such as DataStax and Simba.
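As a sketch of the JDBC route Cheng describes, the Hive 0.12.0 beeline client can point at the Thrift server directly. The hostname, port, and username below are assumptions (10000 is the Thrift server's default listening port):

```shell
# Connect the Hive 0.12.0 beeline client to a running Spark SQL Thrift server.
# localhost:10000 and the username are placeholders for your own deployment.
beeline -u jdbc:hive2://localhost:10000 -n myuser

# Once connected, HiveQL statements can be issued at the prompt, e.g.:
#   SHOW TABLES;
#   SELECT COUNT(*) FROM some_table;
```

Any BI tool that can load the same Hive JDBC driver jar can use the same `jdbc:hive2://host:port` URL.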
Re: YARN deployment of Spark and Thrift JDBC server
I would like to reiterate that I don't have Hive installed on the Hadoop cluster. I have some queries on the following comment from Cheng Lian-2:

"The Thrift server is used to interact with existing Hive data, and thus needs the Hive Metastore to access the Hive catalog. In your case, you need to build Spark with sbt/sbt -Phive,hadoop-2.4 clean package. But since you've already started the Thrift server successfully, this step should already have been done properly."

1. Even though I don't have Hive installed, how can I connect my application (Microsoft Excel etc.) to Spark SQL? Must I have Hive installed?

2. Where can I download the Spark SQL JDBC/ODBC drivers? I could not find them on the Databricks site.

3. Could somebody point me to the steps to connect Excel to Spark SQL and fetch some data? Is this possible at all?

4. Which applications can be used to connect to Spark SQL?

Regards,
Neeraj
Re: YARN deployment of Spark and Thrift JDBC server
On 10/14/14 7:31 PM, Neeraj Garg02 wrote:
> Hi All,
>
> I've downloaded and installed Apache Spark 1.1.0 pre-built for Hadoop 2.4. Now, I want to test two features of Spark:
>
> 1. YARN deployment: As per my understanding, I need to modify the "spark-defaults.conf" file with the settings mentioned at http://spark.apache.org/docs/1.1.0/running-on-yarn.html#configuration, for example settings like spark.yarn.applicationMaster.waitTries. Once the configuration is done, the following command can be used to launch a Spark application in yarn-cluster mode:
>
>     ./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]
>
> Is this understanding correct? Otherwise, please suggest the steps to deploy Spark on YARN.

Yes.

> 2. Testing the Thrift JDBC server connection: I have a Hadoop 2.4 cluster set up, and Apache Spark is running on this cluster. In order to test the JDBC Thrift server, I've successfully followed the steps mentioned in the "Other SQL Interfaces" section of the Spark SQL programming guide, i.e. I can see the beeline prompt and it's connected to the Thrift server using the given command. Please help me answer the following queries:
>
> a. Which kind of queries can I execute from this beeline prompt? Would these be Spark SQL queries or Hive queries?

You can only use HiveQL under beeline.

> b. "Configuration of Hive is done by placing your hive-site.xml file in conf/." Right now, I don't have Hive installed as part of the Hadoop 2.4 cluster. Do I need to install Hive to test the Thrift JDBC server or to execute Spark SQL queries from the beeline prompt?
>
> i. In case Hive installation is a prerequisite, is there a need to re-build the Spark package? What are the steps for this? Is internet access required for the re-build?

The Thrift server is used to interact with existing Hive data, and thus needs the Hive Metastore to access the Hive catalog.
In your case, you need to build Spark with sbt/sbt -Phive,hadoop-2.4 clean package. But since you've already started the Thrift server successfully, this step should already have been done properly.

> c. What else would I need in case I want to connect BI tools to Spark SQL using the Thrift JDBC/ODBC server? Please share the steps or pointers for the same.

You can follow this excellent article authored by Denny Lee: https://www.concur.com/blog/en-us/connect-tableau-to-sparksql

> As I could not find sufficient information on this, please help. Please let me know if more information/explanation is required.
>
> Thanks and Regards,
> Neeraj Garg
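A hedged sketch of the command sequence discussed in this thread, assuming a Spark 1.1 directory layout; the class name, jar path, executor count, and host/port are placeholders, not values from the thread:

```shell
# 1. Launch an application in yarn-cluster mode
#    (class, jar, and options are placeholders).
./bin/spark-submit \
  --class com.example.MyApp \
  --master yarn-cluster \
  --num-executors 4 \
  my-app.jar

# 2. Start the Thrift JDBC/ODBC server; it accepts the same
#    --master options as spark-submit.
./sbin/start-thriftserver.sh --master yarn-client

# 3. Connect with beeline and issue HiveQL
#    (localhost:10000 is the default Thrift server address).
./bin/beeline -u jdbc:hive2://localhost:10000
```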