Re: YARN deployment of Spark and Thrift JDBC server

2014-10-16 Thread Cheng Lian


On 10/16/14 12:44 PM, neeraj wrote:

I would like to reiterate that I don't have Hive installed on the Hadoop
cluster.
I have some questions about the following comment from Cheng Lian-2:
The Thrift server is used to interact with existing Hive data, and thus
needs the Hive Metastore to access the Hive catalog. In your case, you need
to build Spark with sbt/sbt -Phive,hadoop-2.4 clean package. But since you’ve
already started the Thrift server successfully, this step should already
have been done properly.

1. Even though I don't have Hive installed, how can I connect my
application (Microsoft Excel etc.) to Spark SQL? Must I have Hive
installed?
Are you trying to use Excel as a data source for Spark SQL, or Spark SQL
as a data source for Excel? You can use Spark SQL in your own Spark
applications without involving Hive, but the Thrift server is designed
to interact with existing Hive data. It is essentially a port of
HiveServer2 to Spark SQL.

2. Where can I download the Spark SQL JDBC/ODBC drivers? I could not find
them on the Databricks site.
3. Could somebody point me to the steps to connect Excel to Spark SQL and
query some data with SQL? Is this possible at all?
I think this article from Denny Lee can be helpful, although it's about 
Tableau rather than Excel: 
https://www.concur.com/blog/en-us/connect-tableau-to-sparksql

4. Which applications can connect to Spark SQL?

In theory, all applications that support ODBC/JDBC can connect to Spark SQL.


Regards,
Neeraj

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/YARN-deployment-of-Spark-and-Thrift-JDBC-server-tp16374p16537.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org







Re: YARN deployment of Spark and Thrift JDBC server

2014-10-16 Thread neeraj
1. I'm trying to use Spark SQL as a data source. Is that possible?
2. Please share the link to the ODBC/JDBC drivers at Databricks; I'm not able
to find them.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/YARN-deployment-of-Spark-and-Thrift-JDBC-server-tp16374p16571.html



Re: YARN deployment of Spark and Thrift JDBC server

2014-10-16 Thread Cheng Lian


On 10/16/14 10:48 PM, neeraj wrote:

1. I'm trying to use Spark SQL as a data source. Is that possible?
Unfortunately, Spark SQL's ODBC/JDBC support is based on the Thrift
server, so at a minimum you need HDFS and a working Hive Metastore instance
(used to persist catalogs) to make things work.

2. Please share the link to the ODBC/JDBC drivers at Databricks; I'm not able
to find them.
Sorry, I forgot to mention that Denny's article gives the ODBC driver
link: http://www.datastax.com/download#dl-datastax-drivers


For JDBC access, you can just use the Hive 0.12.0 JDBC driver; the Thrift
server is compatible with it.
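
As a rough sketch, the settings a JDBC client needs can be assembled as
follows. The host and port are assumptions (localhost and 10000, the
Thrift server's default port), and THRIFT_HOST / THRIFT_PORT are
hypothetical variable names used only for this illustration.
org.apache.hive.jdbc.HiveDriver is the driver class contained in the Hive
JDBC jar.

```shell
#!/bin/sh
# Hypothetical variables; override them to match your Thrift server.
THRIFT_HOST="${THRIFT_HOST:-localhost}"
THRIFT_PORT="${THRIFT_PORT:-10000}"

# HiveServer2-style JDBC URL; "default" is the default Hive database.
JDBC_URL="jdbc:hive2://$THRIFT_HOST:$THRIFT_PORT/default"
# Driver class provided by the Hive JDBC jar.
JDBC_DRIVER="org.apache.hive.jdbc.HiveDriver"

# Print the two values a JDBC-capable application asks for.
printf 'driver class: %s\nurl:          %s\n' "$JDBC_DRIVER" "$JDBC_URL"
```

The Hive JDBC jar and its dependencies still have to be on the client
application's classpath; how that is done depends on the application.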


P.S. The ODBC driver is not from Databricks; it is provided by third-party
companies such as DataStax and Simba.









Re: YARN deployment of Spark and Thrift JDBC server

2014-10-15 Thread neeraj
I would like to reiterate that I don't have Hive installed on the Hadoop
cluster.
I have some questions about the following comment from Cheng Lian-2:
The Thrift server is used to interact with existing Hive data, and thus
needs the Hive Metastore to access the Hive catalog. In your case, you need
to build Spark with sbt/sbt -Phive,hadoop-2.4 clean package. But since you’ve
already started the Thrift server successfully, this step should already
have been done properly.

1. Even though I don't have Hive installed, how can I connect my
application (Microsoft Excel etc.) to Spark SQL? Must I have Hive
installed?
2. Where can I download the Spark SQL JDBC/ODBC drivers? I could not find
them on the Databricks site.
3. Could somebody point me to the steps to connect Excel to Spark SQL and
query some data with SQL? Is this possible at all?
4. Which applications can connect to Spark SQL?

Regards,
Neeraj











Re: YARN deployment of Spark and Thrift JDBC server

2014-10-14 Thread Cheng Lian

On 10/14/14 7:31 PM, Neeraj Garg02 wrote:


Hi All,

I’ve downloaded and installed Apache Spark 1.1.0 pre-built for Hadoop 
2.4.


Now, I want to test two features of Spark:

1. YARN deployment: As I understand it, I need to modify the
“spark-defaults.conf” file with the settings described at
http://spark.apache.org/docs/1.1.0/running-on-yarn.html#configuration,
for example settings such as spark.yarn.applicationMaster.waitTries.


To launch a Spark application in yarn-cluster mode, the following command
can be used once the configuration is done:

./bin/spark-submit --class path.to.your.Class --master yarn-cluster
[options] <app jar> [app options]

Is this understanding correct? If not, please suggest the steps to deploy
Spark on YARN.



Yes.
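
For illustration, a minimal sketch of such a launch using the SparkPi
example that ships with the pre-built package. The install path and the
Hadoop configuration directory are assumed defaults, and the executor
count and memory values are arbitrary; adjust all of them to your
cluster.

```shell
#!/bin/sh
# Assumed install locations (hypothetical defaults); adjust to your environment.
SPARK_HOME="${SPARK_HOME:-/opt/spark-1.1.0-bin-hadoop2.4}"
# spark-submit finds the YARN ResourceManager through the Hadoop client configs.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"

# SparkPi lives in the examples jar of the pre-built package; 10 is its slice count.
SUBMIT_CMD="$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --executor-memory 2g \
  $SPARK_HOME/lib/spark-examples*.jar 10"

# Print the command for review; launch it with: eval "$SUBMIT_CMD"
echo "$SUBMIT_CMD"
```

In yarn-cluster mode the driver runs inside the YARN application master,
so the program's output ends up in the YARN container logs rather than on
the submitting console.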


2. Testing the Thrift JDBC server connection: I have a Hadoop 2.4 cluster
set up, and Apache Spark is running on it. To test the Thrift JDBC
server, I followed the steps in the “Other SQL Interfaces” section of the
Spark SQL programming guide, i.e. I can see the beeline prompt and it is
connected to the Thrift server using the given command. Please help me with
the following questions:


a. What kind of queries can I execute from this beeline prompt? Would
these be Spark SQL queries or Hive queries?



You can only use HiveQL under beeline.
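
A small smoke test along these lines can confirm what beeline accepts.
This is only a sketch: the table name src is made up for the example, the
server address is the assumed default (localhost:10000), and it assumes
the bundled beeline supports the -u and -f options.

```shell
#!/bin/sh
# Illustrative HiveQL statements; the table name `src` is a made-up example.
HQL_FILE="${TMPDIR:-/tmp}/thrift_smoke_test.hql"
cat > "$HQL_FILE" <<'EOF'
SHOW TABLES;
CREATE TABLE IF NOT EXISTS src (key INT, value STRING);
SELECT COUNT(*) FROM src;
EOF

# Assumed default Thrift server address; change host/port to match your setup.
JDBC_URL="jdbc:hive2://localhost:10000"

# Run the printed command from the Spark home directory while the server is up.
echo "./bin/beeline -u $JDBC_URL -f $HQL_FILE"
```

The same statements can also be typed one at a time at the interactive
beeline prompt after connecting.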

b. Configuration of Hive is done by placing your hive-site.xml file in
conf/. Right now, I don’t have Hive installed as part of the Hadoop 2.4
cluster. Do I need to install Hive to test the Thrift JDBC server or to
execute Spark SQL queries from the beeline prompt?


i. If a Hive installation is a prerequisite, do I need to rebuild the
Spark package? What are the steps for this? Is internet access required
for the rebuild?


The Thrift server is used to interact with existing Hive data, and thus
needs the Hive Metastore to access the Hive catalog. In your case, you
need to build Spark with sbt/sbt -Phive,hadoop-2.4 clean package. But
since you’ve already started the Thrift server successfully, this step
should already have been done properly.
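
If a Hive Metastore service is available, pointing Spark at it could look
roughly like the sketch below. The metastore host name is a placeholder,
9083 is the usual metastore port, and the conf directory location is an
assumption; none of these values come from this thread.

```shell
#!/bin/sh
# Sketch only: point Spark's Thrift server at an existing Hive Metastore.
# The conf directory default is an assumption; adjust it to your Spark install.
CONF_DIR="${SPARK_CONF_DIR:-./conf}"
HIVE_SITE="$CONF_DIR/hive-site.xml"

mkdir -p "$CONF_DIR"
if [ -e "$HIVE_SITE" ]; then
  # Never clobber a configuration that is already in place.
  echo "$HIVE_SITE already exists; leaving it untouched"
else
  # The host name below is a placeholder for your metastore server.
  cat > "$HIVE_SITE" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host.example.com:9083</value>
  </property>
</configuration>
EOF
  echo "wrote $HIVE_SITE"
fi
```

The Thrift server reads this file at startup, so it needs to be restarted
(for example with ./sbin/start-thriftserver.sh --master <master-uri>)
after the file changes.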


c. What else do I need in order to connect BI tools to Spark SQL through
the Thrift JDBC/ODBC server? Please share the steps or pointers.


You can follow this awesome article authored by Denny Lee: 
https://www.concur.com/blog/en-us/connect-tableau-to-sparksql



I could not find sufficient information on this, so please help.

Please let me know if more information or explanation is required.

Thanks and Regards,

Neeraj Garg
