RE: Is the Thrift server right for me?

2015-02-11 Thread Judy Nash
It should relay the queries to Spark (i.e. you shouldn't see any MR job on 
Hadoop; you should see activity on the Spark app in the headnode UI).

Check your hive-site.xml. Are you pointing at the HiveServer2 port instead 
of the Spark Thrift server port?
Their default ports are both 10000.
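
One way to run the check above is to read the configured port out of hive-site.xml and compare it with the port your client connects to. The following is only a minimal sketch: the sample XML content and the helper names are made up for illustration, and it assumes the port is set through the hive.server2.thrift.port property, falling back to 10000 when the property is absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical hive-site.xml content, for illustration only.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>localhost</value>
  </property>
</configuration>
"""

# Assumed fallback when hive-site.xml does not set the port explicitly.
DEFAULT_THRIFT_PORT = 10000


def read_hive_properties(xml_text):
    """Return the <property> name/value pairs of a hive-site.xml as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def thrift_port(xml_text):
    """Return the configured Thrift port, or the assumed default if unset."""
    props = read_hive_properties(xml_text)
    return int(props.get("hive.server2.thrift.port", DEFAULT_THRIFT_PORT))


if __name__ == "__main__":
    print("Thrift port in hive-site.xml:", thrift_port(SAMPLE_HIVE_SITE))
```

If HiveServer2 and the Spark Thrift server would otherwise share a port, the Spark one can be started on a different port, for example with ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=<listening-port>, and the client pointed at that port.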

From: Andrew Lee [mailto:alee...@hotmail.com]
Sent: Wednesday, February 11, 2015 12:00 PM
To: sjbrunst; user@spark.apache.org
Subject: RE: Is the Thrift server right for me?

I have ThriftServer2 up and running; however, I notice that it relays the query 
to HiveServer2 when I pass the hive-site.xml to it.

I'm not sure if this is the expected behavior, but based on what I have up and 
running, ThriftServer2 invokes HiveServer2, which results in a MapReduce or Tez 
query. In that case, you could just connect directly to HiveServer2 if Hive is 
all you need.

If you are a programmer and want to mash up data from Hive with other tables and 
data in Spark, then Spark ThriftServer2 seems to be a good integration point for 
some use cases.

Please correct me if I misunderstood the purpose of Spark ThriftServer2.

 Date: Thu, 8 Jan 2015 14:49:00 -0700
 From: sjbru...@uwaterloo.ca
 To: user@spark.apache.org
 Subject: Is the Thrift server right for me?

 I'm building a system that collects data using Spark Streaming, does some
 processing with it, then saves the data. I want the data to be queried by
 multiple applications, and it sounds like the Thrift JDBC/ODBC server might
 be the right tool to handle the queries. However, the documentation for the
 Thrift server
 http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
 seems to be written for Hive users who are moving to Spark. I never used
 Hive before I started using Spark, so it is not clear to me how best to use
 this.

 I've tried putting data into Hive, then serving it with the Thrift server.
 But I have not been able to update the data in Hive without first shutting
 down the server. This is a problem because new data is always being streamed
 in, and so the data must continuously be updated.

 The system I'm building is supposed to replace a system that stores the data
 in MongoDB. The dataset has now grown so large that the database index does
 not fit in memory, which causes major performance problems in MongoDB.

 If the Thrift server is the right tool for me, how can I set it up for my
 application? If it is not the right tool, what else can I use?



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Is-the-Thrift-server-right-for-me-tp21044.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org



RE: Is the Thrift server right for me?

2015-02-11 Thread Andrew Lee
I have ThriftServer2 up and running; however, I notice that it relays the query 
to HiveServer2 when I pass the hive-site.xml to it.
I'm not sure if this is the expected behavior, but based on what I have up and 
running, ThriftServer2 invokes HiveServer2, which results in a MapReduce or Tez 
query. In that case, you could just connect directly to HiveServer2 if Hive is 
all you need.
If you are a programmer and want to mash up data from Hive with other tables and 
data in Spark, then Spark ThriftServer2 seems to be a good integration point for 
some use cases.
Please correct me if I misunderstood the purpose of Spark ThriftServer2.

 Date: Thu, 8 Jan 2015 14:49:00 -0700
 From: sjbru...@uwaterloo.ca
 To: user@spark.apache.org
 Subject: Is the Thrift server right for me?
 
 I'm building a system that collects data using Spark Streaming, does some
 processing with it, then saves the data. I want the data to be queried by
 multiple applications, and it sounds like the Thrift JDBC/ODBC server might
 be the right tool to handle the queries. However, the documentation for the
 Thrift server
 http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
 seems to be written for Hive users who are moving to Spark. I never used
 Hive before I started using Spark, so it is not clear to me how best to use
 this.
 
 I've tried putting data into Hive, then serving it with the Thrift server.
 But I have not been able to update the data in Hive without first shutting
 down the server. This is a problem because new data is always being streamed
 in, and so the data must continuously be updated.
 
 The system I'm building is supposed to replace a system that stores the data
 in MongoDB. The dataset has now grown so large that the database index does
 not fit in memory, which causes major performance problems in MongoDB.
 
 If the Thrift server is the right tool for me, how can I set it up for my
 application? If it is not the right tool, what else can I use?
 
 
 
 
  

Is the Thrift server right for me?

2015-01-08 Thread sjbrunst
I'm building a system that collects data using Spark Streaming, does some
processing with it, then saves the data. I want the data to be queried by
multiple applications, and it sounds like the Thrift JDBC/ODBC server might
be the right tool to handle the queries. However, the documentation for the
Thrift server
http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
seems to be written for Hive users who are moving to Spark. I never used
Hive before I started using Spark, so it is not clear to me how best to use
this.

I've tried putting data into Hive, then serving it with the Thrift server.
But I have not been able to update the data in Hive without first shutting
down the server. This is a problem because new data is always being streamed
in, and so the data must continuously be updated.

The system I'm building is supposed to replace a system that stores the data
in MongoDB. The dataset has now grown so large that the database index does
not fit in memory, which causes major performance problems in MongoDB.

If the Thrift server is the right tool for me, how can I set it up for my
application? If it is not the right tool, what else can I use?


