RE: Is the Thrift server right for me?
It should relay the queries to Spark (i.e. you shouldn't see any MR job on Hadoop; you should see activity in the Spark app on the headnode UI). Check your hive-site.xml. Are you pointing at the HiveServer2 port instead of the Spark Thrift server port? Their default ports are both 10000.

From: Andrew Lee [mailto:alee...@hotmail.com]
Sent: Wednesday, February 11, 2015 12:00 PM
To: sjbrunst; user@spark.apache.org
Subject: RE: Is the Thrift server right for me?

I have ThriftServer2 up and running; however, I notice that it relays the query to HiveServer2 when I pass hive-site.xml to it. I'm not sure whether this is the expected behavior, but based on what I have running, ThriftServer2 invokes HiveServer2, which results in a MapReduce or Tez query. In that case, you could just connect directly to HiveServer2 if Hive is all you need. If you are a programmer and want to mash up data from Hive with other tables and data in Spark, then the Spark ThriftServer2 seems to be a good integration point for some use cases. Please correct me if I have misunderstood the purpose of the Spark ThriftServer2.

Date: Thu, 8 Jan 2015 14:49:00 -0700
From: sjbru...@uwaterloo.ca
To: user@spark.apache.org
Subject: Is the Thrift server right for me?

I'm building a system that collects data using Spark Streaming, does some processing with it, then saves the data. I want the data to be queried by multiple applications, and it sounds like the Thrift JDBC/ODBC server might be the right tool to handle the queries. However, the documentation for the Thrift server (http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server) seems to be written for Hive users who are moving to Spark. I never used Hive before I started using Spark, so it is not clear to me how best to use it. I've tried putting data into Hive, then serving it with the Thrift server.
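If the two servers' ports are colliding, one way to disambiguate (a sketch, assuming a standard Spark distribution layout; the port 10001 and hostname are hypothetical choices) is to start the Spark Thrift server on an explicit non-default port and point beeline at it:

```shell
# Start the Spark Thrift JDBC/ODBC server on an explicit port (10001 here)
# so it cannot collide with HiveServer2, which defaults to 10000.
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=10001 \
  --hiveconf hive.server2.thrift.bind.host=localhost

# Connect with beeline and run a query. The activity should show up as jobs
# in the Spark application UI, not as MapReduce jobs on the Hadoop side.
./bin/beeline -u jdbc:hive2://localhost:10001 -e "SHOW TABLES;"
```

If queries through this connection still launch MapReduce jobs, you are almost certainly connected to HiveServer2 rather than the Spark Thrift server.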
But I have not been able to update the data in Hive without first shutting down the server. This is a problem because new data is constantly streaming in, so the data must be updated continuously.

The system I'm building is meant to replace one that stores the data in MongoDB. The dataset has grown so large that the database index no longer fits in memory, which causes major performance problems in MongoDB.

If the Thrift server is the right tool for me, how can I set it up for my application? If it is not the right tool, what else can I use?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-the-Thrift-server-right-for-me-tp21044.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
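One pattern that may address the continuous-update problem (a sketch, not something this thread confirms; the table name, schema, and path are hypothetical): have the streaming job write each batch as new files under a Hive external table's location, then refresh the table through the long-running Thrift server instead of restarting it. `REFRESH TABLE` as a SQL statement is available in newer Spark versions; older releases expose the equivalent `HiveContext.refreshTable` call instead.

```shell
# Hypothetical external table backed by a directory the streaming job appends to.
./bin/beeline -u jdbc:hive2://localhost:10000 <<'SQL'
CREATE EXTERNAL TABLE IF NOT EXISTS events (ts BIGINT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/events';

-- After the streaming job drops new files into /data/events, refresh the
-- server's cached metadata so queries see them, without a server restart.
REFRESH TABLE events;
SELECT COUNT(*) FROM events;
SQL
```

Because the table is external, new files appear in query results once the metadata is refreshed; nothing about the running Thrift server needs to change.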