I'm building a system that collects data with Spark Streaming, does some processing, and then saves the results. I want the data to be queryable by multiple applications, and the Thrift JDBC/ODBC server sounds like it might be the right tool for handling those queries. However, the documentation for the Thrift server <http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server> seems to be written for Hive users who are moving to Spark. I had never used Hive before I started using Spark, so it is not clear to me how best to use it.
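For reference, what I have been trying follows the basic workflow from that page; the paths are relative to the Spark installation directory and 10000 is just the default HiveServer2 port:

```shell
# Start the Thrift JDBC/ODBC server that ships with Spark
# (it registers itself against the Hive metastore).
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=10000 \
  --hiveconf hive.server2.thrift.bind.host=localhost

# Then any JDBC client can query the tables in the metastore;
# for testing, the bundled beeline client works:
./bin/beeline -u jdbc:hive2://localhost:10000
```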
I've tried putting the data into Hive and then serving it with the Thrift server, but I have not been able to update the data in Hive without first shutting the server down. That is a problem because new data is always streaming in, so the data must be updated continuously.

The system I'm building is supposed to replace one that stores the data in MongoDB. The dataset has grown so large that the database index no longer fits in memory, which causes major performance problems in MongoDB. If the Thrift server is the right tool for my application, how can I set it up? If it is not, what else can I use?
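For concreteness, the streaming side of what I tried looks roughly like the sketch below (Spark 1.x DStream style; the table name `events`, the socket source, the batch interval, and the record schema are all placeholders for my real job, and it assumes a HiveContext so the table lands in the same metastore the Thrift server reads):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.hive.HiveContext

object StreamToHive {
  // Placeholder record type for the streamed data.
  case class Event(id: String, value: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamToHive")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10)) // placeholder batch interval
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Placeholder source; the real job reads from the actual input stream.
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      val df = rdd.map(_.split(","))
        .map(a => Event(a(0), a(1).toDouble))
        .toDF()
      // Append each micro-batch to a Hive table so that the
      // Thrift server, sharing the same metastore, can serve it.
      df.write.mode("append").saveAsTable("events")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

It is the appends to `events` that I cannot make visible to clients without restarting the Thrift server.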