Re: Query data in Spark RRD

2015-02-23 Thread Tathagata Das
You could build a REST API, but you may have issues if you want to return arbitrary binary data. A more complex but more robust alternative is to use an RPC library such as Akka or Thrift. TD On Mon, Feb 23, 2015 at 12:45 AM, Nikhil Bafna nikhil.ba...@flipkart.com wrote: Tathagata - Yes,
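One common workaround for the binary-data concern above is to base64-encode the bytes inside a JSON response. The following is only a minimal sketch of that idea, not something proposed in the thread; the function names and the `encoding`/`data` field names are assumptions made for illustration.

```python
import base64
import json


def encode_response(payload: bytes) -> str:
    """Wrap raw bytes in a JSON document by base64-encoding them."""
    # "encoding" and "data" are hypothetical field names, not a standard.
    return json.dumps({
        "encoding": "base64",
        "data": base64.b64encode(payload).decode("ascii"),
    })


def decode_response(body: str) -> bytes:
    """Recover the original bytes from a document built by encode_response."""
    doc = json.loads(body)
    return base64.b64decode(doc["data"])


if __name__ == "__main__":
    raw = bytes([0, 255, 16, 128])  # bytes that are not valid UTF-8 text
    assert decode_response(encode_response(raw)) == raw
```

Base64 grows the payload by roughly a third, which is one reason an RPC library with native binary support may be preferable for large results.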

Re: Query data in Spark RRD

2015-02-23 Thread Tathagata Das
You will have to build a split infrastructure: a front end that takes the queries from the UI and sends them to the backend, and the backend (running the Spark Streaming app) will actually run the queries on the tables created in the contexts. The RPCs necessary between the frontend and backend will

Re: Query data in Spark RRD

2015-02-23 Thread Nikhil Bafna
Tathagata - Yes, I'm thinking along those lines. The problem is how to send the query to the backend. Bundle an HTTP server into the Spark Streaming job that will accept the parameters? -- Nikhil Bafna On Mon, Feb 23, 2015 at 2:04 PM, Tathagata Das t...@databricks.com wrote: You will have to
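The idea of bundling an HTTP server into the long-running job could look roughly like the sketch below. It uses only the Python standard library, and `run_query` is a hypothetical stand-in for whatever would actually execute the query inside the Spark Streaming application; the `q` parameter name and the response shape are likewise assumptions.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def run_query(sql: str) -> list:
    # Hypothetical placeholder: a real backend would run the query against
    # the tables registered in the streaming application and return rows.
    return [{"query": sql, "rows": 0}]


class QueryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the query text from the "q" URL parameter (assumed name).
        params = parse_qs(urlparse(self.path).query)
        sql = params.get("q", [""])[0]
        if not sql:
            self.send_error(400, "missing q parameter")
            return
        body = json.dumps(run_query(sql)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve queries on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Running the server on a daemon thread leaves the main thread free for the streaming job itself; a dashboard would then issue requests such as `GET /?q=...` against the chosen port.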

Re: Query data in Spark RRD

2015-02-21 Thread Nikhil Bafna
Yes. As I understand it, it would allow me to write SQL queries against a Spark context. But the query needs to be specified within a deployed job. What I want is to be able to run multiple dynamic queries specified at runtime from a dashboard. -- Nikhil Bafna On Sat, Feb 21, 2015 at 8:37 PM,

Re: Query data in Spark RRD

2015-02-21 Thread Ted Yu
Have you looked at http://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD ? Cheers On Sat, Feb 21, 2015 at 4:24 AM, Nikhil Bafna nikhil.ba...@flipkart.com wrote: Hi. My use case is building a realtime monitoring system over multi-dimensional data. The way

Query data in Spark RRD

2015-02-21 Thread Nikhil Bafna
Hi. My use case is building a realtime monitoring system over multi-dimensional data. The way I'm planning to go about it is to use Spark Streaming to store aggregated counts over all dimensions in 10-second intervals. Then, from a dashboard, I would be able to specify a query over some dimensions,
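To make the use case concrete, here is a small plain-Python sketch (not Spark code) of counting events per 10-second window and per combination of dimensions, then answering a query over a subset of those dimensions. The dimension names `country` and `device`, and both function names, are hypothetical and chosen only for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 10  # the 10-second interval mentioned in the post


def aggregate(events):
    """Count events per (window_start, dimensions) key.

    events: iterable of (timestamp_in_seconds, dict_of_dimension_values).
    """
    counts = Counter()
    for ts, dims in events:
        window = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[(window, tuple(sorted(dims.items())))] += 1
    return counts


def query(counts, **filters):
    """Sum counts per window over entries matching every given filter."""
    result = Counter()
    for (window, dims), n in counts.items():
        values = dict(dims)
        if all(values.get(k) == v for k, v in filters.items()):
            result[window] += n
    return dict(result)


if __name__ == "__main__":
    events = [
        (0, {"country": "IN", "device": "mobile"}),
        (3, {"country": "IN", "device": "web"}),
        (12, {"country": "IN", "device": "mobile"}),
        (15, {"country": "US", "device": "mobile"}),
    ]
    counts = aggregate(events)
    print(query(counts, country="IN"))  # per-window counts for country=IN
```

Storing counts at the finest granularity (all dimensions) lets a later query roll them up over any subset of dimensions, at the cost of one stored entry per distinct combination per window.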