Re: Silly Question on my part...

2016-05-17 Thread Gene Pang
Hi Michael, Yes, you can use Alluxio to share Spark RDDs. Here is a blog post about getting started with Spark and Alluxio (http://www.alluxio.com/2016/04/getting-started-with-alluxio-and-spark/), and some documentation (http://alluxio.org/documentation/master/en/Running-Spark-on-Alluxio.html).
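
As a rough illustration of the sharing idea, one job can write an RDD out to an alluxio:// path and another job can read it back. The sketch below is in Python and is only a sketch: the host, the port 19998 (the usual Alluxio master port), the path /shared/numbers, and the helper names alluxio_uri, save_shared, load_shared and demo are all made up for illustration, and demo() assumes PySpark is installed with the Alluxio client jar on Spark's classpath.

```python
def alluxio_uri(host, port, path):
    """Build an alluxio:// URI. Host, port and path are illustrative values."""
    return "alluxio://{}:{}/{}".format(host, port, path.lstrip("/"))


def save_shared(rdd, uri):
    # RDD.saveAsTextFile writes one part file per partition under `uri`.
    rdd.saveAsTextFile(uri)


def load_shared(sc, uri):
    # SparkContext.textFile returns an RDD of the lines stored under `uri`.
    return sc.textFile(uri)


def demo():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="alluxio-share-sketch")
    uri = alluxio_uri("localhost", 19998, "/shared/numbers")  # hypothetical location
    save_shared(sc.parallelize(range(10)).map(str), uri)      # "writer" side
    print(load_shared(sc, uri).count())                       # "reader" side
    sc.stop()
```

In practice the writer and the reader would be separate Spark applications; they only need to agree on the URI.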

Re: Silly Question on my part...

2016-05-17 Thread Dood
On 5/16/2016 12:12 PM, Michael Segel wrote: For one use case... we were considering using the Thrift server as a way to allow multiple clients to access shared RDDs. Within the Thrift context, we create an RDD and expose it as a Hive table. The question is… where does the RDD exist? On the

Re: Silly Question on my part...

2016-05-17 Thread Michael Segel
Thanks for the response. That’s what I thought, but I didn’t want to assume anything. (You know what happens when you ass u me … :-) Not sure about Tachyon though. It’s a thought, but I’m very conservative when it comes to design choices. > On May 16, 2016, at 5:21 PM, John Trengrove

Re: Silly Question on my part...

2016-05-16 Thread John Trengrove
If you want to share RDDs, it might be a good idea to check out Tachyon / Alluxio. For the Thrift server, I believe the datasets are located in your Spark cluster as RDDs, and you just communicate with it via the Thrift JDBC Distributed Query Engine connector. 2016-05-17 5:12 GMT+10:00
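
To make the client side of that concrete: a client never touches the RDD itself, it just sends SQL to the Thrift server over a jdbc:hive2:// URL, for example through beeline. A minimal Python sketch of building such a call follows. The host, the port 10000 (the usual Thrift server default), the table name shared_table, and the helper names thrift_jdbc_url and beeline_command are assumptions for illustration only.

```python
import shlex


def thrift_jdbc_url(host="localhost", port=10000, database="default"):
    """JDBC URL for a HiveServer2-compatible endpoint such as the Spark Thrift server."""
    return "jdbc:hive2://{}:{}/{}".format(host, port, database)


def beeline_command(url, query):
    # beeline's -u flag takes the JDBC URL and -e runs a single query.
    return ["beeline", "-u", url, "-e", query]


def as_shell(args):
    # Quote each argument so the command can be pasted into a shell.
    return " ".join(shlex.quote(a) for a in args)


url = thrift_jdbc_url()
cmd = beeline_command(url, "SELECT COUNT(*) FROM shared_table")  # hypothetical table
print(as_shell(cmd))
```

Every client that connects this way queries the same table, while the data stays in the Spark application that hosts the Thrift server.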

Silly Question on my part...

2016-05-16 Thread Michael Segel
For one use case... we were considering using the Thrift server as a way to allow multiple clients to access shared RDDs. Within the Thrift context, we create an RDD and expose it as a Hive table. The question is… where does the RDD exist? On the Thrift service node itself, or is that just a