You could consider using Zeppelin and Spark on YARN as an alternative. http://zeppelin.incubator.apache.org/
Simon

> On 16 Jun 2015, at 17:58, Sanjay Subramanian <sanjaysubraman...@yahoo.com.INVALID> wrote:
>
> hey guys
>
> After day one at the spark-summit SFO, I realized sadly that (indeed) HDFS is not supported by Databricks cloud.
> My speed bottleneck is to transfer ~1TB of snapshot HDFS data (250+ external hive tables) to S3 :-(
>
> I want to use databricks cloud but this to me is a starting disabler.
> The hard road for me will be (as I believe EVERYTHING is possible. The impossible just takes longer):
> - transfer all HDFS data to S3
> - our org does not permit AWS server-side encryption, so I have to figure out if AWS KMS-encrypted S3 files can be read by Hive/Impala/Spark
> - modify all table locations in the metadata to S3
> - modify all scripts to point and write to S3 instead
>
> Any ideas / thoughts will be helpful.
>
> Till I can get the above figured out, I am going ahead and working hard to make spark-sql the main workhorse for creating datasets (now it's Hive and Impala)
>
> thanks
> regards
>
> sanjay
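For the bulk-copy and table-relocation steps Sanjay lists, a minimal sketch of one common approach: DistCp for the HDFS-to-S3 transfer, then Hive's ALTER TABLE ... SET LOCATION to repoint the metastore. The bucket, NameNode host, database, and table names below are hypothetical placeholders, and partitioned tables would additionally need each partition's location updated (or MSCK REPAIR / re-registration after the move).

```shell
# Bulk-copy one table's HDFS directory to S3 with DistCp.
# (hypothetical NameNode host and bucket; -update skips files already copied)
hadoop distcp -update \
    hdfs://namenode:8020/user/hive/warehouse/mydb.db/my_table \
    s3a://my-bucket/warehouse/mydb.db/my_table

# Repoint the external table's location in the Hive metastore.
hive -e "ALTER TABLE mydb.my_table SET LOCATION 's3a://my-bucket/warehouse/mydb.db/my_table';"
```

With 250+ external tables, the ALTER TABLE statements could be generated by iterating over `SHOW TABLES` output rather than edited by hand; this is an ops fragment that assumes a live cluster, so treat it as a starting point rather than a tested script.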