hey guys,

After day one at Spark Summit SF, I sadly realized that HDFS is (indeed) not supported by Databricks Cloud. My bottleneck is transferring ~1TB of snapshot HDFS data (250+ external Hive tables) to S3 :-( I want to use Databricks Cloud, but for me this is a blocker right at the start.

The hard road ahead (and I believe EVERYTHING is possible; the impossible just takes longer):

- transfer all the HDFS data to S3 (a distcp sketch is below)
- our org does not permit AWS server-side encryption, so I have to figure out whether AWS KMS-encrypted S3 files can be read by Hive/Impala/Spark (a read-back smoke test is below)
- modify all table locations in the metastore to point to S3 (see the ALTER TABLE sketch below)
- modify all scripts to point and write to S3 instead of HDFS

Any ideas / thoughts will be helpful. Till I can get the above figured out, I am going ahead and working hard to make spark-sql the main workhorse for creating datasets (right now it's Hive and Impala); a small example of that is at the very end.
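For the bulk copy, here's a minimal sketch of what I'm thinking, just shelling out to hadoop distcp from Python; it assumes the s3a connector and AWS credentials are already configured on the cluster, and the namenode and bucket names are made up:

# hedged sketch: mirror the warehouse dir into S3 with distcp;
# hypothetical host/bucket names, credentials assumed already configured
import subprocess

subprocess.check_call([
    "hadoop", "distcp",
    "hdfs://namenode:8020/user/hive/warehouse",  # hypothetical source
    "s3a://my-bucket/warehouse",                 # hypothetical destination
])

For ~1TB this should be tolerable, since distcp runs as a MapReduce job and parallelizes the copy across the cluster rather than funneling it through one machine.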
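On the KMS question, the first thing I plan to try is a read-back smoke test. This assumes a single KMS-encrypted test object already sits at the (made-up) path below; a clean read only proves Spark can handle it, so Hive and Impala would need their own checks:

# read-back smoke test against a hypothetical KMS-encrypted test object
from pyspark import SparkContext

sc = SparkContext(appName="kms-read-check")
sample = sc.textFile("s3a://my-bucket/encrypted-sample/")
print(sample.take(5))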
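For the 250+ table locations, I'd rather generate the DDL than hand-edit anything. A sketch using HiveContext, assuming each table keeps its own name under a single (made-up) S3 prefix:

# sketch: repoint every table in the default database to s3a; assumes
# unpartitioned external tables - partitioned ones would also need
# ALTER TABLE ... PARTITION ... SET LOCATION per partition
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="repoint-tables")
hc = HiveContext(sc)

for t in hc.tableNames():
    hc.sql("ALTER TABLE {0} SET LOCATION 's3a://my-bucket/warehouse/{0}'"
           .format(t))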
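And for the workhorse switch, this is the general shape of the jobs I'm porting from Hive/Impala into spark-sql (the table and column names are hypothetical, and write.parquet needs Spark 1.4+):

# hypothetical dataset-building step, moved from Hive/Impala into spark-sql
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="dataset-build")
hc = HiveContext(sc)

totals = hc.sql(
    "SELECT customer_id, SUM(amount) AS total_amount "
    "FROM transactions GROUP BY customer_id")
totals.write.parquet("s3a://my-bucket/datasets/customer_totals")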
thanks
regards
sanjay