Hi,

Just to give some context: we are using a Hive metastore with CSV & Parquet
files as part of our ETL pipeline. We query these with SparkSQL to do some
downstream work.

I'm curious: what's the best way to go about testing Hive & SparkSQL? I'm
using Spark 1.1.0.

I see that LocalHiveContext has been deprecated.
https://issues.apache.org/jira/browse/SPARK-2397

My testing strategy is: as part of my Before block, I create the
HiveContext, then create the databases/tables and map them onto some sample
data files in my test resources directory.
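Concretely, the setup is shaped something like this (table and file names
here are made up for illustration; this assumes the sample data lives under
src/test/resources):

    import java.io.File
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // Rough shape of the Before block; names are illustrative.
    val sc = new SparkContext(
      new SparkConf().setMaster("local").setAppName("hive-test"))
    val hc = new HiveContext(sc)

    // Map an external table onto sample CSV data in test resources.
    val dataDir =
      new File(getClass.getResource("/sample-data").toURI).getAbsolutePath
    hc.sql("CREATE DATABASE IF NOT EXISTS myDb")
    hc.sql(s"""
      CREATE EXTERNAL TABLE IF NOT EXISTS myDb.events (id INT, name STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '$dataDir'
    """)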

LocalHiveContext was useful because I could inject it as part of the test
setup and it would take care of creating the metastore and warehouse
directories for Hive for me (local to my project). If I just create a
HiveContext, it does create the metastore_db folder locally, but the
warehouse directory is not created! Thus running a command like
hc.sql("CREATE DATABASE myDb") results in a Hive error. I also can't supply
a test hive-site.xml because it won't allow relative paths, which means
there would be some shared directory that everyone needs to have. The only
other option is to call the setConf method the way LocalHiveContext does,
as sketched below.
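For the archives, here is a minimal sketch of that setConf workaround,
mimicking the two properties LocalHiveContext sets internally (the target/
paths are just placeholders):

    import java.io.File

    // Point the embedded Derby metastore and the Hive warehouse at
    // directories local to the project, like LocalHiveContext did.
    val metastoreDir = new File("target/test-metastore").getAbsolutePath
    val warehouseDir = new File("target/test-warehouse").getAbsolutePath
    new File(warehouseDir).mkdirs()  // make sure the warehouse dir exists

    hc.setConf("javax.jdo.option.ConnectionURL",
      s"jdbc:derby:;databaseName=$metastoreDir;create=true")
    hc.setConf("hive.metastore.warehouse.dir", warehouseDir)

    // With these set, this no longer fails:
    hc.sql("CREATE DATABASE IF NOT EXISTS myDb")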

Since LocalHiveContext is on the way out, I'm wondering if I'm doing
something stupid.

Is there a better way to mock this out and test Hive/metastore with
SparkSQL?

Cheers,
~N
