Hi,
Today we effectively force people to use Hive if they want the full functionality of Spark SQL.
With the default installation, this means a derby.log file and a metastore_db directory are created in whatever directory we run from.
The problem is that running multiple scripts from the same working directory makes them clash: only one JVM can boot the embedded Derby database at a time, so concurrent runs fail.
The workaround we use locally is to always run from a different directory, since we ignore Hive in practice (this of course means we lose some of the catalog functionality in SparkSession).
The only other option is a full-blown Hive installation with proper configuration (probably a JDBC-backed metastore).
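
For reference, the kind of per-session redirection one can attempt looks roughly like the sketch below. The paths are hypothetical, and javax.jdo.option.ConnectionURL is a Hive setting that may need to live in hive-site.xml rather than the builder, depending on the version:

    import org.apache.spark.sql.SparkSession

    // Rough sketch, paths hypothetical: pin derby.log, the metastore, and
    // the warehouse to fixed locations instead of the working directory.
    System.setProperty("derby.stream.error.file", "/tmp/derby.log")

    val spark = SparkSession.builder()
      .appName("relocated-metastore")
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
      .config("javax.jdo.option.ConnectionURL",
        "jdbc:derby:;databaseName=/tmp/metastore_db;create=true")
      .enableHiveSupport()
      .getOrCreate()

Note this only moves the files; embedded Derby still permits one JVM at a time, so concurrent scripts against the same metastore still clash.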

I would propose that in most cases there shouldn't be any Hive use at all. Even for catalog features such as saving a permanent table, we should be able to configure a target directory and simply write to it, doing everything file-based to avoid the need for locking (see the sketch below). Hive should be reserved for those who actually use it (probably for backward compatibility).
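
To illustrate, the file-based flow I have in mind would look something like this; tablesRoot is a hypothetical stand-in for the configurable target directory:

    import org.apache.spark.sql.SparkSession

    // Sketch of the file-based alternative: write to a configured directory
    // and read back by path. No metastore and no locking; SQL access goes
    // through a temp view instead of a permanent catalog entry.
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    val tablesRoot = "/data/tables"  // hypothetical configured directory

    val df = spark.range(10).toDF("id")
    df.write.mode("overwrite").parquet(s"$tablesRoot/my_table")

    // Later, from any script or working directory:
    val restored = spark.read.parquet(s"$tablesRoot/my_table")
    restored.createOrReplaceTempView("my_table")
    spark.sql("SELECT count(*) FROM my_table").show()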

Am I missing something here?
Assaf.



