https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
There are quite a lot of knobs to tune for Hive on Spark. The page above recommends the following settings:

  mapreduce.input.fileinputformat.split.maxsize=750000000
  hive.vectorized.execution.enabled=true
  hive.cbo.enable=true
  hive.optimize.reducededuplication.min.reducer=4
  hive.optimize.reducededuplication=true
  hive.orc.splits.include.file.footer=false
  hive.merge.mapfiles=true
  hive.merge.sparkfiles=false
  hive.merge.smallfiles.avgsize=16000000
  hive.merge.size.per.task=256000000
  hive.merge.orcfile.stripe.level=true
  hive.auto.convert.join=true
  hive.auto.convert.join.noconditionaltask=true
  hive.auto.convert.join.noconditionaltask.size=894435328
  hive.optimize.bucketmapjoin.sortedmerge=false
  hive.map.aggr.hash.percentmemory=0.5
  hive.map.aggr=true
  hive.optimize.sort.dynamic.partition=false
  hive.stats.autogather=true
  hive.stats.fetch.column.stats=true
  hive.vectorized.execution.reduce.enabled=false
  hive.vectorized.groupby.checkinterval=4096
  hive.vectorized.groupby.flush.percent=0.1
  hive.compute.query.using.stats=true
  hive.limit.pushdown.memory.usage=0.4
  hive.optimize.index.filter=true
  hive.exec.reducers.bytes.per.reducer=67108864
  hive.smbjoin.cache.rows=10000
  hive.exec.orc.default.stripe.size=67108864
  hive.fetch.task.conversion=more
  hive.fetch.task.conversion.threshold=1073741824
  hive.fetch.task.aggr=false
  mapreduce.input.fileinputformat.list-status.num-threads=5
  spark.kryo.referenceTracking=false
  spark.kryo.classesToRegister=org.apache.hadoop.hive.ql.io.HiveKey,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch

Has this worked well for anyone? Tuning all of these parameters for a specific job could take days, if not weeks. We're on Spark 1.5 / Hive 1.1.

P.S. We have a job that we can't get working well as a plain Hive job (a 3-table full outer join with a group by + collect_list), so we thought we'd try Hive on Spark instead; sketches of how we'd apply a few of these settings, and of the query shape, are below my sig. Spark should handle this much better.

Ruslan
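For concreteness, a minimal sketch of applying a handful of the recommended knobs per session from Beeline or the Hive CLI. The values are taken from the list above; hive.execution.engine=spark is not in that list but is assumed here to route the query through Spark:

  -- per-session settings in Beeline / Hive CLI (values from the list above)
  set hive.execution.engine=spark;  -- assumed: switches execution to Spark
  set hive.vectorized.execution.enabled=true;
  set hive.cbo.enable=true;
  set hive.auto.convert.join=true;
  set hive.auto.convert.join.noconditionaltask.size=894435328;
  set hive.exec.reducers.bytes.per.reducer=67108864;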
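And a rough sketch of the query shape from the P.S.; the table and column names (t1/t2/t3, k, v1/v2/v3) are made up for illustration:

  -- hypothetical tables t1/t2/t3, each keyed by k: full outer join all
  -- three, then aggregate the value columns per key with collect_list
  SELECT COALESCE(t1.k, t2.k, t3.k) AS k,
         collect_list(t1.v1) AS v1_list,
         collect_list(t2.v2) AS v2_list,
         collect_list(t3.v3) AS v3_list
  FROM t1
  FULL OUTER JOIN t2 ON t1.k = t2.k
  FULL OUTER JOIN t3 ON COALESCE(t1.k, t2.k) = t3.k
  GROUP BY COALESCE(t1.k, t2.k, t3.k);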