I'm a bit confused by the documentation around Hive support. I want to use a remote Hive metastore/HDFS server, and the documentation says we need to build Spark from source because of the large number of dependencies Hive requires.
Specifically, the documentation says: "Hive has a large number of dependencies, it is not included in the default Spark assembly. ... This command builds a new assembly jar that includes Hive." So I downloaded the source distribution of Spark 1.4.1 and ran the following build command:

    ./make-distribution.sh --name spark-1.4.1-hadoop-2.6-hive --tgz -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests

Inspecting the size of the resulting spark-assembly-1.4.1-hadoop2.6.0.jar, it is only a few bytes different from the pre-built one: the pre-built jar is 162976273 bytes and my custom-built jar is 162976444 bytes. I don't see any new Hive jar file either. Can someone please help me understand what is going on here?

Cheers,
Reece

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Do-I-really-need-to-build-Spark-for-Hive-Thrift-Server-support-tp24013.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
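P.S. Rather than comparing byte sizes, one way to tell whether the assembly actually contains the Hive classes is to list the jar's entries (an assembly jar is just a zip archive). Here's a quick sketch using Python's standard zipfile module; the jar path is only illustrative, so point it at your own build:

    import zipfile

    def hive_entries(jar_path):
        """Return jar entries that look like Hive-related classes."""
        with zipfile.ZipFile(jar_path) as jar:
            return [name for name in jar.namelist()
                    if "org/apache/hadoop/hive" in name
                    or "org/apache/spark/sql/hive" in name]

    # Hypothetical path -- replace with the jar produced by your build:
    # entries = hive_entries("dist/lib/spark-assembly-1.4.1-hadoop2.6.0.jar")
    # print(len(entries), "Hive-related entries")

If both the pre-built and custom-built jars return a non-empty list, that would explain the nearly identical sizes.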