Still trying to get my head around Spark SQL and Hive. 1) Let's assume I *only* use Spark SQL to create and insert data into Hive tables, declared in a Hive metastore.
Does it matter at all whether Hive supports the data types I need with Parquet, or is all that matters what Catalyst and Spark's Parquet relation support? Case in point: timestamps and Parquet.

* Parquet now supports them, as per https://github.com/Parquet/parquet-mr/issues/218
* Hive only supports them as of 0.14

So would I be able to read/write timestamps natively in Spark 1.2? Spark 1.3? I have found this thread, http://apache-spark-user-list.1001560.n3.nabble.com/timestamp-not-implemented-yet-td15414.html, which seems to indicate that the data types supported by Hive do matter to Spark SQL. If so, why is that? Doesn't the read path go through Spark SQL to read the Parquet file? (A minimal sketch of the round trip I have in mind is below, after my questions.)

2) Is there planned support for Hive 0.14?

Thanks
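
For concreteness, here is a minimal sketch of the round trip I am asking about, written against the Spark 1.2 HiveContext API. The table and class names (ts_test, TsRecord) are made up for illustration, and I don't know yet whether the INSERT actually succeeds for TIMESTAMP; that is precisely my question:

import java.sql.Timestamp

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical record type, just for illustration.
case class TsRecord(ts: Timestamp)

object TimestampRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ts-parquet-test"))
    val hc = new HiveContext(sc)
    import hc.createSchemaRDD // Spark 1.2: implicit RDD[Product] -> SchemaRDD conversion

    // Declare a Parquet-backed table in the Hive metastore.
    // The "STORED AS PARQUET" shorthand needs Hive 0.13+ DDL support.
    hc.sql("CREATE TABLE IF NOT EXISTS ts_test (ts TIMESTAMP) STORED AS PARQUET")

    // Write a timestamp going only through Spark SQL.
    val data = sc.parallelize(Seq(TsRecord(new Timestamp(System.currentTimeMillis()))))
    data.registerTempTable("ts_tmp")
    hc.sql("INSERT INTO TABLE ts_test SELECT ts FROM ts_tmp")

    // Read it back -- is this Spark SQL's Parquet path, or Hive's SerDe?
    hc.sql("SELECT ts FROM ts_test").collect().foreach(println)

    sc.stop()
  }
}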