Yeah, this is a typical Parquet interoperability issue due to
unfortunate historical reasons. Hive (actually parquet-hive) gives the
following schema for arrays:
{code}
message m0 {
  optional group f (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
}
{code}
while Spark SQL gives
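For reference, Spark SQL prior to 1.5 wrote arrays in a different layout; reconstructed from memory of the 1.4 writer (the exact field names here are my assumption, not quoted from the thread), it looks roughly like:

{code}
message m1 {
  optional group f (LIST) {
    repeated group bag {
      optional int32 array;
    }
  }
}
{code}

The structures are the same shape, but the element field name differs ("array" vs. Hive's "array_element"), which is already enough for one side's reader to fail to resolve the column written by the other.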
Thank you - it works if the file is created in Spark
On Mon, Sep 7, 2015 at 3:06 PM, Ruslan Dautkhanov
wrote:
> Read the response from Cheng Lian on Aug 27th - it
> looks like the same problem.
>
> Workarounds
> 1. write that parquet file in Spark;
> 2. upgrade to Spark 1.5.
No, it was created in Hive by CTAS, but any help is appreciated...
On Mon, Sep 7, 2015 at 2:51 PM, Ruslan Dautkhanov
wrote:
> That parquet table wasn't created in Spark, was it?
>
> There was a recent discussion on this list that complex data types in
> Spark prior to 1.5
Read the response from Cheng Lian on Aug 27th - it
looks like the same problem.
Workarounds
1. write that parquet file in Spark;
2. upgrade to Spark 1.5.
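Workaround 1 can be sketched in spark-shell roughly like this (a minimal sketch: the column names and output path are made up for illustration, and the idea is to produce the array-typed data with Spark's own Parquet writer rather than reading the Hive-written file):

```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._

// Build an array-typed DataFrame inside Spark and let Spark write the
// Parquet file, so the LIST layout matches what Spark 1.4 reads back.
val df = sc.parallelize(Seq((1, Seq(10, 20)), (2, Seq(30)))).toDF("id", "f")
df.write.parquet("/tmp/stats_spark")  // hypothetical output path

// Reading this file back in Spark should now work.
sqlContext.read.parquet("/tmp/stats_spark").show()
```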
--
Ruslan Dautkhanov
On Mon, Sep 7, 2015 at 3:52 PM, Alex Kozlov wrote:
> No, it was created in Hive by CTAS, but any help is appreciated...
That parquet table wasn't created in Spark, was it?
There was a recent discussion on this list that complex data types in Spark
prior to 1.5 are often incompatible with Hive, if I remember correctly.
On Mon, Sep 7, 2015, 2:57 PM Alex Kozlov wrote:
> I am trying to read an (array typed) parquet file in spark-shell (Spark
> 1.4.1 with Hadoop 2.6):
I get the same error if I do:
{code}
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val results = sqlContext.sql("SELECT * FROM stats")
{code}
but it does work from the Hive shell directly...
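One possible reason it works from the Hive shell but not here: when reading a Hive metastore Parquet table, Spark by default converts it to its own native Parquet reader instead of using Hive's SerDe. If my memory of the config name is right, forcing the SerDe path might sidestep the layout mismatch (an untested sketch, not a confirmed fix from this thread):

```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Fall back to Hive's own Parquet SerDe instead of Spark's native reader;
// the config key is quoted from memory and may vary across Spark versions.
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")
val results = sqlContext.sql("SELECT * FROM stats")
```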
On Mon, Sep 7, 2015 at 1:56 PM, Alex Kozlov wrote:
> I am trying to read an (array typed) parquet file in spark-shell (Spark
> 1.4.1 with Hadoop 2.6):
I am trying to read an (array typed) parquet file in spark-shell (Spark
1.4.1 with Hadoop 2.6):
{code}
$ bin/spark-shell
log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See