Re: Problem reading Parquet from 1.2 to 1.3

2015-06-07 Thread Don Drake
Thanks Cheng. We have a workaround in place for Spark 1.3 (removing the .metadata directory); good to know it will be resolved in 1.4.
-Don
On Sun, Jun 7, 2015 at 8:51 AM, Cheng Lian lian.cs@gmail.com wrote:
This issue has been fixed recently in Spark 1.4
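For readers who would rather not delete the .metadata directory, a rough alternative is to list only the visible data files and hand that list to the reader instead of the directory itself. This is a minimal sketch, not what Don describes doing: the function name `visible_parquet_files`, the `*.parquet` file suffix, and the rule of skipping names that start with `.` or `_` are all assumptions made for illustration, and it works on a local filesystem path rather than HDFS.

```python
from pathlib import Path


def visible_parquet_files(table_dir):
    """Return data files under table_dir, skipping hidden entries.

    Hypothetical helper: any path component that starts with "." or "_"
    (for example a .metadata directory) is treated as hidden and ignored.
    """
    root = Path(table_dir)
    files = []
    for path in sorted(root.rglob("*.parquet")):
        # Look only at the part of the path below the table directory,
        # so a hidden parent of table_dir does not exclude everything.
        rel_parts = path.relative_to(root).parts
        if any(part.startswith((".", "_")) for part in rel_parts):
            continue
        files.append(str(path))
    return files
```

The returned paths could then be passed to whatever Parquet read call is in use, so the reader never sees the .metadata directory at all.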

Re: Problem reading Parquet from 1.2 to 1.3

2015-06-07 Thread Cheng Lian
This issue has been fixed recently in Spark 1.4: https://github.com/apache/spark/pull/6581
Cheng
On 6/5/15 12:38 AM, Marcelo Vanzin wrote:
I talked to Don outside the list and he says that he's seeing this issue with Apache Spark 1.3 too (not just CDH Spark), so it seems like there is a real

Re: Problem reading Parquet from 1.2 to 1.3

2015-06-04 Thread Marcelo Vanzin
I talked to Don outside the list and he says that he's seeing this issue with Apache Spark 1.3 too (not just CDH Spark), so it seems like there is a real issue here.
On Wed, Jun 3, 2015 at 1:39 PM, Don Drake dondr...@gmail.com wrote:
As part of upgrading a cluster from CDH 5.3.x to CDH 5.4.x I

Problem reading Parquet from 1.2 to 1.3

2015-06-03 Thread Don Drake
As part of upgrading a cluster from CDH 5.3.x to CDH 5.4.x, I noticed that Spark behaves differently when reading Parquet directories that contain a .metadata directory. Spark 1.2.x seemed to simply ignore the .metadata directory, but now that I'm using Spark 1.3, reading these

Re: Problem reading Parquet from 1.2 to 1.3

2015-06-03 Thread Marcelo Vanzin
(bcc: user@spark, cc: cdh-user@cloudera)
If you're using CDH, Spark SQL is currently unsupported and mostly untested. I'd recommend against trying to use it in CDH. You could try an upstream version of Spark instead.
On Wed, Jun 3, 2015 at 1:39 PM, Don Drake dondr...@gmail.com wrote:
As part of