Re: SparkSQL exception on cached parquet table

2014-11-20 Thread Sadhan Sood
Also attaching the parquet file if anyone wants to take a further look. On Thu, Nov 20, 2014 at 8:54 AM, Sadhan Sood sadhan.s...@gmail.com wrote: So, I am seeing this issue with Spark SQL throwing an exception when trying to read selective columns from a thrift parquet file and also when …
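For reference, the kind of column-pruned read being described looks roughly like the sketch below, using the Spark 1.x SQLContext API; the path and column names here are placeholders, not the actual data set from the thread.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Minimal sketch: load a thrift-written parquet directory and read only a few columns.
    val sc = new SparkContext(new SparkConf().setAppName("parquet-column-read").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    val events = sqlContext.parquetFile("/data/events.parquet")   // placeholder path
    events.registerTempTable("events")

    // Selecting a subset of columns exercises Parquet's column-pruned read path,
    // which is where the exception was reported.
    sqlContext.sql("SELECT event_type, ts FROM events LIMIT 10").collect()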

Re: SparkSQL exception on cached parquet table

2014-11-20 Thread Michael Armbrust
Which version are you running on again? On Thu, Nov 20, 2014 at 8:17 AM, Sadhan Sood sadhan.s...@gmail.com wrote: Also attaching the parquet file if anyone wants to take a further look. On Thu, Nov 20, 2014 at 8:54 AM, Sadhan Sood sadhan.s...@gmail.com wrote: So, I am seeing this issue …

Re: SparkSQL exception on cached parquet table

2014-11-20 Thread Sadhan Sood
I am running on master, pulled yesterday I believe, but saw the same issue with 1.2.0. On Thu, Nov 20, 2014 at 1:37 PM, Michael Armbrust mich...@databricks.com wrote: Which version are you running on again? On Thu, Nov 20, 2014 at 8:17 AM, Sadhan Sood sadhan.s...@gmail.com wrote: Also …

Re: SparkSQL exception on cached parquet table

2014-11-20 Thread Sadhan Sood
Thanks Michael, opened this: https://issues.apache.org/jira/browse/SPARK-4520 On Thu, Nov 20, 2014 at 2:59 PM, Michael Armbrust mich...@databricks.com wrote: Can you open a JIRA? On Thu, Nov 20, 2014 at 10:39 AM, Sadhan Sood sadhan.s...@gmail.com wrote: I am running on master, pulled …

Re: SparkSQL exception on cached parquet table

2014-11-16 Thread Cheng Lian
(Forgot to cc the user mailing list) On 11/16/14 4:59 PM, Cheng Lian wrote: Hey Sadhan, Thanks for the additional information, this is helpful. It seems that some Parquet internal contract was broken, but I'm not sure whether it's caused by Spark SQL or Parquet, or maybe even the Parquet file itself …

Re: SparkSQL exception on cached parquet table

2014-11-16 Thread Sadhan Sood
Hi Cheng, I tried reading the parquet file (on which we were getting the exception) through parquet-tools and it is able to dump the file, and I can read the metadata, etc. I also loaded the file through a Hive table and can run a table scan query on it as well. Let me know if I can do more to help …
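For anyone who wants to run the same sanity check programmatically rather than through parquet-tools, something along these lines reads the footer metadata with the parquet-mr API (package name per the pre-org.apache parquet-mr releases of that era; the path is a placeholder):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import parquet.hadoop.ParquetFileReader

    val conf = new Configuration()
    val footer = ParquetFileReader.readFooter(
      conf, new Path("/data/events.parquet/part-00000.parquet"))  // placeholder file

    // If the file itself is intact, the schema and row-group count come back cleanly.
    println(footer.getFileMetaData.getSchema)
    println(footer.getBlocks.size)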

Re: SparkSQL exception on cached parquet table

2014-11-15 Thread Cheng Lian
Hi Sadhan, Could you please provide the stack trace of the ArrayIndexOutOfBoundsException (if any)? The reason why the first query succeeds is that Spark SQL doesn't bother reading all data from the table to give COUNT(*). In the second case, however, the whole table is asked to be …
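To make that distinction concrete, here is a small sketch (Spark 1.x API; the table name and path are invented) of the two kinds of query: the first can be answered without materializing the Parquet columns, while the second forces the full read path that was failing.

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)  // assumes an existing SparkContext `sc`
    sqlContext.parquetFile("/data/events.parquet").registerTempTable("events")
    sqlContext.cacheTable("events")

    // Succeeds: COUNT(*) doesn't require reading all of the column data.
    sqlContext.sql("SELECT COUNT(*) FROM events").collect()

    // Fails in the reported case: every column has to be read (and cached).
    sqlContext.sql("SELECT * FROM events").collect()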

Re: SparkSQL exception on cached parquet table

2014-11-15 Thread sadhan
Hi Cheng, Thanks for your response. Here is the stack trace from yarn logs: … (View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-exception-on-cached-parquet-table-tp18978p19020.html)

SparkSQL exception on cached parquet table

2014-11-14 Thread Sadhan Sood
While testing SparkSQL on a bunch of parquet files (basically what used to be a partition for one of our Hive tables), I encountered this error:

    import org.apache.spark.sql.SchemaRDD
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    …
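The code in the archived message is cut off after the imports. A rough sketch of the setup described in the thread (loading the old partition directory, caching it as a table, then querying it) follows; the paths, table, and column names are placeholders, not the original snippet.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.{SQLContext, SchemaRDD}

    val sqlContext = new SQLContext(sc)  // `sc` is the existing SparkContext

    // The directory used to be a single partition of a Hive table (placeholder path).
    val partitionDir = "/warehouse/our_table/dt=2014-11-01"

    // Optional: confirm the parquet files are visible before handing them to Spark SQL.
    val fs = FileSystem.get(new Configuration())
    fs.listStatus(new Path(partitionDir)).foreach(s => println(s.getPath))

    val table: SchemaRDD = sqlContext.parquetFile(partitionDir)
    table.registerTempTable("our_table")
    sqlContext.cacheTable("our_table")

    // The exception showed up once the cached table was actually scanned.
    sqlContext.sql("SELECT col_a FROM our_table").collect()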