SparkSQL 1.3.0 cannot read parquet files from different file system

2015-03-16 Thread Pei-Lun Lee
Hi, I am using Spark 1.3.0, where I cannot load parquet files from more than one file system, say one s3n://... and another hdfs://..., which worked in older versions, or if I set spark.sql.parquet.useDataSourceApi=false in 1.3. One way to fix this is, instead of getting a single FileSystem from
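For reference, a minimal sketch of the scenario and the workaround mentioned above (the paths, bucket, and namenode address are placeholders; sqlContext.parquetFile is the 1.3-era read API):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Workaround reported above: fall back to the pre-1.3 parquet code path.
val conf = new SparkConf()
  .setAppName("multi-fs-parquet")
  .set("spark.sql.parquet.useDataSourceApi", "false")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Loading parquet from two different file systems in one application;
// with the data source API enabled this reportedly fails in 1.3.0.
val fromS3   = sqlContext.parquetFile("s3n://some-bucket/some/path")
val fromHdfs = sqlContext.parquetFile("hdfs://namenode:8020/some/path")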

Re: Which OutputCommitter to use for S3?

2015-03-16 Thread Pei-Lun Lee
that direct dependency makes this injection much more difficult for saveAsParquetFile. On Thu, Mar 5, 2015 at 12:28 AM, Pei-Lun Lee pl...@appier.com wrote: Thanks for the DirectOutputCommitter example. However, I found it only works for saveAsHadoopFile. What about saveAsParquetFile? It looks
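As a rough illustration of the injection that does work in the RDD API, the committer class can be passed through the JobConf used by saveAsHadoopFile. The fully qualified class name below is hypothetical, standing in for the DirectOutputCommitter example referenced in this thread:

import org.apache.hadoop.mapred.{JobConf, TextOutputFormat}
import org.apache.spark.rdd.RDD

def saveWithDirectCommitter(rdd: RDD[(String, String)], path: String): Unit = {
  val jobConf = new JobConf(rdd.context.hadoopConfiguration)
  // Inject the committer for the old mapred API used by saveAsHadoopFile.
  jobConf.set("mapred.output.committer.class", "com.example.DirectOutputCommitter")
  rdd.saveAsHadoopFile(
    path,
    classOf[String],
    classOf[String],
    classOf[TextOutputFormat[String, String]],
    jobConf)
}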

Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-15 Thread Pei-Lun Lee
. Is there anyone who can help me? Thanks, Wisely Chen. On Tue, Mar 10, 2015 at 5:06 PM, Pei-Lun Lee pl...@appier.com wrote: Hi, I found that if I try to read a parquet file generated by Spark 1.1.1 using 1.3.0-rc3 with default settings, I got this error

SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-10 Thread Pei-Lun Lee
Hi, I found that if I try to read a parquet file generated by Spark 1.1.1 using 1.3.0-rc3 with default settings, I got this error: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'StructType': was expecting ('true', 'false' or 'null') at [Source:
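A minimal reproduction sketch of what is described above, as two separate runs on the two Spark versions (the path is a placeholder):

// --- Run on Spark 1.1.1: write a parquet file ---
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
case class Record(key: Int, value: String)
sc.parallelize(Seq(Record(1, "a"))).saveAsParquetFile("hdfs:///tmp/parquet-from-1.1.1")

// --- Run on Spark 1.3.0-rc3 with default settings: read it back ---
val sqlContext13 = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext13.parquetFile("hdfs:///tmp/parquet-from-1.1.1")
// reportedly fails with:
// com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'StructType'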

Re: Which OutputCommitter to use for S3?

2015-03-05 Thread Pei-Lun Lee
Thanks for the DirectOutputCommitter example. However, I found it only works for saveAsHadoopFile. What about saveAsParquetFile? It looks like SparkSQL is using ParquetOutputCommitter, which is a subclass of FileOutputCommitter. On Fri, Feb 27, 2015 at 1:52 AM, Thomas Demoor
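For context, a generic sketch of the pattern such a direct committer usually follows (an assumption about the general technique, not the actual example referenced in this thread): a FileOutputCommitter whose commit phases are no-ops, so task output is written straight to its final S3 location instead of going through the temporary-directory rename protocol.

import org.apache.hadoop.mapred.{FileOutputCommitter, JobContext, TaskAttemptContext}

// Generic "direct" committer sketch for the old mapred API:
// skip the setup/commit/rename steps that are slow on S3.
class DirectOutputCommitter extends FileOutputCommitter {
  override def setupJob(jobContext: JobContext): Unit = {}
  override def setupTask(taskContext: TaskAttemptContext): Unit = {}
  override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
  override def commitTask(taskContext: TaskAttemptContext): Unit = {}
  override def abortTask(taskContext: TaskAttemptContext): Unit = {}
}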