OK, I've merged this PR to master and branch-2.0.
On 8/11/16 8:27 AM, Cheng Lian wrote:
Haven't figured out exactly how it failed, but the leading
underscore in the partition directory name looks suspicious. Could you
please try this PR to see whether it fixes the issue:
https://github.com/apache/spark/pull/14585/files
Cheng
On 8/9/16 5:38 PM, immerrr again wrote:
Another follow-up: I have narrowed it down to the first 32 partitions,
but from that point it gets strange.
Here's the error:
In [68]: spark.read.parquet(*subdirs[:32])
...
AnalysisException: u'Unable to infer schema for ParquetFormat at
Some follow-up information:
- dataset size is ~150G
- the data is partitioned by one of the columns, _locality_code:
$ ls -1
_locality_code=AD
_locality_code=AE
_locality_code=AF
_locality_code=AG
_locality_code=AI
_locality_code=AL
_locality_code=AM
_locality_code=AN
_locality_code=YE
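The leading underscore in these directory names is the likely trigger: Spark and Hadoop file listings conventionally skip path components starting with `_` or `.` (e.g. `_SUCCESS`, `_metadata`), so a partition directory like `_locality_code=AD` can be filtered out entirely, leaving no files from which to infer a schema. Here is a minimal Python sketch of that hidden-path convention; the predicate is an illustrative approximation, not Spark's actual code:

```python
# Illustrative approximation of the hidden-path filter applied during
# file listing: a path whose final component starts with '_' or '.'
# is treated as internal metadata and skipped.
def is_hidden(path: str) -> bool:
    name = path.rstrip("/").rsplit("/", 1)[-1]
    return name.startswith("_") or name.startswith(".")

dirs = ["_locality_code=AD", "_locality_code=AE", "year=2016"]
visible = [d for d in dirs if not is_hidden(d)]
# Only "year=2016" survives; every "_locality_code=..." directory is dropped,
# which would leave partition discovery with nothing to infer a schema from.
```

Under this assumption, a dataset partitioned on a column whose name begins with an underscore looks empty to the reader, matching the "Unable to infer schema" error above.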
Hi everyone,
I tried upgrading Spark-1.6.2 to Spark-2.0.0 but ran into an issue
reading the existing data. Here's how the traceback looks in
spark-shell:
scala> spark.read.parquet("/path/to/data")
org.apache.spark.sql.AnalysisException: Unable to infer schema for
ParquetFormat at /path/to/data.