[ 
https://issues.apache.org/jira/browse/SPARK-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009920#comment-15009920
 ] 

Mark Grover commented on SPARK-9896:
------------------------------------

Posting here in case someone else runs into this. In my experience, this 
happens when the client configuration for accessing HDFS/S3 is incorrect 
(which was the case for me), or possibly when HDFS is inaccessible.

In my case, I was accessing a remote non-secure Hadoop cluster from a node 
that had a secure HDFS configuration. The error message should be something 
more relevant and unrelated to Parquet, but this inaccessibility shows up as 
a Parquet metadata error. I fixed it by running kdestroy on the 
gateway/client node, after which I was able to run it fine.

If you are hitting this, try accessing S3 or HDFS from that node using the 
client configuration (for example, with 'hadoop fs -ls <path>'). If that 
doesn't succeed, that's your root cause.
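
The check above can be scripted if you want to run it before launching a job. 
This is only a minimal sketch: the helper name can_list is hypothetical, and 
it assumes the 'hadoop' CLI is on the PATH of the gateway/client node and 
exits non-zero when the listing fails.

```python
import subprocess


def can_list(path, cmd=("hadoop", "fs", "-ls"), timeout=60):
    """Try to list `path` using this node's client configuration.

    Returns (ok, detail): ok is True when the command exits with status 0;
    detail carries stdout on success or the error text on failure.
    `cmd` is a parameter only so another client command can be swapped in.
    """
    try:
        result = subprocess.run(
            [*cmd, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        # The client binary itself is missing on this node.
        return False, "command not found: " + cmd[0]
    except subprocess.TimeoutExpired:
        # The filesystem may be unreachable from this node.
        return False, "timed out after %d seconds" % timeout
    if result.returncode == 0:
        return True, result.stdout.strip()
    return False, result.stderr.strip()
```

Call can_list with the same path Spark is reading, from the same node. If it 
returns False, the detail string is likely closer to the real cause than the 
Parquet assertion.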

I will file a separate JIRA for improving the error message.

> Parquet Schema Assertion
> ------------------------
>
>                 Key: SPARK-9896
>                 URL: https://issues.apache.org/jira/browse/SPARK-9896
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Priority: Blocker
>
> Need to investigate more, but I'm seeing this all of a sudden.
> {code}
> java.lang.AssertionError: assertion failed: No predefined schema found, and 
> no Parquet data files or summary files found under 
> s3n:/.../databricks-performance-datasets/tpcds/sf1500-parquet/useDecimal=true/parquet/item
> {code}
> Possibly related to [SPARK-9407].



