[ https://issues.apache.org/jira/browse/SPARK-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009920#comment-15009920 ]
Mark Grover commented on SPARK-9896:
------------------------------------

Posting here in case someone else runs into this as well. In my experience, this happens when the client configuration for accessing HDFS/S3 is incorrect (which was the case for me), or possibly when HDFS is inaccessible. In my case, I was accessing a remote non-secure Hadoop cluster from a node that had a secure HDFS configuration. So, while the error message ought to be something more relevant and unrelated to Parquet, this inaccessibility surfaces as a Parquet metadata error.

I fixed it by running kdestroy on the gateway/client node, after which the job ran fine. If you are hitting this, try accessing S3 or HDFS from that node using the client configuration (e.g. 'hadoop fs -ls <path>'). If that doesn't succeed, that's your root cause (a sketch of these checks appears at the end of this message). I will file a separate JIRA for improving the error message.

> Parquet Schema Assertion
> ------------------------
>
>                 Key: SPARK-9896
>                 URL: https://issues.apache.org/jira/browse/SPARK-9896
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Priority: Blocker
>
> Need to investigate more, but I'm seeing this all of a sudden.
> {code}
> java.lang.AssertionError: assertion failed: No predefined schema found, and
> no Parquet data files or summary files found under
> s3n:/.../databricks-performance-datasets/tpcds/sf1500-parquet/useDecimal=true/parquet/item
> {code}
> Possibly related to [SPARK-9407].
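A minimal sketch of the checks described in the comment above, assuming a gateway/client node with the Hadoop CLI and Kerberos tools installed. The path is a placeholder rather than the actual dataset location, and the exact steps depend on your Hadoop distribution and Kerberos setup:

{code}
# On the gateway/client node that submits the Spark job:

# 1. See whether stale Kerberos credentials are present
#    (a secure client config pointed at a non-secure cluster was the culprit here).
klist

# 2. Clear the ticket cache if the target cluster is non-secure.
kdestroy

# 3. Confirm the path is reachable with the same client configuration Spark uses.
#    Replace the placeholder with your actual Parquet directory.
hadoop fs -ls s3n://<bucket>/<path-to-parquet-dir>

# If this listing fails, fix the HDFS/S3 client configuration first;
# the Parquet "No predefined schema found" assertion is a downstream symptom.
{code}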