[ https://issues.apache.org/jira/browse/SPARK-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandy Ryza updated SPARK-5151:
------------------------------
    Component/s:     (was: Spark Core)

> Parquet Predicate Pushdown Does Not Work with Nested Structures.
> ----------------------------------------------------------------
>
>                 Key: SPARK-5151
>                 URL: https://issues.apache.org/jira/browse/SPARK-5151
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>         Environment: pyspark, spark-ec2 created cluster
>            Reporter: Brad Willard
>              Labels: parquet, pyspark, sql
>
> I have json files of objects created with a nested structure roughly of the form:
>
> { id: 123, event: "login", meta_data: {user: "user1"} }
> ....
> { id: 125, event: "login", meta_data: {user: "user2"} }
>
> I load the data via spark with
>
> rdd = sql_context.jsonFile()
> # save it as a parquet file
> rdd.saveAsParquetFile()
> rdd = sql_context.parquetFile()
> rdd.registerTempTable('events')
>
> If predicate pushdown is disabled, this query works without issue:
>
> select count(1) from events where meta_data.user = "user1"
>
> If I enable predicate pushdown, I get an error saying meta_data.user is not in the schema:
>
> Py4JJavaError: An error occurred while calling o218.collect.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 125 in stage 12.0 failed 4 times, most recent failure: Lost task 125.3 in stage 12.0 (TID 6164, ): java.lang.IllegalArgumentException: Column [user] was not found in schema!
> at parquet.Preconditions.checkArgument(Preconditions.java:47)
> at parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:172)
> at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:160)
> at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:142)
> at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:76)
> at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:41)
> at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162)
> .....
>
> I expect this is actually related to another bug I filed where nested structure is not preserved with Spark SQL.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
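Editor's note: the report implies the error only occurs when Parquet filter pushdown is enabled. A possible interim workaround (an assumption on my part, not stated in the report) is to leave pushdown disabled via the `spark.sql.parquet.filterPushdown` setting, which existed in Spark 1.2 and defaulted to off. A minimal configuration sketch, assuming `sql_context` is the reporter's existing SQLContext:

```python
# Workaround sketch (assumption, not from the report): keep Parquet filter
# pushdown disabled so predicates on nested fields like meta_data.user are
# evaluated by Spark itself instead of being handed to the Parquet reader,
# which rejects the nested column name.
sql_context.setConf("spark.sql.parquet.filterPushdown", "false")

# The failing query from the report, now evaluated without pushdown:
result = sql_context.sql(
    'select count(1) from events where meta_data.user = "user1"'
).collect()
```

This trades the scan-time benefit of pushdown for correctness until the nested-column handling is fixed.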