[ https://issues.apache.org/jira/browse/SPARK-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-6554:
------------------------------
    Description: 
I'm having trouble referencing partition columns in my queries with Parquet.

In the following example, 'probeTypeId' is a partition column, and the directory structure looks like this:
{noformat}
/mydata
    /probeTypeId=1
        ...files...
    /probeTypeId=2
        ...files...
{noformat}
I see the column when I load a DF using the /mydata directory and call df.printSchema():
{noformat}
 |-- probeTypeId: integer (nullable = true)
{noformat}
Parquet is also aware of the column:
{noformat}
optional int32 probeTypeId;
{noformat}
And this works fine:
{code}
sqlContext.sql("select probeTypeId from df limit 1");
{code}
...as does {{df.show()}} - it shows the correct values for the partition column.

However, when I try to use a partition column in a where clause, I get an exception stating that the column was not found in the schema:
{noformat}
sqlContext.sql("select probeTypeId from df where probeTypeId = 1 limit 1");
...
...
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: Column [probeTypeId] was not found in schema!
	at parquet.Preconditions.checkArgument(Preconditions.java:47)
	at parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:172)
	at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:160)
	at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:142)
	at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:76)
	at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:41)
	at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162)
	at parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:46)
	at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:41)
	at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22)
	at parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108)
	at parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28)
	at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158)
...
...
{noformat}
Here's the full stack trace:
{noformat}
using local[*] for master 06:05:55,675 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set 06:05:55,683 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender] 06:05:55,694 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT] 06:05:55,721 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property 06:05:55,768 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to INFO 06:05:55,768 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT] 06:05:55,769 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration. 
06:05:55,770 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@6aaceffd - Registering current configuration as safe fallback point INFO org.apache.spark.SparkContext Running Spark version 1.3.0 WARN o.a.hadoop.util.NativeCodeLoader Unable to load native-hadoop library for your platform... using builtin-java classes where applicable INFO org.apache.spark.SecurityManager Changing view acls to: jon INFO org.apache.spark.SecurityManager Changing modify acls to: jon INFO org.apache.spark.SecurityManager SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jon); users with modify permissions: Set(jon) INFO akka.event.slf4j.Slf4jLogger Slf4jLogger started INFO Remoting Starting remoting INFO Remoting Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.134:62493] INFO org.apache.spark.util.Utils Successfully started service 'sparkDriver' on port 62493. INFO org.apache.spark.SparkEnv Registering MapOutputTracker INFO org.apache.spark.SparkEnv Registering BlockManagerMaster INFO o.a.spark.storage.DiskBlockManager Created local directory at /var/folders/x7/9hdp8kw9569864088tsl4jmm0000gn/T/spark-150e23b2-ff19-4a51-8cfc-25fb8e1b3f2b/blockmgr-6eea286c-7473-4bda-8886-7250156b68f4 INFO org.apache.spark.storage.MemoryStore MemoryStore started with capacity 1966.1 MB INFO org.apache.spark.HttpFileServer HTTP File server directory is /var/folders/x7/9hdp8kw9569864088tsl4jmm0000gn/T/spark-cf4687bd-1563-4ddf-b697-21c96fd95561/httpd-6343b9c9-bb66-43ac-ac43-6da80c7a1f95 INFO org.apache.spark.HttpServer Starting HTTP Server INFO o.spark-project.jetty.server.Server jetty-8.y.z-SNAPSHOT INFO o.s.jetty.server.AbstractConnector Started SocketConnector@0.0.0.0:62494 INFO org.apache.spark.util.Utils Successfully started service 'HTTP file server' on port 62494. INFO org.apache.spark.SparkEnv Registering OutputCommitCoordinator INFO o.spark-project.jetty.server.Server jetty-8.y.z-SNAPSHOT INFO o.s.jetty.server.AbstractConnector Started SelectChannelConnector@0.0.0.0:4040 INFO org.apache.spark.util.Utils Successfully started service 'SparkUI' on port 4040. INFO org.apache.spark.ui.SparkUI Started SparkUI at http://192.168.1.134:4040 INFO org.apache.spark.executor.Executor Starting executor ID <driver> on host localhost INFO org.apache.spark.util.AkkaUtils Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.134:62493/user/HeartbeatReceiver INFO o.a.s.n.n.NettyBlockTransferService Server created on 62495 INFO o.a.spark.storage.BlockManagerMaster Trying to register BlockManager INFO o.a.s.s.BlockManagerMasterActor Registering block manager localhost:62495 with 1966.1 MB RAM, BlockManagerId(<driver>, localhost, 62495) INFO o.a.spark.storage.BlockManagerMaster Registered BlockManager INFO o.a.h.conf.Configuration.deprecation mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize INFO o.a.h.conf.Configuration.deprecation mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative INFO o.a.h.conf.Configuration.deprecation mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed INFO o.a.h.conf.Configuration.deprecation mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack INFO o.a.h.conf.Configuration.deprecation mapred.min.split.size is deprecated. 
Instead, use mapreduce.input.fileinputformat.split.minsize INFO o.a.h.conf.Configuration.deprecation mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node INFO o.a.h.conf.Configuration.deprecation mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces INFO o.a.h.conf.Configuration.deprecation mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive INFO o.a.h.hive.metastore.HiveMetaStore 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore INFO o.a.h.hive.metastore.ObjectStore ObjectStore, initialize called INFO DataNucleus.Persistence Property hive.metastore.integral.jdo.pushdown unknown - will be ignored INFO DataNucleus.Persistence Property datanucleus.cache.level2 unknown - will be ignored INFO o.a.h.hive.metastore.ObjectStore Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" INFO o.a.h.h.metastore.MetaStoreDirectSql MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "". INFO DataNucleus.Datastore The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. INFO DataNucleus.Datastore The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. INFO DataNucleus.Datastore The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. INFO DataNucleus.Datastore The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. INFO DataNucleus.Query Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing INFO o.a.h.hive.metastore.ObjectStore Initialized ObjectStore INFO o.a.h.hive.metastore.HiveMetaStore Added admin role in metastore INFO o.a.h.hive.metastore.HiveMetaStore Added public role in metastore INFO o.a.h.hive.metastore.HiveMetaStore No user is added in admin role, since config is empty INFO o.a.h.hive.ql.session.SessionState No Tez session required at this point. hive.execution.engine=mr. SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 
root |-- clientMarketId: integer (nullable = true) |-- clientCountryId: integer (nullable = true) |-- clientRegionId: integer (nullable = true) |-- clientStateId: integer (nullable = true) |-- clientAsnId: integer (nullable = true) |-- reporterZoneId: integer (nullable = true) |-- reporterCustomerId: integer (nullable = true) |-- responseCode: integer (nullable = true) |-- measurementValue: integer (nullable = true) |-- year: integer (nullable = true) |-- month: integer (nullable = true) |-- day: integer (nullable = true) |-- providerOwnerZoneId: integer (nullable = true) |-- providerOwnerCustomerId: integer (nullable = true) |-- providerId: integer (nullable = true) |-- probeTypeId: integer (nullable = true) ====================================================== INFO hive.ql.parse.ParseDriver Parsing command: select probeTypeId from df where probeTypeId = 1 limit 1 INFO hive.ql.parse.ParseDriver Parse Completed ==== results for select probeTypeId from df where probeTypeId = 1 limit 1 ====================================================== INFO o.a.s.sql.parquet.ParquetRelation2 Reading 33.33333333333333% of partitions INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(191336) called with curMem=0, maxMem=2061647216 INFO org.apache.spark.storage.MemoryStore Block broadcast_0 stored as values in memory (estimated size 186.9 KB, free 1966.0 MB) INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(27530) called with curMem=191336, maxMem=2061647216 INFO org.apache.spark.storage.MemoryStore Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.9 KB, free 1965.9 MB) INFO o.a.spark.storage.BlockManagerInfo Added broadcast_0_piece0 in memory on localhost:62495 (size: 26.9 KB, free: 1966.1 MB) INFO o.a.spark.storage.BlockManagerMaster Updated info of block broadcast_0_piece0 INFO org.apache.spark.SparkContext Created broadcast 0 from NewHadoopRDD at newParquet.scala:447 INFO o.a.h.conf.Configuration.deprecation mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize INFO o.a.h.conf.Configuration.deprecation mapred.min.split.size is deprecated. 
Instead, use mapreduce.input.fileinputformat.split.minsize INFO o.a.s.s.p.ParquetRelation2$$anon$1$$anon$2 Using Task Side Metadata Split Strategy INFO org.apache.spark.SparkContext Starting job: runJob at SparkPlan.scala:121 INFO o.a.spark.scheduler.DAGScheduler Got job 0 (runJob at SparkPlan.scala:121) with 1 output partitions (allowLocal=false) INFO o.a.spark.scheduler.DAGScheduler Final stage: Stage 0(runJob at SparkPlan.scala:121) INFO o.a.spark.scheduler.DAGScheduler Parents of final stage: List() INFO o.a.spark.scheduler.DAGScheduler Missing parents: List() INFO o.a.spark.scheduler.DAGScheduler Submitting Stage 0 (MapPartitionsRDD[3] at map at SparkPlan.scala:96), which has no missing parents INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(5512) called with curMem=218866, maxMem=2061647216 INFO org.apache.spark.storage.MemoryStore Block broadcast_1 stored as values in memory (estimated size 5.4 KB, free 1965.9 MB) INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(3754) called with curMem=224378, maxMem=2061647216 INFO org.apache.spark.storage.MemoryStore Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.7 KB, free 1965.9 MB) INFO o.a.spark.storage.BlockManagerInfo Added broadcast_1_piece0 in memory on localhost:62495 (size: 3.7 KB, free: 1966.1 MB) INFO o.a.spark.storage.BlockManagerMaster Updated info of block broadcast_1_piece0 INFO org.apache.spark.SparkContext Created broadcast 1 from broadcast at DAGScheduler.scala:839 INFO o.a.spark.scheduler.DAGScheduler Submitting 1 missing tasks from Stage 0 (MapPartitionsRDD[3] at map at SparkPlan.scala:96) INFO o.a.s.scheduler.TaskSchedulerImpl Adding task set 0.0 with 1 tasks INFO o.a.spark.scheduler.TaskSetManager Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1687 bytes) INFO org.apache.spark.executor.Executor Running task 0.0 in stage 0.0 (TID 0) INFO o.a.s.s.p.ParquetRelation2$$anon$1 Input split: ParquetInputSplit{part: file:/Users/jon/Downloads/sparksql/1partitionsminusgeo/year=2015/month=1/day=14/providerOwnerZoneId=0/providerOwnerCustomerId=0/providerId=287/probeTypeId=1/part-r-00001.parquet start: 0 end: 8851183 length: 8851183 hosts: [] requestedSchema: message root { optional int32 probeTypeId; } readSupportMetadata: {org.apache.spark.sql.parquet.row.requested_schema={"type":"struct","fields":[{"name":"probeTypeId","type":"integer","nullable":true,"metadata":{}}]}, 
org.apache.spark.sql.parquet.row.metadata={"type":"struct","fields":[{"name":"clientMarketId","type":"integer","nullable":true,"metadata":{}},{"name":"clientCountryId","type":"integer","nullable":true,"metadata":{}},{"name":"clientRegionId","type":"integer","nullable":true,"metadata":{}},{"name":"clientStateId","type":"integer","nullable":true,"metadata":{}},{"name":"clientAsnId","type":"integer","nullable":true,"metadata":{}},{"name":"reporterZoneId","type":"integer","nullable":true,"metadata":{}},{"name":"reporterCustomerId","type":"integer","nullable":true,"metadata":{}},{"name":"responseCode","type":"integer","nullable":true,"metadata":{}},{"name":"measurementValue","type":"integer","nullable":true,"metadata":{}},{"name":"year","type":"integer","nullable":true,"metadata":{}},{"name":"month","type":"integer","nullable":true,"metadata":{}},{"name":"day","type":"integer","nullable":true,"metadata":{}},{"name":"providerOwnerZoneId","type":"integer","nullable":true,"metadata":{}},{"name":"providerOwnerCustomerId","type":"integer","nullable":true,"metadata":{}},{"name":"providerId","type":"integer","nullable":true,"metadata":{}},{"name":"probeTypeId","type":"integer","nullable":true,"metadata":{}}]}}} ERROR org.apache.spark.executor.Executor Exception in task 0.0 in stage 0.0 (TID 0) java.lang.IllegalArgumentException: Column [probeTypeId] was not found in schema! at parquet.Preconditions.checkArgument(Preconditions.java:47) ~[parquet-common-1.6.0rc3.jar:na] at parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:172) ~[parquet-column-1.6.0rc3.jar:na] at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:160) ~[parquet-column-1.6.0rc3.jar:na] at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:142) ~[parquet-column-1.6.0rc3.jar:na] at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:76) ~[parquet-column-1.6.0rc3.jar:na] at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:41) ~[parquet-column-1.6.0rc3.jar:na] at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162) ~[parquet-column-1.6.0rc3.jar:na] at parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:46) ~[parquet-column-1.6.0rc3.jar:na] at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:41) ~[parquet-hadoop-1.6.0rc3.jar:na] at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22) ~[parquet-hadoop-1.6.0rc3.jar:na] at parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108) ~[parquet-column-1.6.0rc3.jar:na] at parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28) ~[parquet-hadoop-1.6.0rc3.jar:na] at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158) ~[parquet-hadoop-1.6.0rc3.jar:na] at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138) ~[parquet-hadoop-1.6.0rc3.jar:na] at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.NewHadoopRDD$NewHadoopMapPartitionsWithSplitRDD.compute(NewHadoopRDD.scala:244) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.scheduler.Task.run(Task.scala:64) ~[spark-core_2.10-1.3.0.jar:1.3.0] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) ~[spark-core_2.10-1.3.0.jar:1.3.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_31] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_31] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31] WARN o.a.spark.scheduler.TaskSetManager Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: Column [probeTypeId] was not found in schema! at parquet.Preconditions.checkArgument(Preconditions.java:47) at parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:172) at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:160) at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:142) at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:76) at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:41) at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162) at parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:46) at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:41) at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22) at parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108) at parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28) at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158) at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138) at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.NewHadoopRDD$NewHadoopMapPartitionsWithSplitRDD.compute(NewHadoopRDD.scala:244) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ERROR o.a.spark.scheduler.TaskSetManager Task 0 in stage 0.0 failed 1 times; aborting job INFO o.a.s.scheduler.TaskSchedulerImpl Removed TaskSet 0.0, whose tasks have all completed, from pool INFO o.a.s.scheduler.TaskSchedulerImpl Cancelling stage 0 INFO o.a.spark.scheduler.DAGScheduler Job 0 failed: runJob at SparkPlan.scala:121, took 0.132538 s INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/metrics/json,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/static,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/executors/json,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/executors,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/environment/json,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/environment,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/storage/rdd,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/storage/json,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/storage,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/stages/pool,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/stages/stage,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/stages/json,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/stages,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/jobs/job,null} INFO o.s.j.server.handler.ContextHandler stopped o.s.j.s.ServletContextHandler{/jobs/json,null} INFO o.s.j.server.handler.ContextHandler stopped 
o.s.j.s.ServletContextHandler{/jobs,null} INFO org.apache.spark.ui.SparkUI Stopped Spark web UI at http://192.168.1.134:4040 INFO o.a.spark.scheduler.DAGScheduler Stopping DAGScheduler INFO o.a.s.MapOutputTrackerMasterActor MapOutputTrackerActor stopped! INFO org.apache.spark.storage.MemoryStore MemoryStore cleared INFO o.apache.spark.storage.BlockManager BlockManager stopped INFO o.a.spark.storage.BlockManagerMaster BlockManagerMaster stopped INFO o.a.s.s.OutputCommitCoordinator$OutputCommitCoordinatorActor OutputCommitCoordinator stopped! INFO org.apache.spark.SparkContext Successfully stopped SparkContext org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: Column [probeTypeId] was not found in schema! at parquet.Preconditions.checkArgument(Preconditions.java:47) at parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:172) at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:160) at parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:142) at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:76) at parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:41) at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162) at parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:46) at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:41) at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22) at parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108) at parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28) at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158) at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138) at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.NewHadoopRDD$NewHadoopMapPartitionsWithSplitRDD.compute(NewHadoopRDD.scala:244) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) Mar 26, 2015 6:06:02 AM INFO: parquet.filter2.compat.FilterCompat: Filtering using predicate: eq(probeTypeId, 1) Mar 26, 2015 6:06:02 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl Mar 26, 2015 6:06:02 AM INFO: parquet.filter2.compat.FilterCompat: Filtering using predicate: eq(probeTypeId, 1) Process finished with exit code 255 {noformat}
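To make the reproduction steps above concrete, here is a minimal, self-contained Java sketch of the same scenario against the Spark 1.3.0 API. The {{/mydata}} path, the app name, and the class name are placeholders (only the {{local[*]}} master matches the log above); this is not the reporter's actual code:
{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PartitionColumnWhereClauseRepro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("SPARK-6554 reproduction sketch")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // /mydata is partitioned by probeTypeId: /mydata/probeTypeId=1/..., /mydata/probeTypeId=2/...
        DataFrame df = sqlContext.parquetFile("/mydata");
        df.printSchema();              // lists probeTypeId as integer (nullable = true)
        df.registerTempTable("df");

        // Works: no predicate on the partition column.
        sqlContext.sql("select probeTypeId from df limit 1").show();
        df.show();                     // also shows the correct partition column values

        // Fails on 1.3.0 with "Column [probeTypeId] was not found in schema!"
        sqlContext.sql("select probeTypeId from df where probeTypeId = 1 limit 1").show();

        sc.stop();
    }
}
{code}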
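The input split logged above hints at the root cause: the requested Parquet schema is just {{optional int32 probeTypeId}}, but the data files themselves contain no such column, because its value exists only in the directory name ({{probeTypeId=1}}). When the {{eq(probeTypeId, 1)}} predicate is pushed down, parquet-mr's SchemaCompatibilityValidator cannot resolve the column in the file schema and throws. As a possible workaround, one might disable Parquet filter push-down so Spark SQL evaluates the predicate itself; this is an untested suggestion on my part, and whether the 1.3.0 Parquet data source path honours this flag is an assumption:
{code}
// Hypothetical workaround sketch (untested for this report): turn off Parquet
// filter push-down so Spark SQL evaluates the partition-column predicate itself
// instead of handing it to parquet-mr's row-group filter.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "false");
sqlContext.sql("select probeTypeId from df where probeTypeId = 1 limit 1").show();
{code}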
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) Mar 26, 2015 6:06:02 AM INFO: parquet.filter2.compat.FilterCompat: Filtering using predicate: eq(probeTypeId, 1) Mar 26, 2015 6:06:02 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl Mar 26, 2015 6:06:02 AM INFO: parquet.filter2.compat.FilterCompat: Filtering using predicate: eq(probeTypeId, 1) Process finished with exit code 255 {noformat} > Cannot use partition columns in where clause > -------------------------------------------- > > Key: SPARK-6554 > URL: https://issues.apache.org/jira/browse/SPARK-6554 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.3.0 > Reporter: Jon Chase > > I'm having trouble referencing partition columns in my queries with Parquet. > In the following example, 'probeTypeId' is a partition column, and > the directory structure looks like this: > {noformat} > /mydata > /probeTypeId=1 > ...files... > /probeTypeId=2 > ...files... > {noformat} > I see the column when I load a DF using the /mydata directory and > call df.printSchema(): > {noformat} > |-- probeTypeId: integer (nullable = true) > {noformat} > Parquet is also aware of the column: > {noformat} > optional int32 probeTypeId; > {noformat} > And this works fine: > {code} > sqlContext.sql("select probeTypeId from df limit 1"); > {code} > ...as does {{df.show()}} - it shows the correct values for the partition > column. > However, when I try to use a partition column in a where clause, I get an > exception stating that the column was not found in the schema: > {noformat} > sqlContext.sql("select probeTypeId from df where probeTypeId = 1 limit 1"); > ... > ... > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 > (TID 0, localhost): java.lang.IllegalArgumentException: Column [probeTypeId] > was not found in schema!
> at parquet.Preconditions.checkArgument(Preconditions.java:47) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:172) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:160) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:142) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:76) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:41) > at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:46) > at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:41) > at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22) > at > parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108) > at > parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28) > at > parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158) > ... > ... > {noformat} > Here's the full stack trace: > {noformat} > using local[*] for master > 06:05:55,675 |-INFO in > ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not > set > 06:05:55,683 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - > About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender] > 06:05:55,694 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - > Naming appender as [STDOUT] > 06:05:55,721 |-INFO in > ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default > type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] > property > 06:05:55,768 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - > Setting level of ROOT logger to INFO > 06:05:55,768 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - > Attaching appender named [STDOUT] to Logger[ROOT] > 06:05:55,769 |-INFO in > ch.qos.logback.classic.joran.action.ConfigurationAction - End of > configuration. > 06:05:55,770 |-INFO in > ch.qos.logback.classic.joran.JoranConfigurator@6aaceffd - Registering current > configuration as safe fallback point > INFO org.apache.spark.SparkContext Running Spark version 1.3.0 > WARN o.a.hadoop.util.NativeCodeLoader Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > INFO org.apache.spark.SecurityManager Changing view acls to: jon > INFO org.apache.spark.SecurityManager Changing modify acls to: jon > INFO org.apache.spark.SecurityManager SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(jon); users with > modify permissions: Set(jon) > INFO akka.event.slf4j.Slf4jLogger Slf4jLogger started > INFO Remoting Starting remoting > INFO Remoting Remoting started; listening on addresses > :[akka.tcp://sparkDriver@192.168.1.134:62493] > INFO org.apache.spark.util.Utils Successfully started service 'sparkDriver' > on port 62493. 
> INFO org.apache.spark.SparkEnv Registering MapOutputTracker > INFO org.apache.spark.SparkEnv Registering BlockManagerMaster > INFO o.a.spark.storage.DiskBlockManager Created local directory at > /var/folders/x7/9hdp8kw9569864088tsl4jmm0000gn/T/spark-150e23b2-ff19-4a51-8cfc-25fb8e1b3f2b/blockmgr-6eea286c-7473-4bda-8886-7250156b68f4 > INFO org.apache.spark.storage.MemoryStore MemoryStore started with capacity > 1966.1 MB > INFO org.apache.spark.HttpFileServer HTTP File server directory is > /var/folders/x7/9hdp8kw9569864088tsl4jmm0000gn/T/spark-cf4687bd-1563-4ddf-b697-21c96fd95561/httpd-6343b9c9-bb66-43ac-ac43-6da80c7a1f95 > INFO org.apache.spark.HttpServer Starting HTTP Server > INFO o.spark-project.jetty.server.Server jetty-8.y.z-SNAPSHOT > INFO o.s.jetty.server.AbstractConnector Started SocketConnector@0.0.0.0:62494 > INFO org.apache.spark.util.Utils Successfully started service 'HTTP file > server' on port 62494. > INFO org.apache.spark.SparkEnv Registering OutputCommitCoordinator > INFO o.spark-project.jetty.server.Server jetty-8.y.z-SNAPSHOT > INFO o.s.jetty.server.AbstractConnector Started > SelectChannelConnector@0.0.0.0:4040 > INFO org.apache.spark.util.Utils Successfully started service 'SparkUI' on > port 4040. > INFO org.apache.spark.ui.SparkUI Started SparkUI at http://192.168.1.134:4040 > INFO org.apache.spark.executor.Executor Starting executor ID <driver> on > host localhost > INFO org.apache.spark.util.AkkaUtils Connecting to HeartbeatReceiver: > akka.tcp://sparkDriver@192.168.1.134:62493/user/HeartbeatReceiver > INFO o.a.s.n.n.NettyBlockTransferService Server created on 62495 > INFO o.a.spark.storage.BlockManagerMaster Trying to register BlockManager > INFO o.a.s.s.BlockManagerMasterActor Registering block manager > localhost:62495 with 1966.1 MB RAM, BlockManagerId(<driver>, localhost, 62495) > INFO o.a.spark.storage.BlockManagerMaster Registered BlockManager > INFO o.a.h.conf.Configuration.deprecation mapred.max.split.size is > deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize > INFO o.a.h.conf.Configuration.deprecation > mapred.reduce.tasks.speculative.execution is deprecated. Instead, use > mapreduce.reduce.speculative > INFO o.a.h.conf.Configuration.deprecation > mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use > mapreduce.job.committer.setup.cleanup.needed > INFO o.a.h.conf.Configuration.deprecation mapred.min.split.size.per.rack is > deprecated. Instead, use > mapreduce.input.fileinputformat.split.minsize.per.rack > INFO o.a.h.conf.Configuration.deprecation mapred.min.split.size is > deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize > INFO o.a.h.conf.Configuration.deprecation mapred.min.split.size.per.node is > deprecated. Instead, use > mapreduce.input.fileinputformat.split.minsize.per.node > INFO o.a.h.conf.Configuration.deprecation mapred.reduce.tasks is deprecated. > Instead, use mapreduce.job.reduces > INFO o.a.h.conf.Configuration.deprecation mapred.input.dir.recursive is > deprecated. 
Instead, use mapreduce.input.fileinputformat.input.dir.recursive > INFO o.a.h.hive.metastore.HiveMetaStore 0: Opening raw store with > implemenation class:org.apache.hadoop.hive.metastore.ObjectStore > INFO o.a.h.hive.metastore.ObjectStore ObjectStore, initialize called > INFO DataNucleus.Persistence Property hive.metastore.integral.jdo.pushdown > unknown - will be ignored > INFO DataNucleus.Persistence Property datanucleus.cache.level2 unknown - > will be ignored > INFO o.a.h.hive.metastore.ObjectStore Setting MetaStore object pin classes > with > hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" > INFO o.a.h.h.metastore.MetaStoreDirectSql MySQL check failed, assuming we > are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), > after : "". > INFO DataNucleus.Datastore The class > "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as > "embedded-only" so does not have its own datastore table. > INFO DataNucleus.Datastore The class > "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" > so does not have its own datastore table. > INFO DataNucleus.Datastore The class > "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as > "embedded-only" so does not have its own datastore table. > INFO DataNucleus.Datastore The class > "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" > so does not have its own datastore table. > INFO DataNucleus.Query Reading in results for query > "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is > closing > INFO o.a.h.hive.metastore.ObjectStore Initialized ObjectStore > INFO o.a.h.hive.metastore.HiveMetaStore Added admin role in metastore > INFO o.a.h.hive.metastore.HiveMetaStore Added public role in metastore > INFO o.a.h.hive.metastore.HiveMetaStore No user is added in admin role, > since config is empty > INFO o.a.h.hive.ql.session.SessionState No Tez session required at this > point. hive.execution.engine=mr. > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. 
> root > |-- clientMarketId: integer (nullable = true) > |-- clientCountryId: integer (nullable = true) > |-- clientRegionId: integer (nullable = true) > |-- clientStateId: integer (nullable = true) > |-- clientAsnId: integer (nullable = true) > |-- reporterZoneId: integer (nullable = true) > |-- reporterCustomerId: integer (nullable = true) > |-- responseCode: integer (nullable = true) > |-- measurementValue: integer (nullable = true) > |-- year: integer (nullable = true) > |-- month: integer (nullable = true) > |-- day: integer (nullable = true) > |-- providerOwnerZoneId: integer (nullable = true) > |-- providerOwnerCustomerId: integer (nullable = true) > |-- providerId: integer (nullable = true) > |-- probeTypeId: integer (nullable = true) > ====================================================== > INFO hive.ql.parse.ParseDriver Parsing command: select probeTypeId from df > where probeTypeId = 1 limit 1 > INFO hive.ql.parse.ParseDriver Parse Completed > ==== results for select probeTypeId from df where probeTypeId = 1 limit 1 > ====================================================== > INFO o.a.s.sql.parquet.ParquetRelation2 Reading 33.33333333333333% of > partitions > INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(191336) called > with curMem=0, maxMem=2061647216 > INFO org.apache.spark.storage.MemoryStore Block broadcast_0 stored as values > in memory (estimated size 186.9 KB, free 1966.0 MB) > INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(27530) called with > curMem=191336, maxMem=2061647216 > INFO org.apache.spark.storage.MemoryStore Block broadcast_0_piece0 stored as > bytes in memory (estimated size 26.9 KB, free 1965.9 MB) > INFO o.a.spark.storage.BlockManagerInfo Added broadcast_0_piece0 in memory > on localhost:62495 (size: 26.9 KB, free: 1966.1 MB) > INFO o.a.spark.storage.BlockManagerMaster Updated info of block > broadcast_0_piece0 > INFO org.apache.spark.SparkContext Created broadcast 0 from NewHadoopRDD at > newParquet.scala:447 > INFO o.a.h.conf.Configuration.deprecation mapred.max.split.size is > deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize > INFO o.a.h.conf.Configuration.deprecation mapred.min.split.size is > deprecated. 
Instead, use mapreduce.input.fileinputformat.split.minsize > INFO o.a.s.s.p.ParquetRelation2$$anon$1$$anon$2 Using Task Side Metadata > Split Strategy > INFO org.apache.spark.SparkContext Starting job: runJob at > SparkPlan.scala:121 > INFO o.a.spark.scheduler.DAGScheduler Got job 0 (runJob at > SparkPlan.scala:121) with 1 output partitions (allowLocal=false) > INFO o.a.spark.scheduler.DAGScheduler Final stage: Stage 0(runJob at > SparkPlan.scala:121) > INFO o.a.spark.scheduler.DAGScheduler Parents of final stage: List() > INFO o.a.spark.scheduler.DAGScheduler Missing parents: List() > INFO o.a.spark.scheduler.DAGScheduler Submitting Stage 0 > (MapPartitionsRDD[3] at map at SparkPlan.scala:96), which has no missing > parents > INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(5512) called with > curMem=218866, maxMem=2061647216 > INFO org.apache.spark.storage.MemoryStore Block broadcast_1 stored as values > in memory (estimated size 5.4 KB, free 1965.9 MB) > INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(3754) called with > curMem=224378, maxMem=2061647216 > INFO org.apache.spark.storage.MemoryStore Block broadcast_1_piece0 stored as > bytes in memory (estimated size 3.7 KB, free 1965.9 MB) > INFO o.a.spark.storage.BlockManagerInfo Added broadcast_1_piece0 in memory > on localhost:62495 (size: 3.7 KB, free: 1966.1 MB) > INFO o.a.spark.storage.BlockManagerMaster Updated info of block > broadcast_1_piece0 > INFO org.apache.spark.SparkContext Created broadcast 1 from broadcast at > DAGScheduler.scala:839 > INFO o.a.spark.scheduler.DAGScheduler Submitting 1 missing tasks from Stage > 0 (MapPartitionsRDD[3] at map at SparkPlan.scala:96) > INFO o.a.s.scheduler.TaskSchedulerImpl Adding task set 0.0 with 1 tasks > INFO o.a.spark.scheduler.TaskSetManager Starting task 0.0 in stage 0.0 (TID > 0, localhost, PROCESS_LOCAL, 1687 bytes) > INFO org.apache.spark.executor.Executor Running task 0.0 in stage 0.0 (TID 0) > INFO o.a.s.s.p.ParquetRelation2$$anon$1 Input split: ParquetInputSplit{part: > file:/Users/jon/Downloads/sparksql/1partitionsminusgeo/year=2015/month=1/day=14/providerOwnerZoneId=0/providerOwnerCustomerId=0/providerId=287/probeTypeId=1/part-r-00001.parquet > start: 0 end: 8851183 length: 8851183 hosts: [] requestedSchema: message > root { > optional int32 probeTypeId; > } > readSupportMetadata: > {org.apache.spark.sql.parquet.row.requested_schema={"type":"struct","fields":[{"name":"probeTypeId","type":"integer","nullable":true,"metadata":{}}]}, > > 
org.apache.spark.sql.parquet.row.metadata={"type":"struct","fields":[{"name":"clientMarketId","type":"integer","nullable":true,"metadata":{}},{"name":"clientCountryId","type":"integer","nullable":true,"metadata":{}},{"name":"clientRegionId","type":"integer","nullable":true,"metadata":{}},{"name":"clientStateId","type":"integer","nullable":true,"metadata":{}},{"name":"clientAsnId","type":"integer","nullable":true,"metadata":{}},{"name":"reporterZoneId","type":"integer","nullable":true,"metadata":{}},{"name":"reporterCustomerId","type":"integer","nullable":true,"metadata":{}},{"name":"responseCode","type":"integer","nullable":true,"metadata":{}},{"name":"measurementValue","type":"integer","nullable":true,"metadata":{}},{"name":"year","type":"integer","nullable":true,"metadata":{}},{"name":"month","type":"integer","nullable":true,"metadata":{}},{"name":"day","type":"integer","nullable":true,"metadata":{}},{"name":"providerOwnerZoneId","type":"integer","nullable":true,"metadata":{}},{"name":"providerOwnerCustomerId","type":"integer","nullable":true,"metadata":{}},{"name":"providerId","type":"integer","nullable":true,"metadata":{}},{"name":"probeTypeId","type":"integer","nullable":true,"metadata":{}}]}}} > ERROR org.apache.spark.executor.Executor Exception in task 0.0 in stage 0.0 > (TID 0) > java.lang.IllegalArgumentException: Column [probeTypeId] was not found in > schema! > at parquet.Preconditions.checkArgument(Preconditions.java:47) > ~[parquet-common-1.6.0rc3.jar:na] > at > parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:172) > ~[parquet-column-1.6.0rc3.jar:na] > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:160) > ~[parquet-column-1.6.0rc3.jar:na] > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:142) > ~[parquet-column-1.6.0rc3.jar:na] > at > parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:76) > ~[parquet-column-1.6.0rc3.jar:na] > at > parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:41) > ~[parquet-column-1.6.0rc3.jar:na] > at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162) > ~[parquet-column-1.6.0rc3.jar:na] > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:46) > ~[parquet-column-1.6.0rc3.jar:na] > at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:41) > ~[parquet-hadoop-1.6.0rc3.jar:na] > at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22) > ~[parquet-hadoop-1.6.0rc3.jar:na] > at > parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108) > ~[parquet-column-1.6.0rc3.jar:na] > at > parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28) > ~[parquet-hadoop-1.6.0rc3.jar:na] > at > parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158) > ~[parquet-hadoop-1.6.0rc3.jar:na] > at > parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138) > ~[parquet-hadoop-1.6.0rc3.jar:na] > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66) > 
~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at > org.apache.spark.rdd.NewHadoopRDD$NewHadoopMapPartitionsWithSplitRDD.compute(NewHadoopRDD.scala:244) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at org.apache.spark.scheduler.Task.run(Task.scala:64) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) > ~[spark-core_2.10-1.3.0.jar:1.3.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_31] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_31] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31] > WARN o.a.spark.scheduler.TaskSetManager Lost task 0.0 in stage 0.0 (TID 0, > localhost): java.lang.IllegalArgumentException: Column [probeTypeId] was not > found in schema! 
> at parquet.Preconditions.checkArgument(Preconditions.java:47) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:172) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:160) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:142) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:76) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:41) > at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:46) > at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:41) > at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22) > at > parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108) > at > parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28) > at > parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158) > at > parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133) > at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104) > at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > at > org.apache.spark.rdd.NewHadoopRDD$NewHadoopMapPartitionsWithSplitRDD.compute(NewHadoopRDD.scala:244) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:64) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > ERROR o.a.spark.scheduler.TaskSetManager Task 0 in stage 0.0 failed 1 times; > aborting job > INFO o.a.s.scheduler.TaskSchedulerImpl Removed TaskSet 0.0, whose tasks have > all completed, from pool > INFO o.a.s.scheduler.TaskSchedulerImpl Cancelling stage 0 > INFO o.a.spark.scheduler.DAGScheduler Job 0 failed: runJob at > SparkPlan.scala:121, took 0.132538 s > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/metrics/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/stages/stage/kill,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/static,null} > INFO o.s.j.server.handler.ContextHandler stopped > 
o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/executors/threadDump,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/executors/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/executors,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/environment/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/environment,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/storage/rdd/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/storage/rdd,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/storage/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/storage,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/stages/pool/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/stages/pool,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/stages/stage/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/stages/stage,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/stages/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/stages,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/jobs/job/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/jobs/job,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/jobs/json,null} > INFO o.s.j.server.handler.ContextHandler stopped > o.s.j.s.ServletContextHandler{/jobs,null} > INFO org.apache.spark.ui.SparkUI Stopped Spark web UI at > http://192.168.1.134:4040 > INFO o.a.spark.scheduler.DAGScheduler Stopping DAGScheduler > INFO o.a.s.MapOutputTrackerMasterActor MapOutputTrackerActor stopped! > INFO org.apache.spark.storage.MemoryStore MemoryStore cleared > INFO o.apache.spark.storage.BlockManager BlockManager stopped > INFO o.a.spark.storage.BlockManagerMaster BlockManagerMaster stopped > INFO o.a.s.s.OutputCommitCoordinator$OutputCommitCoordinatorActor > OutputCommitCoordinator stopped! > INFO org.apache.spark.SparkContext Successfully stopped SparkContext > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 > (TID 0, localhost): java.lang.IllegalArgumentException: Column [probeTypeId] > was not found in schema! 
> at parquet.Preconditions.checkArgument(Preconditions.java:47) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:172) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:160) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:142) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:76) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:41) > at parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162) > at > parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:46) > at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:41) > at parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22) > at > parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108) > at > parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28) > at > parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158) > at > parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138) > at > org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133) > at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104) > at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > at > org.apache.spark.rdd.NewHadoopRDD$NewHadoopMapPartitionsWithSplitRDD.compute(NewHadoopRDD.scala:244) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:64) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) > at 
scala.Option.foreach(Option.scala:236) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > Mar 26, 2015 6:06:02 AM INFO: parquet.filter2.compat.FilterCompat: Filtering > using predicate: eq(probeTypeId, 1) > Mar 26, 2015 6:06:02 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not > initialize counter due to context is not a instance of > TaskInputOutputContext, but is > org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl > Mar 26, 2015 6:06:02 AM INFO: parquet.filter2.compat.FilterCompat: Filtering > using predicate: eq(probeTypeId, 1) > Process finished with exit code 255 > {noformat}
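For anyone trying to reproduce this, here is a minimal, self-contained sketch of the scenario described above. It is an assumption of how the data is loaded and the table is registered - the report only shows the two SQL strings, the partition layout, and (from the log) that a Hive-enabled context and a local[*] master were used - so the class name, loading calls, and paths below are illustrative rather than the reporter's actual code.

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

// Hypothetical reproduction sketch for SPARK-6554 (Spark 1.3.0).
public class PartitionColumnFilterRepro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SPARK-6554 repro").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // The log shows a Hive metastore being initialized, so a HiveContext is assumed here;
        // a plain SQLContext is expected to hit the same code path.
        HiveContext sqlContext = new HiveContext(sc.sc());

        // /mydata is the partitioned Parquet root, e.g. /mydata/probeTypeId=1/...files...
        DataFrame df = sqlContext.load("/mydata", "parquet");
        df.printSchema();           // probeTypeId shows up as "integer (nullable = true)"
        df.registerTempTable("df");

        // Works: selecting the partition column is fine.
        sqlContext.sql("select probeTypeId from df limit 1").show();

        // Fails: the eq(probeTypeId, 1) predicate is pushed down to the Parquet reader,
        // but probeTypeId exists only as a partition directory, not as a column inside
        // the Parquet files, so schema validation throws
        // java.lang.IllegalArgumentException: Column [probeTypeId] was not found in schema!
        sqlContext.sql("select probeTypeId from df where probeTypeId = 1 limit 1").show();
    }
}
{code}

Since the trace shows the predicate reaching Parquet's row-group filtering, disabling {{spark.sql.parquet.filterPushdown}} may sidestep the failing code path, but that would only be a workaround; the expectation is that a predicate on a partition column is evaluated against the directory values rather than handed to the Parquet reader at all.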