Hi Michael,

I think I have found the exact problem in my case. I see that we have something like the following in Analyzer.scala:
// TODO: pass this in as a parameter.
val fixedPoint = FixedPoint(100)

Batch("Resolution", fixedPoint,
  ResolveReferences ::
  ResolveRelations ::
  ResolveSortReferences ::
  NewRelationInstances ::
  ImplicitGenerate ::
  StarExpansion ::
  ResolveFunctions ::
  GlobalAggregates ::
  UnresolvedHavingClauseAttributes ::
  TrimGroupingAliases ::
  typeCoercionRules ++ extendedRules : _*),

Perhaps in my case the analysis reaches the 100-iteration limit, breaks out of the while loop in RuleExecutor.scala, and therefore doesn't "resolve" all the attributes. The exception in my logs:

14/12/10 04:45:28 INFO HiveContext$$anon$4: Max iterations (100) reached for batch Resolution
14/12/10 04:45:28 ERROR [Sql]: Servlet.service() for servlet [Sql] in context with path [] threw exception [Servlet execution threw an exception] with root cause
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS DOWN_BYTESHTTPSUBCR#6567, tree:
'Project ['T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS DOWN_BYTESHTTPSUBCR#6567]
...
    at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:80)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
    at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
    at org.apache.spark.sql.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:86)
    at org.apache.spark.sql.CacheManager$class.writeLock(CacheManager.scala:67)
    at org.apache.spark.sql.CacheManager$class.cacheQuery(CacheManager.scala:85)
    at org.apache.spark.sql.SQLContext.cacheQuery(SQLContext.scala:50)
    at org.apache.spark.sql.SchemaRDD.cache(SchemaRDD.scala:490)

I think the solution here is to make the FixedPoint constructor argument configurable/parameterized (as the existing TODO also suggests). Do we have a plan to do this in the 1.2 release? Or I can take this up as a task myself if you want (since this is very crucial for our release).
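To make the proposal concrete, here is a self-contained toy model of the fixed-point loop (my own simplification for illustration, not the actual RuleExecutor source; FixedPoint is redefined locally and a "plan" is just an Int). It shows why a hard-coded limit can stop the rewrite before it converges, and why passing the limit in as a parameter would help:

object FixedPointSketch {
  type Plan = Int
  type Rule = Plan => Plan

  // Local stand-in for catalyst's FixedPoint strategy.
  case class FixedPoint(maxIterations: Int)

  // Apply the rules repeatedly until the plan stops changing or the
  // iteration limit is hit -- the same shape as the while loop that
  // logs "Max iterations (100) reached".
  def execute(strategy: FixedPoint, rules: Seq[Rule], plan: Plan): Plan = {
    var curPlan = plan
    var iteration = 1
    var continue = true
    while (continue) {
      val result = rules.foldLeft(curPlan)((p, rule) => rule(p))
      iteration += 1
      if (iteration > strategy.maxIterations) {
        // The loop gives up here even though the plan may not be
        // fully rewritten (i.e., may still be "unresolved").
        println(s"Max iterations (${strategy.maxIterations}) reached")
        continue = false
      }
      if (result == curPlan) continue = false // reached a fixed point
      curPlan = result
    }
    curPlan
  }

  def main(args: Array[String]): Unit = {
    // A rule that needs 500 passes to converge.
    val slowRule: Rule = p => if (p < 500) p + 1 else p
    println(execute(FixedPoint(100), Seq(slowRule), 0))  // 100: gave up early
    println(execute(FixedPoint(1000), Seq(slowRule), 0)) // 500: converged
  }
}

With the limit hard-coded at 100, the first call stops short of the fixed point; making maxIterations a parameter (as the TODO suggests) lets a caller raise it for plans that legitimately need more passes.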
Thanks,
-Nitin

On Wed, Dec 10, 2014 at 1:06 AM, Michael Armbrust <mich...@databricks.com> wrote:

>> val newSchemaRDD = sqlContext.applySchema(existingSchemaRDD,
>>   existingSchemaRDD.schema)
>
> This line is throwing away the logical information about existingSchemaRDD,
> and thus Spark SQL can't know how to push down projections or predicates
> past this operator.
>
> Can you describe more the problems that you see if you don't do this
> reapplication of the schema?

--
Regards
Nitin Goyal
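A hypothetical sketch of the pattern Michael is describing, written against the Spark 1.2-era SchemaRDD API (the table and column names are made up for illustration):

import org.apache.spark.sql.SQLContext

def demo(sqlContext: SQLContext): Unit = {
  val existingSchemaRDD = sqlContext.sql("SELECT sp, down_bytes FROM t1")

  // Re-applying the SchemaRDD's own schema treats it as a plain
  // RDD[Row]: the logical plan behind it is discarded, so catalyst
  // can no longer push projections or predicates below this operator.
  val newSchemaRDD =
    sqlContext.applySchema(existingSchemaRDD, existingSchemaRDD.schema)

  // Using existingSchemaRDD directly keeps the logical plan intact,
  // so projection and predicate pushdown still work.
  existingSchemaRDD.registerTempTable("t1_view")
}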