I see that somebody has already raised a PR for this, but it hasn't been merged:
https://issues.apache.org/jira/browse/SPARK-4339

Can we merge this into the next 1.2 RC?

Thanks
-Nitin

On Wed, Dec 10, 2014 at 11:50 AM, Nitin Goyal <nitin2go...@gmail.com> wrote:

> Hi Michael,
>
> I think I have found the exact problem in my case. I see that we have
> written something like the following in Analyzer.scala:
>
>     // TODO: pass this in as a parameter.
>     val fixedPoint = FixedPoint(100)
>
> and
>
>     Batch("Resolution", fixedPoint,
>       ResolveReferences ::
>       ResolveRelations ::
>       ResolveSortReferences ::
>       NewRelationInstances ::
>       ImplicitGenerate ::
>       StarExpansion ::
>       ResolveFunctions ::
>       GlobalAggregates ::
>       UnresolvedHavingClauseAttributes ::
>       TrimGroupingAliases ::
>       typeCoercionRules ++
>       extendedRules : _*),
>
> Perhaps in my case it reaches the 100 iterations, breaks out of the while
> loop in RuleExecutor.scala, and thus doesn't "resolve" all the attributes.
>
> Exception in my logs:
>
>     14/12/10 04:45:28 INFO HiveContext$$anon$4: Max iterations (100) reached for batch Resolution
>     14/12/10 04:45:28 ERROR [Sql]: Servlet.service() for servlet [Sql] in context with path [] threw exception [Servlet execution threw an exception] with root cause
>     org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS DOWN_BYTESHTTPSUBCR#6567, tree:
>     'Project ['T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS DOWN_BYTESHTTPSUBCR#6567]
>     ...
>     at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:80)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>     at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>     at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>     at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>     at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
>     at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
>     at org.apache.spark.sql.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:86)
>     at org.apache.spark.sql.CacheManager$class.writeLock(CacheManager.scala:67)
>     at org.apache.spark.sql.CacheManager$class.cacheQuery(CacheManager.scala:85)
>     at org.apache.spark.sql.SQLContext.cacheQuery(SQLContext.scala:50)
>     at org.apache.spark.sql.SchemaRDD.cache(SchemaRDD.scala:490)
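The fixed-point loop being described works roughly as below. This is a minimal, self-contained sketch, not Spark's actual RuleExecutor code; the Plan and Rule type aliases and all names here are stand-ins. It shows why hitting the cap can leave a plan only partially rewritten:

    // Simplified sketch of a fixed-point rule executor: rules are re-applied
    // until the plan stops changing OR the iteration cap is hit. In the
    // latter case the batch ends with the plan still changing, which is how
    // attributes can be left unresolved.
    object FixedPointSketch {
      type Plan = String            // stand-in for Catalyst's LogicalPlan
      type Rule = Plan => Plan      // stand-in for Rule[LogicalPlan]

      case class FixedPoint(maxIterations: Int)

      def executeBatch(name: String, strategy: FixedPoint,
                       rules: Seq[Rule], plan: Plan): Plan = {
        var curPlan = plan
        var lastPlan: Plan = null
        var iteration = 1
        var continue = true
        while (continue) {
          // Apply every rule in the batch once, in order.
          curPlan = rules.foldLeft(curPlan)((p, rule) => rule(p))
          if (iteration >= strategy.maxIterations) {
            // The branch behind "Max iterations (100) reached for batch
            // Resolution": the batch stops even if the plan is still changing.
            println(s"Max iterations (${strategy.maxIterations}) reached for batch $name")
            continue = false
          } else if (curPlan == lastPlan) {
            continue = false        // fixed point reached: nothing changed
          }
          lastPlan = curPlan
          iteration += 1
        }
        curPlan
      }
    }

A query with enough nested or aliased attributes can need more resolution passes than the cap allows, which matches the "Unresolved attributes" failure above.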
> I think the solution here is to make the FixedPoint constructor argument
> configurable/parameterized (it is also marked as a TODO). Do we have a plan
> to do this in the 1.2 release? Or I can take this up as a task for myself
> if you want (since this is very crucial for our release).
>
> Thanks
> -Nitin
>
> On Wed, Dec 10, 2014 at 1:06 AM, Michael Armbrust <mich...@databricks.com> wrote:
>
>>> val newSchemaRDD = sqlContext.applySchema(existingSchemaRDD,
>>>   existingSchemaRDD.schema)
>>
>> This line is throwing away the logical information about existingSchemaRDD,
>> and thus Spark SQL can't know how to push down projections or predicates
>> past this operator.
>>
>> Can you describe in more detail the problems you see if you don't do this
>> reapplication of the schema?
>
> --
> Regards
> Nitin Goyal

--
Regards
Nitin Goyal
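One way the parameterization proposed above could look -- a hypothetical sketch, not the actual SPARK-4339 patch; the class name, the default, and the config key in the comment are made up for illustration:

    // The iteration cap becomes a constructor argument (defaulting to the
    // current hard-coded 100) instead of a literal FixedPoint(100).
    case class FixedPoint(maxIterations: Int)

    class AnalyzerSketch(maxIterations: Int = 100) {
      // was: val fixedPoint = FixedPoint(100)  // TODO: pass this in as a parameter.
      val fixedPoint = FixedPoint(maxIterations)
    }

    // A caller could then feed the cap from configuration, e.g. with a
    // hypothetical property:
    //   new AnalyzerSketch(conf.getInt("spark.sql.analyzer.maxIterations", 100))

Deployments whose queries legitimately need more resolution passes could then raise the cap without patching Spark.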