I'm hesitant to merge that PR, as it introduces a brand new configuration
path that differs from the way the rest of Spark SQL / Spark is
configured.

I suspect that hitting max iterations is symptomatic of some other
issue, as resolution typically happens bottom-up, in a single pass.  Can
you provide more details about your schema / query?
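
For intuition, here is a toy sketch (simplified classes, not Catalyst's
actual ones) of why a single bottom-up pass is usually enough: once the leaf
relations expose their output attributes, each parent node can resolve its
references against its children without revisiting them.

  case class Attr(name: String)
  sealed trait Plan { def output: Seq[Attr] }
  case class Relation(output: Seq[Attr]) extends Plan
  case class Project(names: Seq[String], child: Plan) extends Plan {
    def output: Seq[Attr] = names.map(Attr)
  }

  // A node is resolved once its child is resolved and every referenced name
  // is produced by that child -- so one pass from the leaves up suffices.
  def resolved(plan: Plan): Boolean = plan match {
    case Relation(_)           => true
    case Project(names, child) =>
      resolved(child) && names.forall(n => child.output.exists(_.name == n))
  }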

On Tue, Dec 9, 2014 at 11:43 PM, Nitin Goyal <nitin2go...@gmail.com> wrote:

> I see that somebody has already raised a PR for this, but it hasn't been
> merged.
>
> https://issues.apache.org/jira/browse/SPARK-4339
>
> Can we merge this in the next 1.2 RC?
>
> Thanks
> -Nitin
>
>
> On Wed, Dec 10, 2014 at 11:50 AM, Nitin Goyal <nitin2go...@gmail.com>
> wrote:
>
>> Hi Michael,
>>
>> I think I have found the exact problem in my case. I see that we have
>> something like the following in Analyzer.scala :-
>>
>>   // TODO: pass this in as a parameter.
>>   val fixedPoint = FixedPoint(100)
>>
>> and
>>
>>     Batch("Resolution", fixedPoint,
>>       ResolveReferences ::
>>       ResolveRelations ::
>>       ResolveSortReferences ::
>>       NewRelationInstances ::
>>       ImplicitGenerate ::
>>       StarExpansion ::
>>       ResolveFunctions ::
>>       GlobalAggregates ::
>>       UnresolvedHavingClauseAttributes ::
>>       TrimGroupingAliases ::
>>       typeCoercionRules ++
>>       extendedRules : _*),
>>
>> Perhaps in my case it reaches the 100 iterations, breaks out of the while
>> loop in RuleExecutor.scala, and thus doesn't "resolve" all the attributes.
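>>
>> As a rough sketch (simplified, not the real RuleExecutor code), I picture
>> the fixed-point batch loop like this, which is why hitting the cap can
>> leave attributes unresolved:
>>
>>   // Simplified sketch of a fixed-point batch (not the actual RuleExecutor):
>>   def runBatch[T](plan: T, rules: Seq[T => T], maxIterations: Int): T = {
>>     var current = plan
>>     var iteration = 1
>>     var continue = true
>>     while (continue) {
>>       val last = current
>>       current = rules.foldLeft(current)((p, rule) => rule(p))
>>       if (current == last) {
>>         continue = false   // reached a fixed point, all rules are stable
>>       } else if (iteration >= maxIterations) {
>>         // Gives up even though the plan is still changing, which matches
>>         // the "Max iterations (100) reached" message in the logs below.
>>         continue = false
>>       }
>>       iteration += 1
>>     }
>>     current
>>   }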
>>
>> Exception in my logs :-
>>
>> 14/12/10 04:45:28 INFO HiveContext$$anon$4: Max iterations (100) reached
>> for batch Resolution
>>
>> 14/12/10 04:45:28 ERROR [Sql]: Servlet.service() for servlet [Sql] in
>> context with path [] threw exception [Servlet execution threw an exception]
>> with root cause
>>
>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
>> Unresolved attributes: 'T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS
>> DOWN_BYTESHTTPSUBCR#6567, tree:
>>
>> 'Project ['T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS
>> DOWN_BYTESHTTPSUBCR#6567]
>> ...
>> ...
>> ...
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:80)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
>> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>> at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>> at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>> at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>> at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>> at scala.collection.immutable.List.foreach(List.scala:318)
>> at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>> at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
>> at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
>> at org.apache.spark.sql.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:86)
>> at org.apache.spark.sql.CacheManager$class.writeLock(CacheManager.scala:67)
>> at org.apache.spark.sql.CacheManager$class.cacheQuery(CacheManager.scala:85)
>> at org.apache.spark.sql.SQLContext.cacheQuery(SQLContext.scala:50)
>> at org.apache.spark.sql.SchemaRDD.cache(SchemaRDD.scala:490)
>>
>>
>> I think the solution here is to make the FixedPoint constructor argument
>> configurable/parameterized (as the TODO already suggests). Do we have a plan
>> to do this in the 1.2 release? Or I can take this up as a task myself if you
>> want (since this is very crucial for our release).
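>>
>> As a purely hypothetical sketch of what I have in mind (the config key below
>> is made up for illustration and assumes the Analyzer could reach the
>> SQLContext's conf), Analyzer.scala could read the limit from a setting
>> instead of hard-coding 100:
>>
>>   // Hypothetical sketch only -- the key name is invented for illustration.
>>   val maxIterations =
>>     sqlContext.getConf("spark.sql.analyzer.maxIterations", "100").toInt
>>   val fixedPoint = FixedPoint(maxIterations)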
>>
>>
>> Thanks
>>
>> -Nitin
>>
>> On Wed, Dec 10, 2014 at 1:06 AM, Michael Armbrust <mich...@databricks.com
>> > wrote:
>>
>>> val newSchemaRDD = sqlContext.applySchema(existingSchemaRDD,
>>>> existingSchemaRDD.schema)
>>>>
>>>
>>> This line is throwing away the logical information about
>>> existingSchemaRDD and thus Spark SQL can't know how to push down
>>> projections or predicates past this operator.
>>>
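>>> For example (a rough sketch, assuming the goal is just to cache the data),
>>> caching the original SchemaRDD keeps its logical plan intact:
>>>
>>>   // Keeps the original logical plan, so projections and predicates can
>>>   // still be pushed down:
>>>   existingSchemaRDD.cache()
>>>
>>>   // Whereas re-applying the schema wraps the rows in a fresh leaf node
>>>   // and hides the original plan from the optimizer:
>>>   val newSchemaRDD =
>>>     sqlContext.applySchema(existingSchemaRDD, existingSchemaRDD.schema)
>>>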
>>> Can you describe in more detail the problems you see if you don't do this
>>> reapplication of the schema?
>>>
>>
>>
>>
>> --
>> Regards
>> Nitin Goyal
>>
>
>
>
> --
> Regards
> Nitin Goyal
>
