[ https://issues.apache.org/jira/browse/SPARK-29029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-29029.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 25761
[https://github.com/apache/spark/pull/25761]

> PhysicalOperation.collectProjectsAndFilters should use AttributeMap while substituting aliases
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-29029
>                 URL: https://issues.apache.org/jira/browse/SPARK-29029
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.3.0
>            Reporter: Nikita Konda
>            Assignee: Nikita Konda
>            Priority: Major
>             Fix For: 3.0.0
>
>
> We have a specific use case in which we insert a custom logical operator
> into our logical plan to keep some of Spark's optimization rules from
> applying. We remove this logical operator in a custom optimization rule
> before the plan is sent to SparkStrategies.
> However, we are hitting an issue in the following scenario:
> Analyzed plan:
> {code:java}
> [1] Project [userid#0]
> +- [2] SubqueryAlias tmp6
>    +- [3] Project [videoid#47L, avebitrate#2, userid#0]
>       +- [4] Filter NOT (videoid#47L = cast(30 as bigint))
>          +- [5] SubqueryAlias tmp5
>             +- [6] CustomBarrier
>                +- [7] Project [videoid#47L, avebitrate#2, userid#0]
>                   +- [8] Filter (avebitrate#2 < 10)
>                      +- [9] SubqueryAlias tmp3
>                         +- [10] Project [avebitrate#2, factorial(videoid#1) AS videoid#47L, userid#0]
>                            +- [11] SubqueryAlias tmp2
>                               +- [12] Project [userid#0, videoid#1, avebitrate#2]
>                                  +- [13] SubqueryAlias tmp1
>                                     +- [14] Project [userid#0, videoid#1, avebitrate#2]
>                                        +- [15] SubqueryAlias views
>                                           +- [16] Relation[userid#0,videoid#1,avebitrate#2]
> {code}
>
> Optimized Plan:
> {code:java}
> [1] Project [userid#0]
> +- [2] Filter (isnotnull(videoid#47L) && NOT (videoid#47L = 30))
>    +- [3] Project [factorial(videoid#1) AS videoid#47L, userid#0]
>       +- [4] Filter (isnotnull(avebitrate#2) && (avebitrate#2 < 10))
>          +- [5] Relation[userid#0,videoid#1,avebitrate#2]
> {code}
>
> When this plan is passed into *PhysicalOperation* in *DataSourceStrategy*,
> collectProjectsAndFilters collects the filters as
> List[AttributeReference(videoid#47L), AttributeReference(avebitrate#2)].
> However, at this stage the base relation only has videoid#1, so it throws
> an exception saying *key not found: videoid#47L*.
> On looking further, we noticed that the alias map in
> *PhysicalOperation.substitute* does have an entry with key *videoid#47L*:
> Aliases Map((videoid#47L, factorial(videoid#1))). However, substitute does
> not replace the alias videoid#47L with its expression, because the two
> attributes differ in the qualifier parameter.
> Attribute key in Alias: AttributeReference("videoid", LongType, nullable =
> true)(ExprId(47, _), *"None"*)
> Attribute in Filter condition: AttributeReference("videoid", LongType,
> nullable = true)(ExprId(47, _), *"Some(tmp5)"*)
> The two differ only in the qualifier. If the alias map is an AttributeMap
> (which looks attributes up by exprId) instead of a Map[Attribute,
> Expression], we can get rid of the above issue.
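
The lookup mismatch described in the issue can be sketched without a Spark dependency. This is a minimal model and not the Catalyst code: Attr, Expr, Ref, Call, AttrMap and AliasLookupDemo are hypothetical stand-ins, and the assumption is that attribute equality includes the qualifier while an AttributeMap-style container keys entries by exprId only.

```scala
// Simplified stand-ins for Catalyst classes; these are NOT the real Spark API.
final case class ExprId(id: Long)

// Assumption: like the attribute in the issue, equality includes the qualifier.
final case class Attr(name: String, exprId: ExprId, qualifier: Option[String])

sealed trait Expr
final case class Ref(attr: Attr) extends Expr
final case class Call(fn: String, arg: Expr) extends Expr

// Hypothetical miniature of an AttributeMap: entries are keyed by exprId only,
// so name and qualifier play no part in the lookup.
final class AttrMap[A](entries: Seq[(Attr, A)]) {
  private val byId: Map[ExprId, A] =
    entries.map { case (a, v) => a.exprId -> v }.toMap

  def get(a: Attr): Option[A] = byId.get(a.exprId)
}

object AliasLookupDemo {
  val videoid1: Attr = Attr("videoid", ExprId(1), None)

  // Key recorded for the alias: no qualifier.
  val aliasKey: Attr = Attr("videoid", ExprId(47), None)

  // Same attribute as seen in the Filter condition: qualifier tmp5.
  val filterRef: Attr = Attr("videoid", ExprId(47), Some("tmp5"))

  val aliased: Expr = Call("factorial", Ref(videoid1))

  // A plain Map compares whole keys, qualifier included, so the lookup misses.
  val plainMap: Map[Attr, Expr] = Map(aliasKey -> aliased)

  // The exprId-keyed map finds the alias regardless of qualifier.
  val attrMap: AttrMap[Expr] = new AttrMap[Expr](Seq(aliasKey -> aliased))

  def main(args: Array[String]): Unit = {
    println(plainMap.get(filterRef)) // no entry: qualifiers differ
    println(attrMap.get(filterRef))  // finds the factorial(videoid#1) expression
  }
}
```

In this model the plain Map misses because the two keys differ in the qualifier field, while the exprId-keyed map resolves both references to the same entry, which is the behavior the issue asks for when substituting aliases.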