[ https://issues.apache.org/jira/browse/SPARK-32284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gengliang Wang updated SPARK-32284:
-----------------------------------
    Summary: Avoid expanding too many CNF predicates in partition pruning  (was: Avoid pushing down too many CNF filters for partition pruning)

> Avoid expanding too many CNF predicates in partition pruning
> ------------------------------------------------------------
>
>                 Key: SPARK-32284
>                 URL: https://issues.apache.org/jira/browse/SPARK-32284
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>
> After https://github.com/apache/spark/pull/28805, the predicates pushed down
> for partition pruning can be very long.
> For example, the following partition filter:
> {code:java}
> (p0 = '1' AND p1 = '1') OR (p0 = '2' AND p1 = '2') OR (p0 = '3' AND p1 = '3')
> OR (p0 = '4' AND p1 = '4') OR (p0 = '5' AND p1 = '5') OR (p0 = '6' AND p1 =
> '6') OR (p0 = '7' AND p1 = '7') OR (p0 = '8' AND p1 = '8') OR (p0 = '9' AND
> p1 = '9') OR (p0 = '10' AND p1 = '10') OR (p0 = '11' AND p1 = '11') OR (p0 =
> '12' AND p1 = '12') OR (p0 = '13' AND p1 = '13') OR (p0 = '14' AND p1 = '14')
> OR (p0 = '15' AND p1 = '15') OR (p0 = '16' AND p1 = '16') OR (p0 = '17' AND
> p1 = '17') OR (p0 = '18' AND p1 = '18') OR (p0 = '19' AND p1 = '19') OR (p0 =
> '20' AND p1 = '20')
> {code}
> is converted into a 130K-long query in the Hive metastore, which fails with
> the following error:
> {code:java}
> javax.jdo.JDOException: Exception thrown when executing query : SELECT
> DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS
> NUCLEUS_TYPE,A0.CREATE_TIME,A0.LAST_ACCESS_TIME,A0.PART_NAME,A0.PART_ID,A0.PART_NAME
> AS NUCORDER0 FROM PARTITIONS A0 LEFT OUTER JOIN TBLS B0 ON A0.TBL_ID =
> B0.TBL_ID LEFT OUTER JOIN DBS C0 ON B0.DB_ID = C0.DB_ID WHERE B0.TBL_NAME = ?
> AND C0."NAME" = ? AND ((((((A0.PART_NAME LIKE '%/p1=1' ESCAPE '\' ) OR
> (A0.PART_NAME LIKE '%/p1=2' ESCAPE '\' )) OR (A0.PART_NAME LIKE '%/p1=3'
> ESCAPE '\' )) OR ((A0.PART_NAME LIKE '%/p1=4' ESCAPE '\' ) O ...
> {code}
> To avoid this:
> 1. We should push down the convertible original queries as they are, instead
> of converting all predicates into CNF.
> 2. We can skip grouping expressions so that the CNF conversion can stop
> when the predicates become too long.
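
For illustration only, here is a minimal Python sketch of why a filter of this shape blows up under naive CNF conversion, and of the "stop when it gets too long" idea from point 2. It is a toy model, not Spark's implementation: the tuple-based predicate tree, the helper names (build_filter, cnf_clause_count, to_cnf, predicate_to_push) and the max_clauses cap are all assumptions made for this example. Distributing OR over AND on 20 two-term conjunctions yields 2^20 clauses in this model; how that maps to the 130K-long metastore query above is not something the sketch tries to reproduce.

```python
# Toy model: a predicate is a leaf string, or ("and", l, r) / ("or", l, r).
# This is NOT Spark's Expression tree; all names here are hypothetical.

def build_filter(n):
    """Build (p0 = '1' AND p1 = '1') OR ... OR (p0 = 'n' AND p1 = 'n')."""
    terms = [("and", f"p0 = '{i}'", f"p1 = '{i}'") for i in range(1, n + 1)]
    expr = terms[0]
    for term in terms[1:]:
        expr = ("or", expr, term)
    return expr


def cnf_clause_count(expr):
    """Count the clauses naive distribution produces, without building them."""
    if isinstance(expr, str):
        return 1
    op, left, right = expr
    l, r = cnf_clause_count(left), cnf_clause_count(right)
    # AND concatenates clause lists; OR pairs every left clause with every right one.
    return l + r if op == "and" else l * r


def to_cnf(expr, max_clauses=None):
    """Return CNF as a list of clauses (frozensets of leaves).

    Returns None as soon as the result would exceed max_clauses
    (a hypothetical cap), so the caller can fall back to the original.
    """
    if isinstance(expr, str):
        return [frozenset([expr])]
    op, left, right = expr
    l = to_cnf(left, max_clauses)
    if l is None:
        return None
    r = to_cnf(right, max_clauses)
    if r is None:
        return None
    if op == "and":
        clauses = l + r
    else:
        # Check the size before materializing the cross product.
        if max_clauses is not None and len(l) * len(r) > max_clauses:
            return None
        clauses = [a | b for a in l for b in r]
    if max_clauses is not None and len(clauses) > max_clauses:
        return None
    return clauses


def predicate_to_push(expr, max_clauses):
    """Use the CNF form if it stays small, otherwise keep the original predicate."""
    cnf = to_cnf(expr, max_clauses)
    return ("original", expr) if cnf is None else ("cnf", cnf)


print(cnf_clause_count(build_filter(20)))                        # 1048576 (2**20)
print(len(to_cnf(build_filter(10))))                             # 1024
print(predicate_to_push(build_filter(20), max_clauses=1000)[0])  # original
print(predicate_to_push(build_filter(3), max_clauses=1000)[0])   # cnf
```

With the cap in place, the 20-term filter is never expanded past the point where the clause count crosses the limit, and the original predicate is returned unchanged, which is the spirit of point 1.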