[ https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362414#comment-16362414 ]
John Leach commented on PHOENIX-4602: ------------------------------------- [~comnetwork] I am new to Phoenix, but when I look at the WhereOptimizer.java it is not clear to me how or when the predicates are moved to Conjunctive Normal Form ([https://en.wikipedia.org/wiki/Conjunctive_normal_form).] I have always seen the following process when dealing with predicates. 1. Move to Conjunctive Normal Form all predicates. 2. Mark predicates on the key. 3. Apply a function on the key predicates to assemble a set of scans. 4. Apply a function on the remaining predicates to assemble a filter (Usually in list is an exception case). Do you know where CNF occurs? > OrExpression should can also push non-leading pk columns to scan > ---------------------------------------------------------------- > > Key: PHOENIX-4602 > URL: https://issues.apache.org/jira/browse/PHOENIX-4602 > Project: Phoenix > Issue Type: Improvement > Affects Versions: 4.13.0 > Reporter: chenglei > Priority: Major > Attachments: PHOENIX-4602_v1.patch > > > Given following table: > {code} > CREATE TABLE test_table ( > PK1 INTEGER NOT NULL, > PK2 INTEGER NOT NULL, > PK3 INTEGER NOT NULL, > DATA INTEGER, > CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3)) > {code} > and a sql: > {code} > select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 > and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9)) > {code} > Obviously, it is a typical case for the sql to use SkipScanFilter,however, > the sql actually does not use Skip Scan, it use Range Scan and just push the > leading pk column expression {{(t.pk1 >=2 and t.pk1<5)}} to scan,the explain > sql is : > {code:sql} > CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5] > SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9)) > {code} > I think the problem is affected by the > {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, in the following > line 763, because the pk2 column is not the leading pk column,so this method > return null, causing the expression > {{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed > to scan: > {code:java} > 757 boolean hasFirstSlot = true; > 758 boolean prevIsNull = false; > 759 // TODO: Do the same optimization that we do for IN if the childSlots > specify a fully qualified row key > 760 for (KeySlot slot : childSlot) { > 761 if (hasFirstSlot) { > 762 // if the first slot is null, return null immediately > 763 if (slot == null) { > 764 return null; > 765 } > 766 // mark that we've handled the first slot > 767 hasFirstSlot = false; > 768 } > {code} > For above {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, it seems > that it is not necessary to make sure the PK Column in OrExpression is > leading PK Column,just guarantee there is only one PK Column in OrExpression > is enough. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)