[ https://issues.apache.org/jira/browse/PHOENIX-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Esserlieu updated PHOENIX-3301: ------------------------------------- Description: Why does adding an ORDER BY change the number of records returned when there is a secondary index? Why without including an ORDER by are no records returned (seems to be hitting the base table and not the index)? I've included repro steps below: *Repro* {code:title=repro.sql|borderStyle=solid} DROP INDEX IF EXISTS TEST_INDEX ON TEST_TABLE; DROP TABLE IF EXISTS TEST_TABLE; CREATE TABLE IF NOT EXISTS TEST_TABLE ( PK1 CHAR(15) NOT NULL, PK2 DATE NOT NULL CONSTRAINT PK PRIMARY KEY ( PK1, PK2 DESC ) ); CREATE INDEX IF NOT EXISTS TEST_INDEX ON TEST_TABLE (PK2, PK1); UPSERT INTO TEST_TABLE (PK1, PK2) VALUES ('abc123',TO_DATE('2010-01-01T00:00:01Z')); UPSERT INTO TEST_TABLE (PK1, PK2) VALUES ('abc123',TO_DATE('2010-01-01T00:00:02Z')); UPSERT INTO TEST_TABLE (PK1, PK2) VALUES ('abc123',TO_DATE('2010-01-01T00:00:03Z')); -- Selects --This select statement returns 0 rows, though it should be returning the 3 upserted rows from above SELECT PK1, PK2 FROM TEST_TABLE WHERE (PK2, PK1) >= (TO_DATE('1970-01-01'), null) -- the EXPLAIN shows the base table is being hit. Why is the base table hit with this RVC? explain SELECT PK1, PK2 FROM TEST_TABLE WHERE (PK2, PK1) >= (TO_DATE('1970-01-01'), null) --Note: by adding an ORDER BY statement, the query returns all 3 rows SELECT PK1, PK2 FROM TEST_TABLE WHERE (PK2, PK1) >= (TO_DATE('1970-01-01'), null) ORDER BY PK2, PK1; -- the EXPLAIN shows the secondary index is now being used by this query explain SELECT PK1, PK2 FROM TEST_TABLE WHERE (PK2, PK1) >= (TO_DATE('1970-01-01'), null) ORDER BY PK2, PK1; {code} was: I was trying to run a delete statement using a row value constructor against a table's secondary index, and noticed no data was being deleted. Digging into the problem a bit more, I cant get any queries using a row value constructor against the secondary index to work at all. I've included repro steps below: *Repro* {code:title=repro.sql|borderStyle=solid} DROP INDEX IF EXISTS TEST_INDEX ON TEST_TABLE; DROP TABLE IF EXISTS TEST_TABLE; CREATE TABLE IF NOT EXISTS TEST_TABLE ( pk1 VARCHAR NOT NULL, pk2 VARCHAR NOT NULL, pk3 VARCHAR NOT NULL, pk4 DATE NOT NULL, pk5 VARCHAR NOT NULL, v1 VARCHAR CONSTRAINT PK PRIMARY KEY ( pk1, pk2, pk3, pk4 DESC, pk5 ) ) MULTI_TENANT=true,IMMUTABLE_ROWS=true; CREATE INDEX TEST_INDEX ON TEST_TABLE (pk2, pk4, pk3, pk5); upsert into TEST_TABLE (pk1, pk2, pk3, pk4, pk5, v1) values ('a', '001', '1', TO_DATE('2000-01-01'), 'A', 'value'); upsert into TEST_TABLE (pk1, pk2, pk3, pk4, pk5, v1) values ('a', '001', '1', TO_DATE('2000-01-01'), 'B', 'value'); -- THIS IS THIE ORDERING USING A DEFAULT ROW VALUE CONSTRUCTOR, USING -- VALUES FOR THE ROW VALUE CONSTRUCTOR SUCH THAT BOTH TEST ROWS -- SHOULD BE RETURNED SELECT * FROM TEST_TABLE WHERE pk1 = 'a' AND pk2 = '001' AND (pk1, pk2, pk3, pk4, pk5) >= ('a', '001', '0', TO_DATE('1999-01-01'), 'A') -- HERE ARE THE EXACT SAME VALUES, WITH FIELDS pk3 AND pk4 TRANSPOSED SELECT * FROM TEST_TABLE WHERE pk1 = 'a' AND pk2 = '001' AND (pk1, pk2, pk4, pk3, pk5) >= ('a', '001', TO_DATE('1999-01-01'), '0', 'A') -- NOTE, THAT NOTHING IS RETURNED BY THIS QUERY, BUT THE TWO TEST ROWS SHOULD BE. -- THIS SHOULD BE USING THE INDEX, BUT ISN'T: EXPLAIN SELECT * FROM TEST_TABLE WHERE pk1 = 'a' AND pk2 = '001' AND (pk1, pk2, pk4, pk3, pk5) >= ('a', '001', TO_DATE('1999-01-01'), '0', 'A') -- IF WE CHANGE THE INEQUALITY TO BE LESS THAN OR EQUALS WITH THE TRANSPOSED ROW -- KEY, NOTE THAT THE TEST ROWS ARE RETURNED (THEY SHOULDN'T BE): SELECT * FROM TEST_TABLE WHERE pk1 = 'a' AND pk2 = '001' AND (pk1, pk2, pk4, pk3, pk5) <= ('a', '001', TO_DATE('1999-01-01'), '0', 'A') -- HINTING TO USE AN INDEX DOESN'T WORK: SELECT /*+ INDEX(TEST_TABLE TEST_INDEX) */ * FROM TEST_TABLE WHERE pk1 = 'a' AND pk2 = '001' AND (pk1, pk2, pk4, pk3, pk5) >= ('a', '001', TO_DATE('1999-01-01'), '0', 'A') -- FORCING AN ORDERING THAT WOULD CAUSE THE INDEX TO BE USED DOESN'T WORK EITHER: SELECT * FROM TEST_TABLE WHERE pk1 = 'a' AND pk2 = '001' AND (pk1, pk2, pk4, pk3, pk5) >= ('a', '001', TO_DATE('1999-01-01'), '0', 'A') ORDER BY pk1, pk2, pk4, pk3, pk5 -- THIS IS THE ORIGINAL DELETE STATEMENT I TRIED TO RUN, BUT IS FAILING -- LIKELY FOR THE SAME REASON AS SELECTS ARE FAILING ABOVE DELETE FROM TEST_TABLE WHERE pk1 = 'a' AND pk2 = '001' AND (pk1, pk2, pk4, pk3, pk5) >= ('a', '001', TO_DATE('1999-01-01'), '0', 'A') {code} > Row Value Constructors Against Indexes Don't Work Correctly > ----------------------------------------------------------- > > Key: PHOENIX-3301 > URL: https://issues.apache.org/jira/browse/PHOENIX-3301 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.8.0 > Reporter: Brian Esserlieu > > Why does adding an ORDER BY change the number of records returned when there > is a secondary index? > Why without including an ORDER by are no records returned (seems to be > hitting the base table and not the index)? > I've included repro steps below: > *Repro* > {code:title=repro.sql|borderStyle=solid} > DROP INDEX IF EXISTS TEST_INDEX ON TEST_TABLE; > DROP TABLE IF EXISTS TEST_TABLE; > CREATE TABLE IF NOT EXISTS TEST_TABLE ( > PK1 CHAR(15) NOT NULL, > PK2 DATE NOT NULL > CONSTRAINT PK PRIMARY KEY > ( > PK1, > PK2 DESC > ) > ); > CREATE INDEX IF NOT EXISTS TEST_INDEX ON TEST_TABLE (PK2, PK1); > UPSERT INTO TEST_TABLE (PK1, PK2) > VALUES ('abc123',TO_DATE('2010-01-01T00:00:01Z')); > UPSERT INTO TEST_TABLE (PK1, PK2) > VALUES ('abc123',TO_DATE('2010-01-01T00:00:02Z')); > UPSERT INTO TEST_TABLE (PK1, PK2) > VALUES ('abc123',TO_DATE('2010-01-01T00:00:03Z')); > -- Selects > --This select statement returns 0 rows, though it should be returning the 3 > upserted rows from above > SELECT PK1, PK2 > FROM TEST_TABLE > WHERE (PK2, PK1) >= (TO_DATE('1970-01-01'), null) > -- the EXPLAIN shows the base table is being hit. Why is the base table hit > with this RVC? > explain SELECT PK1, PK2 > FROM TEST_TABLE > WHERE (PK2, PK1) >= (TO_DATE('1970-01-01'), null) > --Note: by adding an ORDER BY statement, the query returns all 3 rows > SELECT PK1, PK2 > FROM TEST_TABLE > WHERE (PK2, PK1) >= (TO_DATE('1970-01-01'), null) > ORDER BY PK2, PK1; > -- the EXPLAIN shows the secondary index is now being used by this query > explain SELECT PK1, PK2 > FROM TEST_TABLE > WHERE (PK2, PK1) >= (TO_DATE('1970-01-01'), null) > ORDER BY PK2, PK1; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)