[
https://issues.apache.org/jira/browse/PHOENIX-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552558#comment-15552558
]
Sumit Nigam commented on PHOENIX-3354:
--------------------------------------
Here are the 2 DDLs (for data and index tables):
CREATE TABLE IF NOT EXISTS "ldmns:exDocStoreb" (CURRENT_TIMESTAMP BIGINT NOT
NULL, ID VARCHAR(96), BINARY_CURR_EXDOC VARBINARY, CURR_CHECKSUM VARCHAR(32),
BINARY_PREV_EXDOC VARBINARY, PREV_CHECKSUM VARCHAR(32), PREV_TIMESTAMP BIGINT,
SUMMARY VARCHAR, OBJ_SUMMARY VARCHAR, PARAM_SAMPLES VARCHAR, BULK_PUBLISH_UUID
VARCHAR, TOTAL_FACTS INTEGER, CURR_EXDOC VARCHAR, PREV_EXDOC VARCHAR CONSTRAINT
PK PRIMARY KEY(CURRENT_TIMESTAMP, ID)) COMPRESSION = 'SNAPPY', BLOCKCACHE =
false, SALT_BUCKETS = 36
CREATE INDEX IF NOT EXISTS "ldmns:indx_exdocb" ON "ldmns:exDocStoreb"(ID)
INCLUDE (SUMMARY, OBJ_SUMMARY, PARAM_SAMPLES, BULK_PUBLISH_UUID)
Here is the upsert query for this table:
UPSERT INTO "ldmns:exDocStoreb" (CURRENT_TIMESTAMP, BULK_PUBLISH_UUID, ID,
CURR_CHECKSUM, CURR_EXDOC, SUMMARY, PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP,
OBJ_SUMMARY, PARAM_SAMPLES, TOTAL_FACTS, BINARY_CURR_EXDOC, BINARY_PREV_EXDOC)
VALUES (TO_NUMBER(CURRENT_TIME()), ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Here is explain plan of a SELECT with merge sort:
explain select /* +USE_SORT_MERGE_JOIN*/ ID, CURR_EXDOC, BINARY_CURR_EXDOC,
CURRENT_TIMESTAMP, CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC, PREV_CHECKSUM,
PREV_TIMESTAMP from "ldmns:exDocStoreb" as a inner join (select
max(CURRENT_TIMESTAMP) as mct, ID as tid from "ldmns:exDocStoreb" where ID like
' 006389a6b10667f39bdbbdafdc4611 e03cc04418cbc2619ddc01f54d88d7 c3bf%' group by
ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where id like '
006389a6b10667f39bdbbdafdc4611 e03cc04418cbc2619ddc01f54d88d7 c3bf%' ;
+----------------------------- -------------+
| PLAN |
+----------------------------- -------------+
| SORT-MERGE-JOIN (INNER) TABLES |
| CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
| SERVER FILTER BY ID LIKE ' 006389a6b10667f39bdbbdafdc4611
e03cc04418cbc2619ddc01f54d88d7 c3bf%' |
| SERVER SORTED BY [A.ID, A.CURRENT_TIMESTAMP] |
| CLIENT MERGE SORT |
| AND (SKIP MERGE) |
| CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'
006389a6b10667f39bdbbdafdc4611 e03cc04418cbc2619ddc01f54d88d7 c3bf'] -
[0,'006389 |
| SERVER FILTER BY FIRST KEY ONLY |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
| CLIENT MERGE SORT |
| CLIENT SORTED BY ["ID", MAX("CURRENT_TIMESTAMP")] |
+----------------------------- -------------+
11 rows selected (0.025 seconds)
Here is explain plan with default join:
explain SELECT ID, CURR_EXDOC, BINARY_CURR_EXDOC, CURRENT_TIMESTAMP,
CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP
from "ldmns:exDocStoreb" as a inner join (select max(CURRENT_TIMESTAMP) as mct,
ID as tid from "ldmns:exDocStoreb" where ID like '
42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bf%' group by
ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where ID like '
42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bf%' ;
+----------------------------- -------------+
| PLAN |
+----------------------------- -------------+
| CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
| SERVER FILTER BY ID LIKE ' 42ecf4abd4bd7e7606025dc8eee3de
6a3cc04418cbc2619ddc01f54d88d7 c3bf%' |
| PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
| CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'
42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bf'] - [0,'
42ecf4abd4bd7e7606025dc8eee3de 6a3cc0 |
| SERVER FILTER BY FIRST KEY ONLY |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
| CLIENT MERGE SORT |
| DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT,
TMP.TID)) |
+----------------------------- -------------+
8 rows selected (0.033 seconds)
I am on phoenix 4.6/ hbase 1.1
Just changing the join algorithm should be enough. I would assume that changing
the hash join to sort-merge join would not alter the query results which it
does.
> SORT_MERGE join on single tenant, salted table misses records randomly.
> -----------------------------------------------------------------------
>
> Key: PHOENIX-3354
> URL: https://issues.apache.org/jira/browse/PHOENIX-3354
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.5.0, 4.6.0
> Environment: Hbase 1.1.2
> Reporter: Sumit Nigam
> Priority: Critical
>
> Single-tenant, salted table on Hbase 1.1.2 has ~3 million records. It has a
> secondary index as well. The SORT_MERGE join hint misses data randomly when
> used in queries.
> I will add the DDLs and explain plan for all the queries that access this
> table.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)