Stamatis Zampetakis created HIVE-26168:
------------------------------------------
Summary: EXPLAIN DDL command output is not deterministic
Key: HIVE-26168
URL: https://issues.apache.org/jira/browse/HIVE-26168
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Stamatis Zampetakis
The EXPLAIN DDL command (HIVE-24596) can be used to recreate the schema for a
given query in order to debug planner issues. This is achieved by fetching
information from the metastore and outputting a series of DDL commands.
However, the output commands may appear in a different order across runs since
there is no mechanism to enforce an explicit order.
Consider for instance the following scenario.
{code:sql}
CREATE TABLE customer
(
`c_custkey` bigint,
`c_name` string,
`c_address` string
);
INSERT INTO customer VALUES (1, 'Bob', '12 avenue Mansart'), (2, 'Alice', '24 avenue Mansart');
EXPLAIN DDL SELECT c_custkey FROM customer WHERE c_name = 'Bob';
{code}
+Result 1+
{noformat}
ALTER TABLE default.customer UPDATE STATISTICS
SET('numRows'='2','rawDataSize'='48' );
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address
SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE
NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey
SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE
NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name
SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT
SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg==
{noformat}
+Result 2+
{noformat}
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey
SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE
NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED
ALTER TABLE default.customer UPDATE STATISTICS
SET('numRows'='2','rawDataSize'='48' );
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address
SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE
NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF
ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_name
SET('avgColLen'='4.0','maxColLen'='5','numNulls'='0','numDVs'='2' );
-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_name BUT THEY ARE NOT
SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAIChJLg1AGD1aCNBg==
{noformat}
The two results are equivalent, but the statements appear in a different order.
This is not a major issue because the results remain correct, but it may lead
to test flakiness, so it might be worth addressing.
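To illustrate the idea of enforcing an explicit order, here is a minimal sketch in Python of a test-side normalization. It is not the Hive implementation, and an actual fix would presumably impose the order inside Hive when the statements are emitted. The helper names `split_statements` and `canonical_order` are hypothetical, and the sketch assumes every statement starts on a line beginning with `ALTER TABLE`, with any following lines (the `SET` clause, the bit vector comment) belonging to it.

```python
from typing import List


def split_statements(output: str) -> List[str]:
    """Group output lines into units, one per ALTER TABLE statement.

    Lines that follow a statement (its SET clause or a trailing `--` comment)
    stay attached to it, so sorting never separates them.
    """
    units: List[List[str]] = []
    for line in output.strip().splitlines():
        if line.startswith("ALTER TABLE") or not units:
            units.append([line])
        else:
            units[-1].append(line)
    return ["\n".join(unit) for unit in units]


def canonical_order(output: str) -> str:
    """Return the same statements in a deterministic (lexicographic) order."""
    return "\n".join(sorted(split_statements(output)))


# Sample data taken from the two results above (c_name omitted for brevity).
TABLE_STATS = (
    "ALTER TABLE default.customer UPDATE STATISTICS\n"
    "SET('numRows'='2','rawDataSize'='48' );"
)
C_ADDRESS = (
    "ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_address\n"
    "SET('avgColLen'='17.0','maxColLen'='17','numNulls'='0','numDVs'='2' );\n"
    "-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_address BUT THEY ARE "
    "NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwbec/QPAjtBF"
)
C_CUSTKEY = (
    "ALTER TABLE default.customer UPDATE STATISTICS FOR COLUMN c_custkey\n"
    "SET('lowValue'='1','highValue'='2','numNulls'='0','numDVs'='2' );\n"
    "-- BIT VECTORS PRESENT FOR default.customer FOR COLUMN c_custkey BUT THEY ARE "
    "NOT SUPPORTED YET. THE BASE64 VALUE FOR THE BITVECTOR IS SExMoAICwfO+SIOOofED"
)

result_1 = "\n".join([TABLE_STATS, C_ADDRESS, C_CUSTKEY])
result_2 = "\n".join([C_CUSTKEY, TABLE_STATS, C_ADDRESS])

print(result_1 == result_2)  # False: same statements, different order
print(canonical_order(result_1) == canonical_order(result_2))  # True
```

Sorting whole units rather than individual lines keeps each bit vector comment next to the statement it describes. Any stable key would do; lexicographic order is used here only because it needs no extra metadata.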