[ https://issues.apache.org/jira/browse/HAWQ-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326423#comment-15326423 ]
Ruilong Huo edited comment on HAWQ-800 at 6/12/16 12:43 PM:
------------------------------------------------------------

Per investigation and discussion with Hubert and Ming, this is a bug introduced in prepared statements (prepared either in a UDF or in a SQL statement) in HAWQ 2.0. The root cause is that a prepared statement has two phases: phase 1) the plan + metadata + datalocality is prepared and dispatched only once; phase 2) every time the plan is executed, it uses that same datalocality.

For example, suppose table t (id int) has one tuple at the beginning. If we run the for loop below with "insert into t select * from t" as the prepared statement, it gives the following result:

{noformat}
phase                                                     content in table t
                                                          hawq 2.0         hawq 1.x
------------------------------------------------------------------------------------------------
beginning                                  i = 0          {1}              {1}
for i = 1 ~ 3:
  do "insert into t select * from t"
                                           i = 1          {1, 1}           {1, 1}
                                           i = 2          {1, 1, 1}        {1, 1, 1, 1}
                                           i = 3          {1, 1, 1, 1}     {1, 1, 1, 1, 1, 1, 1, 1}
{noformat}

To be specific, in HAWQ 2.0 every execution of the "insert into select" uses the initial datalocality (i.e., the initial tuple {1} in table t), while in HAWQ 1.x every segment reads all of its latest data and does the insert.

To keep the prepared statement feature, we propose the fix below:
1. Separate the dispatching into two parts: 1) plan + metadata, which is the same as 1.x; 2) datalocality + resource_negotiator.
2. Every time we dispatch the prepared plan + metadata in 1) for execution, we re-calculate the datalocality, re-do the resource negotiation, and then dispatch them along with 1).
3. If the resource (i.e., vseg number) changes in step 2, we re-prepare and then cache the new plan.

PS: Note that for a prepared statement in a UDF, we use a fixed number of vsegs, while for a prepared statement in SQL, a change in data size may incur a change of resource (i.e., vseg number) and thus of the plan.
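For reference, the explicit SQL form of the same repro can be sketched with standard PREPARE/EXECUTE (this is an illustrative session written from the description above, not the contents of the attached proba_execute.sql):

{noformat}
drop table if exists t;
create table t (id int);
insert into t values (1);

begin;
prepare ins as insert into t select * from t;
execute ins;   -- hawq 2.0: re-inserts only the initial {1} (stale datalocality)
execute ins;   -- hawq 1.x: each execution doubles the table
execute ins;
commit;

select count(*) from t;   -- per the table above: 8 expected (1.x), 4 observed (2.0)
{noformat}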
> Less tuple is inserted due to data locality information is not refreshed and
> dispatched in prepared statement
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-800
>                 URL: https://issues.apache.org/jira/browse/HAWQ-800
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Core, Dispatcher, Query Execution, Resource Manager
>            Reporter: Ruilong Huo
>            Assignee: Ruilong Huo
>         Attachments: proba.out, proba.sql, proba_execute.out, proba_execute.sql
>
>
> In either an explicit (SQL) or implicit (UDF) prepared statement, fewer tuples are inserted if we run a prepared "insert into t select * from t" plan multiple times in a transaction.
> Below is a simple case to reproduce this issue. For a more complicated example, you may refer to the attached proba_execute and proba.
> 1. There should be 8 tuples; however, there are only 4 in hawq 2.0:
> {noformat}
> drop table if exists t;
> DROP TABLE
> create table t (id int);
> CREATE TABLE
> insert into t values (1);
> INSERT 0 1
> CREATE OR REPLACE FUNCTION f_load()
> RETURNS TEXT
> LANGUAGE plpgsql
> AS
> $body$
> DECLARE
>     l_rec RECORD;
>     l_itm RECORD;
> BEGIN
>     FOR l_rec IN ( SELECT generate_series(1, 3) AS id )
>     LOOP
>         INSERT INTO t SELECT * FROM t;
>     END LOOP;
>     RETURN 'done';
> END;
> $body$
> ;
> CREATE FUNCTION
> SELECT f_load();
>  f_load
> --------
>  done
> (1 row)
> SELECT * FROM t;
>  id
> ----
>   1
>   1
>   1
>   1
> (4 rows)
> {noformat}
> 2. There are 8 tuples as expected in hawq 1.x:
> {noformat}
> drop table if exists t;
> DROP TABLE
> create table t (id int);
> psql:temp.sql:3: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
> HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
> CREATE TABLE
> insert into t values (1);
> INSERT 0 1
> CREATE OR REPLACE FUNCTION f_load()
> RETURNS TEXT
> LANGUAGE plpgsql
> AS
> $body$
> DECLARE
>     l_rec RECORD;
>     l_itm RECORD;
> BEGIN
>     FOR l_rec IN ( SELECT generate_series(1, 3) AS id )
>     LOOP
>         INSERT INTO t SELECT * FROM t;
>     END LOOP;
>     RETURN 'done';
> END;
> $body$
> ;
> CREATE FUNCTION
> SELECT f_load();
>  f_load
> --------
>  done
> (1 row)
> SELECT * FROM t;
>  id
> ----
>   1
>   1
>   1
>   1
>   1
>   1
>   1
>   1
> (8 rows)
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)