[ https://issues.apache.org/jira/browse/HAWQ-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruilong Huo updated HAWQ-800: ----------------------------- Attachment: proba.out proba.sql proba_execute.out proba_execute.sql > Less tuple is inserted due to data locality information is not refreshed and > dispatched in prepared statement > ------------------------------------------------------------------------------------------------------------- > > Key: HAWQ-800 > URL: https://issues.apache.org/jira/browse/HAWQ-800 > Project: Apache HAWQ > Issue Type: Bug > Components: Core, Dispatcher, Query Execution, Resource Manager > Reporter: Ruilong Huo > Assignee: George Caragea > Attachments: proba.out, proba.sql, proba_execute.out, > proba_execute.sql > > > In either explicit (SQL) or implicit (UDF) prepared statement, there is less > tuple inserted if we run a prepared "insert into t select * from t" plan > multiple times in a transaction. > Below is a simple case to reproduce this issue. For a more complicated > example, you may refer to attached proba_execute and proba. > 1. There should be 8 tuples, however there is only 4 in hawq 2.0 > {noformat} > drop table if exists t; > DROP TABLE > create table t (id int); > CREATE TABLE > insert into t values (1); > INSERT 0 1 > CREATE OR REPLACE FUNCTION f_load() > RETURNS TEXT > LANGUAGE plpgsql > AS > $body$ > DECLARE > l_rec RECORD; > l_itm RECORD; > BEGIN > FOR l_rec IN ( SELECT generate_series(1, 3) AS id ) > LOOP > INSERT INTO t SELECT * FROM t; > END LOOP; > RETURN 'done'; > END; > $body$ > ; > CREATE FUNCTION > SELECT f_load(); > f_load > -------- > done > (1 row) > SELECT * FROM t; > id > ---- > 1 > 1 > 1 > 1 > (4 rows) > {noformat} > 2. There are 8 tuples as expected in hawq 1.x > {noformat} > drop table if exists t; > DROP TABLE > create table t (id int); > psql:temp.sql:3: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using > column named 'id' as the Greenplum Database data distribution key for this > table. > HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make > sure column(s) chosen are the optimal data distribution key to minimize skew. > CREATE TABLE > insert into t values (1); > INSERT 0 1 > CREATE OR REPLACE FUNCTION f_load() > RETURNS TEXT > LANGUAGE plpgsql > AS > $body$ > DECLARE > l_rec RECORD; > l_itm RECORD; > BEGIN > FOR l_rec IN ( SELECT generate_series(1, 3) AS id ) > LOOP > INSERT INTO t SELECT * FROM t; > END LOOP; > RETURN 'done'; > END; > $body$ > ; > CREATE FUNCTION > SELECT f_load(); > f_load > -------- > done > (1 row) > SELECT * FROM t; > id > ---- > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > (8 rows) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)