[ https://issues.apache.org/jira/browse/HAWQ-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruilong Huo updated HAWQ-280:
-----------------------------
    Affects Version/s: 2.0.0-beta-incubating

> Error accessing external table or copying from file with bad rows
> -----------------------------------------------------------------
>
>                 Key: HAWQ-280
>                 URL: https://issues.apache.org/jira/browse/HAWQ-280
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: External Tables
>    Affects Versions: 2.0.0-beta-incubating
>            Reporter: Ruilong Huo
>            Assignee: Lei Chang
>
> The operation errors out without returning a result when accessing an
> external table or copying from a file with bad rows.
> 1. Error accessing external table with bad rows
> {noformat}
> Step 1: download the attached test.csv with 2000 rows, all of which are badly formatted
> Step 2: start gpfdist service
> gpfdist -d /home/gpadmin/data/ -p 8081 -l /home/gpadmin/log/load.log &
> ------------------------------------------------------------------------------------------------
> [1] 34635
> Serving HTTP on port 8081, directory /home/gpadmin/data
> Step 3: create external table
> CREATE EXTERNAL TABLE test_ext (id INT, a TEXT, b TEXT, c TEXT, z TEXT)
> LOCATION ('gpfdist://localhost:8081/test.csv')
> FORMAT 'CSV'
> LOG ERRORS INTO test_ext_err SEGMENT REJECT LIMIT 3000 ROWS;
> -----------------------------------------------------------------------------------------------------
> NOTICE: Error table "test_ext_err" does not exist. Auto generating an error
> table with the same name
> CREATE EXTERNAL TABLE
> Step 4: access external table
> SELECT COUNT(*) FROM test_ext;
> -------------------------------------------------
> ERROR: All 1000 first rows in this segment were rejected. Aborting operation
> regardless of REJECT LIMIT value. Last error was: missing data for column "z"
> (seg0 localhost:40000 pid=35647)
> DETAIL: External table test_ext, line 1000 of
> gpfdist://localhost:8081/test.csv: "29,aaa,bbb,zzz"
> {noformat}
> 2. Error copying from file with bad rows
> {noformat}
> Step 1: download the attached test.csv with 2000 rows, all of which are badly formatted
> Step 2: create table
> CREATE TABLE test_copy (id INT, a TEXT, b TEXT, c TEXT, z TEXT);
> ------------------------------------------------------------------------------------------------
> CREATE TABLE
> Step 3: copy data in file to table in database
> COPY test_copy FROM '/Users/intern/Downloads/test.csv' LOG ERRORS INTO
> test_copy_err SEGMENT REJECT LIMIT 3000 ROWS;
> --------------------------------------------------------------------------------------------------------
> NOTICE: Error table "test_copy_err" does not exist. Auto generating an error
> table with the same name
> WARNING: The error table was created in the same transaction as this
> operation. It will get dropped if transaction rolls back even if bad rows are
> present
> HINT: To avoid this create the error table ahead of time using: CREATE TABLE
> <name> (cmdtime timestamp with time zone, relname text, filename text,
> linenum integer, bytenum integer, errmsg text, rawdata text, rawbytes bytea)
> ERROR: All 1000 first rows in this segment were rejected. Aborting operation
> regardless of REJECT LIMIT value. Last error was: missing data for column "a"
> CONTEXT: COPY test_copy, line 1000: "29,aaa,bbb,zzz"
> {noformat}
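
For anyone without the attachment, the following is a minimal sketch of a stand-in for test.csv. It assumes every row looks like the one quoted in the error messages ("29,aaa,bbb,zzz"), that is, four fields against the five columns the tables declare. The real attachment may differ, since only line 1000 is shown in the report. The function name `write_bad_rows` and its parameters are hypothetical, chosen for illustration.

```python
import csv
from pathlib import Path


def write_bad_rows(path, n_rows=2000, row=("29", "aaa", "bbb", "zzz")):
    """Write n_rows copies of a four-field CSV row to path.

    Assumption: the row content is taken from the single line quoted in the
    error output; the actual attachment is not reproduced in the report.
    test_ext and test_copy declare five columns (id, a, b, c, z), so every
    row written here is one field short and should be rejected on load.
    """
    with open(path, "w", newline="") as f:
        # Use a plain "\n" terminator instead of the csv module's default "\r\n".
        writer = csv.writer(f, lineterminator="\n")
        for _ in range(n_rows):
            writer.writerow(row)
    return Path(path)
```

Calling `write_bad_rows("/home/gpadmin/data/test.csv")` would place the file in the directory that gpfdist serves in Step 2 of the first scenario; for the COPY scenario, the path in the COPY statement would need to point at wherever the file is written.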