Jihoon Son created TAJO-1315:
--------------------------------
Summary: Invalid results are returned when a source table consists of multiple csv files
Key: TAJO-1315
URL: https://issues.apache.org/jira/browse/TAJO-1315
Project: Tajo
Issue Type: Bug
Components: storage
Reporter: Jihoon Son
Priority: Critical
Fix For: 0.10
See the title.
The following session reproduces the bug.
{noformat}
default> \dfs -ls /customer.tbl
Found 19 items
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000001
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000002
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000003
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000004
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000005
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000006
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000007
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000008
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000009
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000010
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000011
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000012
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000013
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000014
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000015
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000016
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:26 /customer.tbl/000017
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:26 /customer.tbl/000018
-rw-r--r-- 3 hadoop supergroup 47571167 2015-01-26 20:26 /customer.tbl/000019
default> create external table test (C_CUSTKEY bigint, C_NAME text, C_ADDRESS
text, C_NATIONKEY bigint, C_PHONE text, C_ACCTBAL double, C_MKTSEGMENT text,
C_COMMENT text) using csv with ('csvfile.delimiter'='|') location
'hdfs://192.168.0.1:7020/customer.tbl';
OK
default> \d test
table name: tpch_swift.test
table path: hdfs://192.168.0.1:7020/customer.tbl
store type: CSV
number of rows: unknown
volume: 2.5 GB
Options:
'text.delimiter'='|'
schema:
c_custkey INT8
c_name TEXT
c_address TEXT
c_nationkey INT8
c_phone TEXT
c_acctbal FLOAT8
c_mktsegment TEXT
c_comment TEXT
default> select count(*) from test;
?count
-------------------------------
15000017
(1 rows, 3.2 sec, 9 B selected)
{noformat}
As you can see, the expected result is 15000000, but the actual result was
15000017.
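One way to confirm the expected count independently of Tajo is to count the rows in a local copy of the source files and flag any row with the wrong number of fields. The following is only a minimal sketch: the function name `check_rows` is hypothetical, the field count of 8 is taken from the table schema above, and tolerating one trailing delimiter per row is an assumption about how the files were generated.

```python
def check_rows(lines, expected_fields=8, delimiter="|"):
    """Count non-empty rows and collect the line numbers of malformed ones.

    `lines` is any iterable of text lines, for example an open file object.
    """
    total = 0
    malformed = []
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\n")
        if not row:
            continue  # skip blank lines
        total += 1
        fields = row.split(delimiter)
        # Assumption: a row may end with one trailing delimiter; drop the
        # empty string that split() produces for it.
        if fields and fields[-1] == "":
            fields.pop()
        if len(fields) != expected_fields:
            malformed.append(number)
    return total, malformed


# Illustrative in-memory sample (made-up rows, not data from this report).
sample = [
    "1|Customer#000000001|addr|15|25-000-000-0000|711.56|BUILDING|comment|\n",
    "2|Customer#000000002|addr|13|\n",
]
total, malformed = check_rows(sample)
print(total, malformed)
```

If every file passes such a check and the per-file totals add up to the expected count, the extra rows are more likely to come from how the files are read than from the data itself.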
To find the erroneous tuples, I ran the following queries.
{noformat}
default> select c_custkey, count(*) as cnt from customer2 group by c_custkey
having cnt > 1;
c_custkey, cnt
-------------------------------
, 14
114575, 2
14711665, 2
34, 2
(4 rows, 16.681 sec, 29 B selected)
default> select * from customer2 where c_custkey is null or c_custkey = 114575
or c_custkey = 14711665 or c_custkey = 34;
c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment
-------------------------------
34, Customer#000000034, Q6G9wZ6dnczmtOx509xgE,M2KV, 15, 25-344-968-5422, 8589.7, HOUSEHOLD, nder against the even, pending accounts. even
114575, Customer#000114575, xqLzTzY0,QvqwlSPI8OLxjRQ4s2W7pkSWwK, 16, 26-303-921-2836, 6663.68, AUTOMOBILE, le fluffily final deposits. furiously regu
, 21, 31-264-911-5053, , HOUSEHOLD, 0.0, ,
, IexCQQNp7tsMK63QKrGw37H3JJXGPaXBk, 18, , 4313.01, 0.0, the never pending accounts. slyly fluffy pinto beans run fluffily. furiously ,
, , , , , , ,
, 152.95, MACHINERY, , , , ,
, t the ironic, close accounts are careful, , , , , ,
, 20, 30-481-475-8163, , AUTOMOBILE, 0.0, ,
, , , , , , ,
, MACHINERY, ts use slyly even dependencie, , , , ,
, , , , , , ,
, 24, 34-639-456-9692, , FURNITURE, 0.0, ,
, , , , , , ,
114575, , , , , , ,
34, Customer#011457534, wFUkCU67OxuxvfQeSdvSMDtMB7DWt7jiw, 2, 12-145-168-8442, 145.78, MACHINERY, ic accounts. ironic, final ideas sleep qu
, XPP8pRDTDs4MFMP7SSlv, 17, , 5437.09, 0.0, egular requests cajole slyly after the ,
, blithely along the regular, daring deposits. ironic acco, , , , , ,
, 12, 22-656-233-3821, , HOUSEHOLD, 0.0, ,
14711665, Customer#0, , , , , ,
14711665, QKTarsTkX7, 19, , 7017.62, 0.0, ly after the carefully ironic theodolites. pending requests are slyly across the deposits. even accounts boost. fina,
(20 rows, 8.964 sec, 1.2 KiB selected)
{noformat}
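The duplicate counts above appear to account for the whole surplus. The short check below is only a sketch using the numbers from this report. It assumes that every c_custkey is unique in the source data and that every row with a NULL key is spurious. The name `surplus_rows` is made up for illustration.

```python
# Counts copied from the GROUP BY ... HAVING output above.
# None stands for the NULL c_custkey group.
duplicate_counts = {None: 14, 114575: 2, 14711665: 2, 34: 2}

expected_rows = 15_000_000  # expected count(*) stated in this report
observed_rows = 15_000_017  # count(*) actually returned


def surplus_rows(counts):
    # Assumption: a NULL key belongs to no real customer, so all of those
    # rows are surplus; a real key should occur once, so only the extra
    # copies are surplus.
    return sum(n if key is None else n - 1 for key, n in counts.items())


surplus = surplus_rows(duplicate_counts)
rows_listed = sum(duplicate_counts.values())
print(surplus, observed_rows - expected_rows, rows_listed)
```

Under those assumptions the surplus comes to 14 + 1 + 1 + 1 = 17, which equals the difference between the observed and expected counts, and the four groups together contain the 20 rows returned by the second query.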
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)