Jihoon Son created TAJO-1315:
--------------------------------

             Summary: Invalid results are returned when a source table consists of multiple csv files
                 Key: TAJO-1315
                 URL: https://issues.apache.org/jira/browse/TAJO-1315
             Project: Tajo
          Issue Type: Bug
          Components: storage
            Reporter: Jihoon Son
            Priority: Critical
             Fix For: 0.10


As the title says, a query over a table backed by multiple CSV files returns invalid results.
The following session reproduces the bug.
{noformat}
default> \dfs -ls /customer.tbl
Found 19 items
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000001
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000002
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000003
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000004
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000005
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000006
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000007
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000008
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000009
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000010
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000011
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000012
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000013
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000014
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000015
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000016
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:26 /customer.tbl/000017
-rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:26 /customer.tbl/000018
-rw-r--r--   3 hadoop supergroup   47571167 2015-01-26 20:26 /customer.tbl/000019

default> create external table test (C_CUSTKEY bigint, C_NAME text, C_ADDRESS text, C_NATIONKEY bigint, C_PHONE text, C_ACCTBAL double, C_MKTSEGMENT text, C_COMMENT text) using csv with ('csvfile.delimiter'='|') location 'hdfs://192.168.0.1:7020/customer.tbl';
OK
default> \d test

table name: tpch_swift.test
table path: hdfs://192.168.0.1:7020/customer.tbl
store type: CSV
number of rows: unknown
volume: 2.5 GB
Options: 
        'text.delimiter'='|'

schema: 
c_custkey       INT8
c_name  TEXT
c_address       TEXT
c_nationkey     INT8
c_phone TEXT
c_acctbal       FLOAT8
c_mktsegment    TEXT
c_comment       TEXT

default> select count(*) from test;
?count
-------------------------------
15000017
(1 rows, 3.2 sec, 9 B selected)
{noformat}
As you can see, the expected result is 15000000, but the actual result was 
15000017, which is 17 rows too many.

So, I investigated the erroneous tuples by looking for duplicated c_custkey values and then selecting the matching rows, as follows.
{noformat}
default> select c_custkey, count(*) as cnt from customer2 group by c_custkey 
having cnt > 1;
c_custkey,  cnt
-------------------------------
,  14
114575,  2
14711665,  2
34,  2
(4 rows, 16.681 sec, 29 B selected)

default> select * from customer2 where c_custkey is null or c_custkey = 114575 
or c_custkey = 14711665 or c_custkey = 34;
c_custkey,  c_name,  c_address,  c_nationkey,  c_phone,  c_acctbal,  
c_mktsegment,  c_comment
-------------------------------
34,  Customer#000000034,  Q6G9wZ6dnczmtOx509xgE,M2KV,  15,  25-344-968-5422,  
8589.7,  HOUSEHOLD,  nder against the even, pending accounts. even
114575,  Customer#000114575,  xqLzTzY0,QvqwlSPI8OLxjRQ4s2W7pkSWwK,  16,  
26-303-921-2836,  6663.68,  AUTOMOBILE,  le fluffily final deposits. furiously 
regu
,  21,  31-264-911-5053,  ,  HOUSEHOLD,  0.0,  ,  
,  IexCQQNp7tsMK63QKrGw37H3JJXGPaXBk,  18,  ,  4313.01,  0.0,   the never 
pending accounts. slyly fluffy pinto beans run fluffily. furiously ,  
,  ,  ,  ,  ,  ,  ,  
,  152.95,  MACHINERY,  ,  ,  ,  ,  
,  t the ironic, close accounts are careful,  ,  ,  ,  ,  ,  
,  20,  30-481-475-8163,  ,  AUTOMOBILE,  0.0,  ,  
,  ,  ,  ,  ,  ,  ,  
,  MACHINERY,  ts use slyly even dependencie,  ,  ,  ,  ,  
,  ,  ,  ,  ,  ,  ,  
,  24,  34-639-456-9692,  ,  FURNITURE,  0.0,  ,  
,  ,  ,  ,  ,  ,  ,  
114575,  ,  ,  ,  ,  ,  ,  
34,  Customer#011457534,  wFUkCU67OxuxvfQeSdvSMDtMB7DWt7jiw,  2,  
12-145-168-8442,  145.78,  MACHINERY,  ic accounts. ironic, final ideas sleep qu
,  XPP8pRDTDs4MFMP7SSlv,  17,  ,  5437.09,  0.0,  egular requests cajole slyly 
after the ,  
,  blithely along the regular, daring deposits. ironic acco,  ,  ,  ,  ,  ,  
,  12,  22-656-233-3821,  ,  HOUSEHOLD,  0.0,  ,  
14711665,  Customer#0,  ,  ,  ,  ,  ,  
14711665,  QKTarsTkX7,  19,  ,  7017.62,  0.0,  ly after the carefully ironic 
theodolites. pending requests are slyly across the deposits. even accounts 
boost. fina,  
(20 rows, 8.964 sec, 1.2 KiB selected)
{noformat}
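
As a sanity check on the numbers above, the duplicate counts appear to account for the whole discrepancy. The sketch below is a small standalone Python calculation, not part of Tajo. It assumes that every real c_custkey should occur exactly once and that every row with a NULL c_custkey is a spurious fragment; the names dup_counts and surplus_rows are made up for illustration.

```python
# Duplicate-key counts copied from the GROUP BY ... HAVING cnt > 1 output above.
# None stands for the NULL c_custkey group.
dup_counts = {None: 14, 114575: 2, 14711665: 2, 34: 2}


def surplus_rows(counts):
    """Count rows beyond one per real key.

    Assumption: each non-NULL key should appear exactly once, and every
    NULL-key row is surplus.
    """
    total = 0
    for key, cnt in counts.items():
        total += cnt if key is None else cnt - 1
    return total


expected = 15000000  # row count the reporter expected
observed = 15000017  # row count returned by select count(*)

print(surplus_rows(dup_counts), observed - expected)
```

Under those assumptions the surplus is 14 + 1 + 1 + 1 = 17, which matches 15000017 - 15000000, and the four groups together contain the 20 rows returned by the second query.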



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
