[ 
https://issues.apache.org/jira/browse/DRILL-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742818#comment-16742818
 ] 

Khurram Faraaz commented on DRILL-6146:
---------------------------------------

The issue can be reproduced on Drill 1.14.0.

When LHS has an empty table ( in this case the directory empty_dir has an empty 
file f1.json )

Whereas, when we flip the LHS and RHS in the failing query, it returns a 
result, with one record, having the value 1
{noformat}
[test@test102-45 ~]# hadoop fs -ls /tmp/empty_dir
Found 1 items
-rwxr-xr-x 3 test test 0 2019-01-15 06:59 /tmp/empty_dir/f1.json

failing query => select id from dfs.tmp.`empty_dir` union select * from 
(values(1));

00-00    Screen : rowType = RecordType(ANY id): rowcount = 1.0, cumulative cost 
= {8.1 rows, 20.1 cpu, 0.0 io, 0.0 network, 16.0 memory}, id = 1216
00-01      Project(id=[$0]) : rowType = RecordType(ANY id): rowcount = 1.0, 
cumulative cost = {8.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 16.0 memory}, id = 
1215
00-02        StreamAgg(group=[{0}]) : rowType = RecordType(ANY id): rowcount = 
1.0, cumulative cost = {7.0 rows, 19.0 cpu, 0.0 io, 0.0 network, 16.0 memory}, 
id = 1214
00-03          Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY id): 
rowcount = 2.0, cumulative cost = {5.0 rows, 11.0 cpu, 0.0 io, 0.0 network, 
16.0 memory}, id = 1213
00-04            UnionAll(all=[true]) : rowType = RecordType(ANY id): rowcount 
= 2.0, cumulative cost = {3.0 rows, 3.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, 
id = 1212
00-06              Scan(table=[[dfs, tmp, empty_dir]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/empty_dir, numFiles=1, columns=[`id`], 
files=[maprfs:///tmp/empty_dir/f1.json]]]) : rowType = RecordType(ANY id): 
rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 
memory}, id = 1210
00-05              Values(tuples=[[{ 1 }]]) : rowType = RecordType(INTEGER 
EXPR$0): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 1211{noformat}

> UNION with empty input on any one side returns incorrect results
> ----------------------------------------------------------------
>
>                 Key: DRILL-6146
>                 URL: https://issues.apache.org/jira/browse/DRILL-6146
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.12.0
>            Reporter: Khurram Faraaz
>            Assignee: Vitalii Diravka
>            Priority: Major
>
> When any one side of the UNION has an empty file as input, Drill returns 
> incorrect results.
>  
> table t3 does not have any data inserted into its rows. Postgress returns 1 
> as the result for both the queries, whereas Drill does not.
>  
> {noformat}
> postgres=# create table t3(id int, name varchar(25));
> CREATE TABLE 
> postgres=# select * from (values(1)) t union select id from t3;
>        1
>  
> postgres=# select id from t3 union select * from (values(1)) t;
>   1
>  {noformat}
>  
>  
> Results from Drill 1.12.0-mapr, note we return result 1 as result after the 
> union.
> We have a directory named empty_JSON_f , and it has a single empty JSON file 
> (that JSON file has no content in it, it is empty).
>  
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from (values(1)) UNION select id from 
> empty_JSON_f;
> +---------+
> | EXPR$0  |
> +---------+
> | 1       |
> +---------+
> 1 row selected (2.272 seconds){noformat}
> However, in this query we return null and loose the value 1 from the right 
> hand side, after the union, this doesn't seem correct 
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select id from empty_JSON_f UNION select * from 
> (values(1));
> +-------+
> |  id   |
> +-------+
> | null  |
> +-------+
> 1 row selected (0.33 seconds){noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to