Anton Gozhiy created DRILL-6856:
-----------------------------------

Summary: Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions
Key: DRILL-6856
URL: https://issues.apache.org/jira/browse/DRILL-6856
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.15.0
Reporter: Anton Gozhiy
Attachments: 0_0_0.parquet
*Data:*
A Parquet file with a boolean column that contains null values. An example is attached.

*Query:*
{code:sql}
select bool_col
from dfs.tmp.`Test_data`
where bool_col is true or bool_col is null
{code}

*Result:*
{noformat}
null
null
{noformat}

*Plan:*
{noformat}
00-00    Screen : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {37.875 rows, 97.875 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1980
00-01      Project(bool_col=[$0]) : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {37.5 rows, 97.5 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1979
00-02        SelectionVectorRemover : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {33.75 rows, 93.75 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1978
00-03          Filter(condition=[IS NULL($0)]) : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {30.0 rows, 90.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1977
00-04            Scan(table=[[dfs, tmp, Test_data]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/Test_data]], selectionRoot=maprfs:/tmp/Test_data, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`bool_col`]]]) : rowType = RecordType(ANY bool_col): rowcount = 15.0, cumulative cost = {15.0 rows, 15.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1976
{noformat}

*Notes:*
- "true" values were not included in the result, although they should have been.
- The result is correct if "bool_col = true" is used instead of "bool_col is true".
- The plan shows that the "is true" condition is missing from the Filter operator: only IS NULL($0) remains.
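To make the expected semantics concrete, here is a minimal Python sketch of SQL three-valued logic, with SQL NULL modeled as None. It is not Drill code: the helper names (is_true, is_null, eq_true, sql_or, where) and the sample column are hypothetical, and the sample values are not the contents of the attached 0_0_0.parquet. It shows that "is true or is null" should keep both TRUE and NULL rows, that the "= true" workaround yields the same rows, and that a filter reduced to IS NULL alone (as in the reported plan) returns only the nulls.

```python
from typing import Callable, List, Optional

# SQL BOOLEAN modeled in Python: True, False, or None for SQL NULL.
SqlBool = Optional[bool]


def is_true(v: SqlBool) -> bool:
    # `x IS TRUE` never yields NULL: it is TRUE only when x is TRUE.
    return v is True


def is_null(v: SqlBool) -> bool:
    # `x IS NULL` is TRUE only when x is NULL.
    return v is None


def eq_true(v: SqlBool) -> SqlBool:
    # `x = true` propagates NULL: NULL = true evaluates to NULL.
    return None if v is None else v is True


def sql_or(a: SqlBool, b: SqlBool) -> SqlBool:
    # Three-valued OR: TRUE wins, then NULL (unknown), otherwise FALSE.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def where(rows: List[SqlBool], pred: Callable[[SqlBool], SqlBool]) -> List[SqlBool]:
    # WHERE keeps a row only when the predicate is TRUE; FALSE and NULL are dropped.
    return [r for r in rows if pred(r) is True]


# Hypothetical sample column (assumption, not the attached file's data).
bool_col: List[SqlBool] = [True, False, None, True, None]

# where bool_col is true or bool_col is null
expected = where(bool_col, lambda v: sql_or(is_true(v), is_null(v)))
# where bool_col = true or bool_col is null (the workaround from the notes)
workaround = where(bool_col, lambda v: sql_or(eq_true(v), is_null(v)))
# What the reported plan evaluates: Filter(condition=[IS NULL($0)]) only.
observed_plan = where(bool_col, is_null)

print(expected)       # [True, None, True, None]
print(workaround)     # [True, None, True, None]
print(observed_plan)  # [None, None]
```

Because `IS TRUE` never evaluates to NULL, it is not redundant with `IS NULL` in this disjunction; dropping it changes which rows pass the filter.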