Anshuman Mishra created DRILL-8331:
--------------------------------------

             Summary: UNION ALL for multiple sheets in excel giving unexpected 
result when used with a subquery in not in
                 Key: DRILL-8331
                 URL: https://issues.apache.org/jira/browse/DRILL-8331
             Project: Apache Drill
          Issue Type: Bug
          Components: Client - Java
    Affects Versions: 1.20.2
            Reporter: Anshuman Mishra
         Attachments: TestingMerge.xlsx

Hi All,
I am having a query regarding a query I am firing up in drill and not getting 
the expected result set. Basically what I am doing is merging the data from 
three sheets and eliminating only those records present in the 4th sheet.
The query:
{quote}{_}select * from (SELECT EmpId,Test FROM                         
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName        =>'Sheet1')) UNION ALL SELECT EmpId,Test FROM  
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName =>'Sheet2')) UNION ALL SELECT EmpId,Test FROM  
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName =>'Sheet3'))) where EmpID not in (select EmpID from 
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName    =>'Sheet4'))){_}{quote}
The query when we execute it, return no data, when it should just eliminate the 
data present in sheet4 from the excel sheet. Strangely enough when we do a 
union All on two sheets this query works perfectly fine and returns the 
expected result. One more finding is, if we use static values in 'not in' 
instead of 'subquery', the query works fine with all the 3 sheets as union all. 
I even gave a null check in the subquery used in 'not in' as it might consider 
null values as well, still no luck. Not sure where I am going wrong with this. 
Would appreciate any inputs.
NOTE: Instead of UNION ALL if UNION is applied then it is giving the expected 
result set.
I have attached a sample file I am using to recreate the issue. 
[^TestingMerge.xlsx]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to