[ https://issues.apache.org/jira/browse/DRILL-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anshuman Mishra updated DRILL-8331: ----------------------------------- Component/s: Storage - Excel (was: Client - Java) > UNION ALL for multiple sheets in excel giving unexpected result when used > with a subquery in not in > --------------------------------------------------------------------------------------------------- > > Key: DRILL-8331 > URL: https://issues.apache.org/jira/browse/DRILL-8331 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Excel > Affects Versions: 1.20.2 > Reporter: Anshuman Mishra > Priority: Critical > Attachments: TestingMerge.xlsx > > > Hi All, > I am having a query regarding a query I am firing up in drill and not getting > the expected result set. Basically what I am doing is merging the data from > three sheets and eliminating only those records present in the 4th sheet. > The query: > {quote}{_}select * from (SELECT EmpId,Test FROM > TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type > => 'excel', sheetName =>'Sheet1')) UNION ALL SELECT EmpId,Test FROM > TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type > => 'excel', sheetName =>'Sheet2')) UNION ALL SELECT EmpId,Test FROM > TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type > => 'excel', sheetName =>'Sheet3'))) where EmpID not in (select EmpID from > TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type > => 'excel', sheetName =>'Sheet4'))){_}{quote} > The query when we execute it, return no data, when it should just eliminate > the data present in sheet4 from the excel sheet. Strangely enough when we do > a union All on two sheets this query works perfectly fine and returns the > expected result. One more finding is, if we use static values in 'not in' > instead of 'subquery', the query works fine with all the 3 sheets as union > all. I even gave a null check in the subquery used in 'not in' as it might > consider null values as well, still no luck. Not sure where I am going wrong > with this. Would appreciate any inputs. > NOTE: Instead of UNION ALL if UNION is applied then it is giving the expected > result set. > I have attached a sample file I am using to recreate the issue. > [^TestingMerge.xlsx] -- This message was sent by Atlassian Jira (v8.20.10#820010)