[
https://issues.apache.org/jira/browse/DRILL-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anshuman Mishra updated DRILL-8331:
-----------------------------------
Component/s: Storage - Excel
(was: Client - Java)
> UNION ALL for multiple sheets in excel giving unexpected result when used
> with a subquery in not in
> ---------------------------------------------------------------------------------------------------
>
> Key: DRILL-8331
> URL: https://issues.apache.org/jira/browse/DRILL-8331
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Excel
> Affects Versions: 1.20.2
> Reporter: Anshuman Mishra
> Priority: Critical
> Attachments: TestingMerge.xlsx
>
>
> Hi All,
> I am having a query regarding a query I am firing up in drill and not getting
> the expected result set. Basically what I am doing is merging the data from
> three sheets and eliminating only those records present in the 4th sheet.
> The query:
> {quote}{_}select * from (SELECT EmpId,Test FROM
> TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
> => 'excel', sheetName =>'Sheet1')) UNION ALL SELECT EmpId,Test FROM
> TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
> => 'excel', sheetName =>'Sheet2')) UNION ALL SELECT EmpId,Test FROM
> TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
> => 'excel', sheetName =>'Sheet3'))) where EmpID not in (select EmpID from
> TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
> => 'excel', sheetName =>'Sheet4'))){_}{quote}
> The query when we execute it, return no data, when it should just eliminate
> the data present in sheet4 from the excel sheet. Strangely enough when we do
> a union All on two sheets this query works perfectly fine and returns the
> expected result. One more finding is, if we use static values in 'not in'
> instead of 'subquery', the query works fine with all the 3 sheets as union
> all. I even gave a null check in the subquery used in 'not in' as it might
> consider null values as well, still no luck. Not sure where I am going wrong
> with this. Would appreciate any inputs.
> NOTE: Instead of UNION ALL if UNION is applied then it is giving the expected
> result set.
> I have attached a sample file I am using to recreate the issue.
> [^TestingMerge.xlsx]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)