[ 
https://issues.apache.org/jira/browse/DRILL-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshuman Mishra updated DRILL-8331:
-----------------------------------
    Description: 
Hi All,
I am having a query regarding a query I am firing up in drill and not getting 
the expected result set. Basically what I am doing is merging the data from 
three sheets and eliminating only those records present in the 4th sheet.
The query:
{quote}{_}select * from (SELECT EmpId,Test FROM                         
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName        =>'Sheet1')) UNION ALL SELECT EmpId,Test FROM  
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName =>'Sheet2')) UNION ALL SELECT EmpId,Test FROM  
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName =>'Sheet3'))) where EmpID not in (select EmpID from 
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName    =>'Sheet4'))){_}
{quote}
The query when we execute it, returns no data, when it should just eliminate 
the data present in sheet4 from the excel sheet. Strangely enough when we do a 
union All on two sheets this query works perfectly fine and returns the 
expected result. One more finding is, if we use static values in 'not in' 
instead of 'subquery', the query works fine with all the 3 sheets as union all. 
I even gave a null check in the subquery used in 'not in' as it might consider 
null values as well, still no luck.
NOTE: Instead of UNION ALL if UNION is applied then it is giving the expected 
result set.
I have attached a sample file I am using to recreate the issue. 
[^TestingMerge.xlsx]

  was:
Hi All,
I am having a query regarding a query I am firing up in drill and not getting 
the expected result set. Basically what I am doing is merging the data from 
three sheets and eliminating only those records present in the 4th sheet.
The query:
{quote}{_}select * from (SELECT EmpId,Test FROM                         
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName        =>'Sheet1')) UNION ALL SELECT EmpId,Test FROM  
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName =>'Sheet2')) UNION ALL SELECT EmpId,Test FROM  
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName =>'Sheet3'))) where EmpID not in (select EmpID from 
TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
 => 'excel', sheetName    =>'Sheet4'))){_}
{quote}
The query when we execute it, returns no data, when it should just eliminate 
the data present in sheet4 from the excel sheet. Strangely enough when we do a 
union All on two sheets this query works perfectly fine and returns the 
expected result. One more finding is, if we use static values in 'not in' 
instead of 'subquery', the query works fine with all the 3 sheets as union all. 
I even gave a null check in the subquery used in 'not in' as it might consider 
null values as well, still no luck. Not sure where I am going wrong with this. 
Would appreciate any inputs.
NOTE: Instead of UNION ALL if UNION is applied then it is giving the expected 
result set.
I have attached a sample file I am using to recreate the issue. 
[^TestingMerge.xlsx]


> UNION ALL for multiple sheets in excel giving unexpected result when used 
> with a subquery in not in
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8331
>                 URL: https://issues.apache.org/jira/browse/DRILL-8331
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Excel
>    Affects Versions: 1.20.2
>            Reporter: Anshuman Mishra
>            Priority: Critical
>         Attachments: TestingMerge.xlsx
>
>
> Hi All,
> I am having a query regarding a query I am firing up in drill and not getting 
> the expected result set. Basically what I am doing is merging the data from 
> three sheets and eliminating only those records present in the 4th sheet.
> The query:
> {quote}{_}select * from (SELECT EmpId,Test FROM                         
> TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
>  => 'excel', sheetName        =>'Sheet1')) UNION ALL SELECT EmpId,Test FROM  
> TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
>  => 'excel', sheetName =>'Sheet2')) UNION ALL SELECT EmpId,Test FROM  
> TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
>  => 'excel', sheetName =>'Sheet3'))) where EmpID not in (select EmpID from 
> TABLE(dfs.{_}{{{}_[C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type
>  => 'excel', sheetName    =>'Sheet4'))){_}
> {quote}
> The query when we execute it, returns no data, when it should just eliminate 
> the data present in sheet4 from the excel sheet. Strangely enough when we do 
> a union All on two sheets this query works perfectly fine and returns the 
> expected result. One more finding is, if we use static values in 'not in' 
> instead of 'subquery', the query works fine with all the 3 sheets as union 
> all. I even gave a null check in the subquery used in 'not in' as it might 
> consider null values as well, still no luck.
> NOTE: Instead of UNION ALL if UNION is applied then it is giving the expected 
> result set.
> I have attached a sample file I am using to recreate the issue. 
> [^TestingMerge.xlsx]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to