[
https://issues.apache.org/jira/browse/DRILL-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Turton closed DRILL-8182.
-------------------------------
Resolution: Fixed
> File scan nodes not differentiated by format config
> ---------------------------------------------------
>
> Key: DRILL-8182
> URL: https://issues.apache.org/jira/browse/DRILL-8182
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Other
> Affects Versions: 1.20.0
> Reporter: James Turton
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.20.2
>
> Attachments: Products_Customers_Orders.xlsx
>
>
> Two file scans that differ only in the format config overridden with table
> functions may be genuinely different in terms of the data they return. The
> format config options may affect the behaviour of the format parser (date
> strings, delimiters, etc.), possibly directing the format plugin to entirely
> different data within the file. Such scans should not be considered the same
> by the query planner. This is illustrated by the following example based on the
> Excel format plugin.
> When a query includes multiple SELECTs against a workbook by using TABLE
> functions to access different sheets, and those sheets contain a column with
> the same name, then the values for that column come from a single sheet for
> both SELECTs. To reproduce, run the following query against the attachment and
> note that the `Name` values returned from the Products sheet are actually the
> `Name` values from the Customers sheet.
>
> {code:sql}
> with
>   prod as (
>     select Id, Name from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx`
>       (type => 'excel', sheetName => 'Products'))
>   )
>   , cust as (
>     select Id, Name from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx`
>       (type => 'excel', sheetName => 'Customers'))
>   )
> select * from cust join prod on cust.Id = prod.Id;
> {code}
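
The point of the report, that two scans of the same file with different format config must not compare as equal, can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `FileScan`, `formatOptions` and `samePathOnly` are hypothetical names invented for the example, not Drill's planner classes, and the sketch does not reproduce the actual fix.

```java
import java.util.Map;
import java.util.Objects;

public class ScanIdentitySketch {

  /** Hypothetical stand-in for a file scan node; not one of Drill's classes. */
  static final class FileScan {
    final String path;
    final Map<String, String> formatOptions;

    FileScan(String path, Map<String, String> formatOptions) {
      this.path = path;
      this.formatOptions = Map.copyOf(formatOptions);
    }

    // Equality covers both the path and the format options, so scans of the
    // same file with different options are treated as different scans.
    @Override
    public boolean equals(Object other) {
      if (this == other) {
        return true;
      }
      if (!(other instanceof FileScan)) {
        return false;
      }
      FileScan that = (FileScan) other;
      return path.equals(that.path) && formatOptions.equals(that.formatOptions);
    }

    @Override
    public int hashCode() {
      return Objects.hash(path, formatOptions);
    }
  }

  /** Comparison that ignores format options: the kind of conflation the report describes. */
  static boolean samePathOnly(FileScan a, FileScan b) {
    return a.path.equals(b.path);
  }

  public static void main(String[] args) {
    String file = "/Products_Customers_Orders.xlsx";
    FileScan prod = new FileScan(file, Map.of("type", "excel", "sheetName", "Products"));
    FileScan cust = new FileScan(file, Map.of("type", "excel", "sheetName", "Customers"));

    System.out.println("path only:       " + samePathOnly(prod, cust)); // true
    System.out.println("path and format: " + prod.equals(cust));        // false
  }
}
```

With a path-only comparison the two sheet scans collapse into one, which matches the symptom above where both SELECTs return `Name` values from a single sheet. Including the format options in `equals` and `hashCode` keeps them distinct.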