jnturton opened a new pull request, #2583:
URL: https://github.com/apache/drill/pull/2583

   # [DRILL-8182](https://issues.apache.org/jira/browse/DRILL-8182): File scan nodes not differentiated by format config
   
   ## Description
   
   Two file scans that differ only in a format config overridden via table functions may return genuinely different data. Format config options can change the behaviour of the format parser (date strings, delimiters, etc.), and may even direct the format plugin to entirely different data within the file. The query planner should not treat such scans as the same. The following example, based on the Excel format plugin, illustrates the problem.
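   
   The core idea can be sketched in a few lines of Java. This is a minimal, hypothetical illustration, not Drill's planner code: `ScanIdentityDemo`, `Scan`, `PathOnlyKey`, `PathAndConfigKey` and `distinctScans` are invented names. It assumes the planner treats two scans as interchangeable whenever their identity keys are equal. A key built from the file path alone collapses the two sheet scans into one. A key that also includes the effective format config keeps them apart.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.function.Function;
   
   // Hypothetical illustration only: these types are NOT Drill's planner classes.
   class ScanIdentityDemo {
   
     // A scan: a file path plus the format options (e.g. from a table function) it is read with.
     record Scan(String path, Map<String, String> formatConfig) {}
   
     // Identity based only on the file path (the behaviour this PR describes as buggy).
     record PathOnlyKey(String path) {}
   
     // Identity that also includes the format config; records compare all components,
     // and Map equality is content-based, so different sheetName values yield different keys.
     record PathAndConfigKey(String path, Map<String, String> formatConfig) {}
   
     // Deduplicates scans by the given key and returns how many distinct scans remain.
     static <K> int distinctScans(List<Scan> scans, Function<Scan, K> keyFn) {
       Map<K, Scan> unique = new LinkedHashMap<>();
       for (Scan s : scans) {
         unique.putIfAbsent(keyFn.apply(s), s);
       }
       return unique.size();
     }
   
     public static void main(String[] args) {
       String file = "/Products_Customers_Orders.xlsx";
       List<Scan> scans = List.of(
           new Scan(file, Map.of("type", "excel", "sheetName", "Products")),
           new Scan(file, Map.of("type", "excel", "sheetName", "Customers")));
   
       // Path-only identity merges both scans, so both SELECTs would read a single sheet.
       System.out.println(distinctScans(scans, s -> new PathOnlyKey(s.path())));
       // Including the format config keeps the two scans distinct.
       System.out.println(distinctScans(scans, s -> new PathAndConfigKey(s.path(), s.formatConfig())));
     }
   }
   ```
   
   Running `main` prints `1` and then `2`. Using records keeps the sketch short, because `equals` and `hashCode` automatically cover every component, including the format config.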
   
   When a query includes multiple SELECTs that use TABLE functions to read different sheets of the same workbook, and those sheets contain a column with the same name, the values for that column come from a single sheet for both SELECTs. To reproduce, run the following query against the attachment and note that the `Name` values returned from the Products sheet are actually the `Name` values from the Customers sheet.
   ```sql
   with
   prod as (
       select Id, Name from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx`(type => 'excel', sheetName => 'Products'))
   )
   , cust as (
       select Id, Name from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx`(type => 'excel', sheetName => 'Customers'))
   )
   select * from cust join prod on cust.Id = prod.Id;
   ```
   
   ## Documentation
   N/A
   
   ## Testing
   New unit test: TestExcelFormat#testTableFuncsThatDifferOnlyByFormatConfig
   