Github user jvwing commented on the issue:

    https://github.com/apache/nifi/pull/929
  
    @jdye64 Thanks, I looked into some of these error cases:
    
    * Unsupported .xls - Throws exception, routes to 'original'.  The error bulletin seems a bit terse:
    > ConvertExcelToCSVProcessor[id=43426b41-015a-1000-b06e-3a9be79162d1] Package should contain a content type part [M1.13]
    * Blank Sheet - Succeeds, one empty flowfile to 'success' and the original 
to 'original'.  No Errors.  Seems correct to me.
    * Empty Workbook - I wasn't able to create an empty workbook manually in Excel, so I could not test this case.
    * Unmatched Sheet Names - Covered by testNonExistantSpecifiedSheetName, 
seems correct to me.
    * Diverse Content - I tried a number of bizarre things Excel lets you put in spreadsheets: images, tables, pivot tables, formulas, hidden sheets, etc.  Where appropriate, the processor returned content (cells in tables and pivot tables, the last computed value for formulas), and it did not run into errors.  Images were ignored.
    * CSV-Breaking Content - The processor does not escape text to enforce a CSV structure.  Sheets containing multiline text in a cell, cells with commas, etc. resulted in improperly formed CSV files.  I do not object to this; I understand that adding escaping would be a significant scope increase.
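
For reference on the CSV-breaking case, here is a minimal sketch of what RFC 4180-style field escaping would involve. It is not code from this PR: `CsvEscapeSketch`, `escapeField`, and `toCsvRow` are hypothetical names, and the sketch assumes cell values are already available as plain strings.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the processor under review.
final class CsvEscapeSketch {

    // Quote a field if it contains a comma, a double quote, or a line break,
    // and double any embedded double quotes (RFC 4180 style).
    static String escapeField(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Join already-extracted cell values into one CSV record.
    static String toCsvRow(List<String> cells) {
        return cells.stream()
                .map(CsvEscapeSketch::escapeField)
                .collect(Collectors.joining(","));
    }
}
```

With something like this, a cell containing `a,b` would be written as `"a,b"` and a multiline cell would stay inside one quoted field instead of splitting the record.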
    
    I recommend that you do the following:
    
    1. Clearly document that the processor only supports .xlsx and NOT .xls 
files.  I know you've already answered questions about this on the dev list, so 
somebody's going to try it.
    2. For the .xls case, would it be possible to catch the 
`org.apache.poi.openxml4j.exceptions.InvalidFormatException`, repackage that 
with a helpful error message suggesting that only well-formed XLSX files are 
accepted, and route to failure?
    3. Document that the processor does not escape invalid CSV content.
    4. Make the logging changes from the code review post above.
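
For item 2, here is a rough sketch of one way to build the friendlier message. It swaps in a different technique from the one suggested: instead of catching the POI exception, it inspects the leading bytes of the content. Legacy OLE2 files such as .xls start with the signature `D0 CF 11 E0 A1 B1 1A E1`, while XLSX packages are ZIP archives that start with `50 4B 03 04`. `WorkbookFormatSketch` and `describeProblem` are hypothetical names, not NiFi or POI APIs, and the message wording is only a suggestion.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not a NiFi or POI API.
final class WorkbookFormatSketch {

    // Signature at the start of OLE2 compound documents (e.g. legacy .xls).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header signature at the start of a ZIP archive (e.g. .xlsx).
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Returns null when the header looks like a ZIP package,
    // otherwise a message suitable for a bulletin or log entry.
    static String describeProblem(byte[] header) {
        if (startsWith(header, ZIP_MAGIC)) {
            return null;
        }
        if (startsWith(header, OLE2_MAGIC)) {
            return "Input looks like a legacy OLE2 file (for example .xls); "
                    + "only well-formed XLSX files are accepted.";
        }
        return "Input is not a ZIP-based package; "
                + "only well-formed XLSX files are accepted.";
    }
}
```

A non-null result could be logged and the flowfile routed to failure. A ZIP header alone does not prove the content is a valid workbook, so handling the `InvalidFormatException` would still be needed for malformed packages.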


