Github user jvwing commented on the issue:

    https://github.com/apache/nifi/pull/929

@jdye64 Thanks, I looked into some of these error cases:

* Unsupported .xls - Throws an exception and routes to 'original'. The error bulletin seems a bit terse:

> ConvertExcelToCSVProcessor[id=43426b41-015a-1000-b06e-3a9be79162d1] Package should contain a content type part [M1.13]

* Blank Sheet - Succeeds, with one empty flowfile to 'success' and the original to 'original'. No errors. Seems correct to me.
* Empty Workbook - I wasn't able to create an empty workbook manually in Excel.
* Unmatched Sheet Names - Covered by testNonExistantSpecifiedSheetName, seems correct to me.
* Diverse Content - I tried a number of bizarre things Excel lets you put in spreadsheets: images, tables, pivot tables, formulas, hidden sheets, etc. Where appropriate, the processor returned content (cells in tables and pivot tables, last computed formula value), and it did not run into errors. Images were ignored.
* CSV-Breaking Content - The processor does not escape text to enforce a CSV structure. Sheets containing multiline text in a cell, cells with commas, etc. resulted in improperly formed CSV files. I do not object; I understand that escaping would be a significant scope increase.

I recommend that you do the following:

1. Clearly document that the processor only supports .xlsx and NOT .xls files. I know you've already answered questions about this on the dev list, so somebody's going to try it.
2. For the .xls case, would it be possible to catch the `org.apache.poi.openxml4j.exceptions.InvalidFormatException`, repackage that with a helpful error message suggesting that only well-formed XLSX files are accepted, and route to failure?
3. Document that the processor does not escape invalid CSV content.
4. Logging changes from the code review post above.
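
For reference on the CSV-breaking content point, here is a minimal sketch of what field escaping could look like if it were ever brought into scope. It assumes RFC 4180 style quoting, which may or may not be the convention the processor would want. The names `CsvEscapeSketch`, `escapeField`, and `joinRow` are hypothetical and are not part of this PR or of NiFi.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR: illustrates RFC 4180 style quoting only.
public class CsvEscapeSketch {

    // Quote a field only when it contains a comma, double quote, CR, or LF.
    // Embedded double quotes are doubled. Null cells become empty fields (an assumption).
    static String escapeField(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuoting = value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Join already-extracted cell values into one CSV record (no trailing newline).
    static String joinRow(List<String> cells) {
        return cells.stream()
                .map(CsvEscapeSketch::escapeField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Cells with a comma, a quote, and a line break, like the sheets that broke the output.
        System.out.println(joinRow(Arrays.asList("plain", "a,b", "say \"hi\"", "line1\nline2")));
    }
}
```

With quoting like this, a cell containing a comma or a line break stays within a single field when the output is read by a CSV parser that honors quoted fields. An existing CSV library could serve the same purpose instead of hand-rolled escaping.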