[ https://issues.apache.org/jira/browse/NIFI-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Handermann updated NIFI-11167:
------------------------------------
    Description: 
A new Excel Record Reader should be implemented to support reading XLSX 
spreadsheet rows as NiFi Records. This Reader will enable integration with 
various record-oriented components, obviating the need for the narrowly focused 
ConvertExcelToCSVProcessor. The initial version of the Excel Reader should not 
support the legacy binary XLS format.

The ExcelReader should use a library that supports reading from a stream of 
rows to avoid consuming large amounts of heap memory during processing.

The ExcelReader should support configurable properties to read selected sheets. 
With Excel supporting typed field values, some amount of field type mapping 
will be required. Additional input filtering properties should not be 
implemented, since existing Processors like QueryRecord support a wide variety of 
filtering and projection use cases.

  was:
A new Excel Record Reader should be implemented to support reading XLSX 
spreadsheet rows as NiFi Records. This Reader will enable integration with 
various record-oriented components, obviating the need for the narrowly focused 
ConvertExcelToCSVProcessor. The initial version of the Excel Reader should not 
support the legacy binary XLS format.

The ExcelReader should use the Apache POI library and build on the [XSSF Event 
API|https://poi.apache.org/components/spreadsheet/how-to.html#xssf_sax_api] to 
avoid consuming large amounts of heap memory during processing.

The ExcelReader should support configurable properties to read selected sheets. 
With Excel supporting typed field values, some amount of field type mapping 
will be required. Additional input filtering properties should not be 
implemented, since existing Processors like QueryRecord support a wide variety of 
filtering and projection use cases.


> Add Excel Record Reader
> -----------------------
>
>                 Key: NIFI-11167
>                 URL: https://issues.apache.org/jira/browse/NIFI-11167
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: David Handermann
>            Priority: Minor
>
> A new Excel Record Reader should be implemented to support reading XLSX 
> spreadsheet rows as NiFi Records. This Reader will enable integration with 
> various record-oriented components, obviating the need for the narrowly 
> focused ConvertExcelToCSVProcessor. The initial version of the Excel Reader 
> should not support the legacy binary XLS format.
> The ExcelReader should use a library that supports reading from a stream of 
> rows to avoid consuming large amounts of heap memory during processing.
> The ExcelReader should support configurable properties to read selected 
> sheets. With Excel supporting typed field values, some amount of field type 
> mapping will be required. Additional input filtering properties should not be 
> implemented, since existing Processors like QueryRecord support a wide variety of 
> filtering and projection use cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
