[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207064#comment-16207064 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on the issue: https://github.com/apache/nifi/pull/2194 @patricker All LGTM, +1. Merged to master, thank you! > ConvertExcelToCSV Data Formatting and Delimiters > > > Key: NIFI-4465 > URL: https://issues.apache.org/jira/browse/NIFI-4465 > Project: Apache NiFi > Issue Type: Improvement > Components: Core Framework >Reporter: Peter Wicks >Assignee: Peter Wicks >Priority: Minor > Fix For: 1.5.0 > > > The ConvertExcelToCSV Processor does not output cell values using the > formatting set in Excel. > There are also no delimiter options available for column/record delimiting. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207061#comment-16207061 ] ASF subversion and git services commented on NIFI-4465: --- Commit fd00df3d2f593b6da6c7498fa66ec6917e1639e0 in nifi's branch refs/heads/master from patricker [ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=fd00df3 ] NIFI-4465 ConvertExcelToCSV Data Formatting and Delimiters This closes #2194. Signed-off-by: Koji Kawamura
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207062#comment-16207062 ] ASF GitHub Bot commented on NIFI-4465: -- Github user asfgit closed the pull request at: https://github.com/apache/nifi/pull/2194
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206910#comment-16206910 ] ASF GitHub Bot commented on NIFI-4465: -- Github user patricker commented on the issue: https://github.com/apache/nifi/pull/2194 @ijokarumawak Updated and squashed.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206908#comment-16206908 ] ASF GitHub Bot commented on NIFI-4465: -- Github user patricker commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r145012279
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/pom.xml ---
@@ -77,6 +76,19 @@
             <groupId>org.apache.nifi</groupId>
+            <artifactId>nifi-record-serialization-service-api</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.nifi</groupId>
+            <artifactId>nifi-standard-record-utils</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-csv</artifactId>
--- End diff --
You're correct. I've removed it.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206893#comment-16206893 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r145010425
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/pom.xml ---
@@ -77,6 +76,19 @@
             <groupId>org.apache.nifi</groupId>
+            <artifactId>nifi-record-serialization-service-api</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.nifi</groupId>
+            <artifactId>nifi-standard-record-utils</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-csv</artifactId>
--- End diff --
I think the dependency on nifi-standard-record-utils already pulls in commons-csv transitively, so we don't have to add it explicitly here. I'm fine with keeping it, but I prefer having fewer dependencies.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206890#comment-16206890 ] ASF GitHub Bot commented on NIFI-4465: -- Github user patricker commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r145009765
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/pom.xml ---
@@ -77,6 +76,19 @@
             <groupId>org.apache.nifi</groupId>
+            <artifactId>nifi-record-serialization-service-api</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.nifi</groupId>
+            <artifactId>nifi-standard-record-utils</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-csv</artifactId>
--- End diff --
This is still used:
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205713#comment-16205713 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r144812713
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java ---
@@ -101,6 +100,34 @@
             .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
             .build();

+    public static final PropertyDescriptor FIRST_ROW = new PropertyDescriptor
+            .Builder().name("excel-extract-first-row")
+            .displayName("First Row")
+            .description("The row number of the header row, or first row of data if `Has Header Line` is set to false. "
+                    + "Use this to skip over rows of data at the top of your worksheet that are not part of the dataset.")
+            .required(true)
+            .defaultValue("0")
+            .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+            .build();
+
+    public static final PropertyDescriptor COLUMNS_TO_SKIP = new PropertyDescriptor
+            .Builder().name("excel-extract-column-to-skip")
+            .displayName("Columns To Skip")
+            .description("Comma delimited list of column numbers to skip. Use the columns number and not the letter designation. "
+                    + "Use this to skip over columns anywhere in your worksheet that you don't want extracted as part of the record.")
+            .required(false)
+            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+            .build();
+
+    public static final PropertyDescriptor FORMAT_VALUES = new PropertyDescriptor.Builder()
+            .name("excel-format-values")
+            .displayName("Format Cell Values")
+            .description("Should the cell values be written too CSV using the formatting applied in Excel, or should they be printed as raw values.")
+            .allowableValues("true", "false")
+            .defaultValue("true")
--- End diff --
To preserve existing flow behavior (some users may already be running flows that read Excel files with styles), `false` might be a better default value.
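The `Columns To Skip` property above takes a comma-delimited list of column numbers rather than letters. Below is a minimal sketch of how such a value might be parsed and validated. It assumes whitespace around entries is allowed and blank entries are ignored. The class name is hypothetical, and this is not the processor's actual parsing code.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical parser for a "Columns To Skip" style value such as "1, 3,5".
// Not the NiFi implementation; whether numbers are zero- or one-based is left to the caller.
final class ColumnsToSkipParser {
    private ColumnsToSkipParser() {
    }

    static Set<Integer> parse(String propertyValue) {
        Set<Integer> columns = new TreeSet<>();
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            return Collections.unmodifiableSet(columns); // optional property: skip nothing
        }
        for (String token : propertyValue.split(",")) {
            String trimmed = token.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate "1,,3" or a trailing comma
            }
            // Letter designations such as "C" fail here with NumberFormatException.
            int column = Integer.parseInt(trimmed);
            if (column < 0) {
                throw new IllegalArgumentException("Column numbers must be non-negative: " + trimmed);
            }
            columns.add(column);
        }
        return Collections.unmodifiableSet(columns);
    }
}
```

Returning a sorted set means duplicate entries collapse and lookups while streaming cells are cheap.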
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205712#comment-16205712 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r144808367
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/pom.xml ---
@@ -77,6 +76,19 @@
             <groupId>org.apache.nifi</groupId>
+            <artifactId>nifi-record-serialization-service-api</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.nifi</groupId>
+            <artifactId>nifi-standard-record-utils</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-csv</artifactId>
--- End diff --
The `commons-csv` dependency is no longer needed. Please remove it.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205711#comment-16205711 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r144808162
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/pom.xml ---
@@ -77,6 +76,19 @@
             <groupId>org.apache.nifi</groupId>
+            <artifactId>nifi-record-serialization-service-api</artifactId>
--- End diff --
The `nifi-record-serialization-service-api` dependency is no longer required. Please remove it.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205714#comment-16205714 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r144811262
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java ---
@@ -101,6 +100,34 @@
             .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
             .build();

+    public static final PropertyDescriptor FIRST_ROW = new PropertyDescriptor
--- End diff --
If I understand the behavior correctly, this property should be named something like 'Number of skip rows'. We should also mention that empty rows are skipped automatically, regardless of this setting.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205561#comment-16205561 ] ASF GitHub Bot commented on NIFI-4465: -- Github user patricker commented on the issue: https://github.com/apache/nifi/pull/2194 @ijokarumawak I've made the updates we talked about and it's building and running now.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201625#comment-16201625 ] ASF GitHub Bot commented on NIFI-4465: -- Github user patricker commented on the issue: https://github.com/apache/nifi/pull/2194 @ijokarumawak Your code for the record reader was very creative, but I don't think I'm interested in using that approach; it makes me feel kind of dirty...
```
try {
    // Wait for the next record.
    consumingLatch.countDown();
    readingLatch.await();
    // Start consuming the record.
} catch (InterruptedException e) {
    logger.warn("Reading Excel sheet is interrupted at nextRecord() due to {}", e);
} finally {
    // Reset Latches.
    consumingLatch = new CountDownLatch(1);
    readingLatch = new CountDownLatch(1);
}
```
I'll work on the other items you mentioned.
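The latch snippet above hands each row from the push-style SAX parsing side to a pull-style `nextRecord()` call. Below is a hedged illustration of the same push-to-pull handoff written with `java.util.concurrent.SynchronousQueue` instead of a pair of latches that are reset for every row. This is a swapped-in technique: it is not Koji's code and not what was merged. The class and method names are hypothetical, and a plain list of rows stands in for the SAX parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

// Hypothetical sketch: turns a push-style row producer (standing in for a SAX sheet handler)
// into pull-style nextRecord() calls. Not the code discussed in the pull request.
final class PushToPullAdapter {
    // Sentinel compared by reference, so an empty row can never be mistaken for it.
    private static final List<String> END = new ArrayList<>();

    private final SynchronousQueue<List<String>> handoff = new SynchronousQueue<>();
    private boolean finished;

    PushToPullAdapter(List<List<String>> rows) {
        Thread producer = new Thread(() -> {
            try {
                for (List<String> row : rows) {
                    handoff.put(row); // blocks until the consumer takes this row
                }
                handoff.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    /** Returns the next row, or null once every row has been consumed. */
    List<String> nextRecord() {
        if (finished) {
            return null;
        }
        try {
            List<String> row = handoff.take();
            if (row == END) {
                finished = true;
                return null;
            }
            return row;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the next row", e);
        }
    }
}
```

The queue has no capacity, so the producer waits on every `put()` until a consumer takes the row. At most one row is in flight, which is the streaming property a record reader needs. Propagating parser exceptions back to the reading thread is left out of this sketch.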
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201618#comment-16201618 ] ASF GitHub Bot commented on NIFI-4465: -- Github user patricker commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r144222681
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java ---
@@ -220,45 +287,43 @@ public void process(InputStream inputStream) throws IOException {
      * @param session
      *            The NiFi ProcessSession instance for the current invocation.
      */
-    private void handleExcelSheet(ProcessSession session, FlowFile originalParentFF,
-            SharedStringsTable sst, final InputStream sheetInputStream, String sName) throws IOException {
+    private void handleExcelSheet(ProcessSession session, FlowFile originalParentFF, final InputStream sheetInputStream, ExcelSheetReadConfig readConfig,
+                                  CSVFormat csvFormat) throws IOException {

         FlowFile ff = session.create();
         try {
+            final DataFormatter formatter = new DataFormatter();
+            final InputSource sheetSource = new InputSource(sheetInputStream);
+
+            final SheetToCSV sheetHandler = new SheetToCSV(readConfig, csvFormat);
+
+            final XMLReader parser = SAXHelper.newXMLReader();
+            final XSSFSheetXMLHandler handler = new XSSFSheetXMLHandler(
+                    readConfig.getStyles(), null, readConfig.getSharedStringsTable(), sheetHandler, formatter, false);
--- End diff --
Thanks. I tried not passing in a DataFormatter, and that just threw a NullPointerException. I'll try not passing in a styles table.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201609#comment-16201609 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r143922178
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java ---
@@ -220,45 +287,43 @@ public void process(InputStream inputStream) throws IOException {
      * @param session
      *            The NiFi ProcessSession instance for the current invocation.
      */
-    private void handleExcelSheet(ProcessSession session, FlowFile originalParentFF,
-            SharedStringsTable sst, final InputStream sheetInputStream, String sName) throws IOException {
+    private void handleExcelSheet(ProcessSession session, FlowFile originalParentFF, final InputStream sheetInputStream, ExcelSheetReadConfig readConfig,
+                                  CSVFormat csvFormat) throws IOException {

         FlowFile ff = session.create();
         try {
+            final DataFormatter formatter = new DataFormatter();
+            final InputSource sheetSource = new InputSource(sheetInputStream);
+
+            final SheetToCSV sheetHandler = new SheetToCSV(readConfig, csvFormat);
+
+            final XMLReader parser = SAXHelper.newXMLReader();
+            final XSSFSheetXMLHandler handler = new XSSFSheetXMLHandler(
+                    readConfig.getStyles(), null, readConfig.getSharedStringsTable(), sheetHandler, formatter, false);
--- End diff --
It seems we can get the original cell values if we don't pass a styles table here. I briefly checked XSSFSheetXMLHandler and found that it checks whether a style is specified. I haven't tried it yet, but it seems possible.
https://github.com/apache/poi/blob/trunk/src/ooxml/java/org/apache/poi/xssf/eventusermodel/XSSFSheetXMLHandler.java#L277-L292
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201611#comment-16201611 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r143922622
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java ---
@@ -283,163 +348,162 @@ public void process(OutputStream out) throws IOException {
         }
     }

-    static Integer columnToIndex(String col) {
-        int length = col.length();
-        int accumulator = 0;
-        for (int i = length; i > 0; i--) {
-            char c = col.charAt(i - 1);
-            int x = ((int) c) - 64;
-            accumulator += x * Math.pow(26, length - i);
+    /**
+     * Uses the XSSF Event SAX helpers to do most of the work
+     * of parsing the Sheet XML, and outputs the contents
+     * as a (basic) CSV.
+     */
+    private class SheetToCSV implements XSSFSheetXMLHandler.SheetContentsHandler {
+        private ExcelSheetReadConfig readConfig;
+        CSVFormat csvFormat;
+
+        private boolean firstCellOfRow;
+        private boolean skipRow;
+        private int currentRow = -1;
+        private int currentCol = -1;
+        private int rowCount = 0;
+        private boolean rowHasValues=false;
+        private int skippedColumns=0;
+
+        private CSVPrinter printer;
+
+        private boolean firstRow=false;
+
+        private ArrayList fieldValues;
+
+        public int getRowCount(){
+            return rowCount;
         }
-        // Make it to start with 0.
-        return accumulator - 1;
-    }

-    private static class CellAddress {
-        final int row;
-        final int col;
+        public void setOutput(PrintStream output){
+            final OutputStreamWriter streamWriter = new OutputStreamWriter(output);

-        private CellAddress(int row, int col) {
-            this.row = row;
-            this.col = col;
+            try {
+                printer = new CSVPrinter(streamWriter, csvFormat);
+            } catch (IOException e) {
+                throw new ProcessException("Failed to create CSV Printer.", e);
+            }
         }
-    }

-    /**
-     * Extracts every row from an Excel Sheet and generates a corresponding JSONObject whose key is the Excel CellAddress and value
-     * is the content of that CellAddress converted to a String
-     */
-    private class ExcelSheetRowHandler
-            extends DefaultHandler {
-
-        private SharedStringsTable sst;
-        private String currentContent;
-        private boolean nextIsString;
-        private CellAddress firstCellAddress;
-        private CellAddress firstRowLastCellAddress;
-        private CellAddress previousCellAddress;
-        private CellAddress nextCellAddress;
-        private OutputStream outputStream;
-        private boolean firstColInRow;
-        long rowCount;
-        String sheetName;
-
-        private ExcelSheetRowHandler(SharedStringsTable sst) {
-            this.sst = sst;
-            this.firstColInRow = true;
-            this.rowCount = 0l;
-            this.sheetName = UNKNOWN_SHEET_NAME;
+        public SheetToCSV(ExcelSheetReadConfig readConfig, CSVFormat csvFormat){
+            this.readConfig = readConfig;
+            this.csvFormat = csvFormat;
         }

-        public void setFlowFileOutputStream(OutputStream outputStream) {
-            this.outputStream = outputStream;
+        @Override
+        public void startRow(int rowNum) {
+            if(rowNum <= readConfig.getOverrideFirstRow()) {
+                skipRow = true;
+                return;
+            }
+
+            // Prepare for this row
+            skipRow = false;
+            firstCellOfRow = true;
+            firstRow = currentRow==-1;
+            currentRow = rowNum;
+            currentCol = -1;
+            rowHasValues = false;
+
+            fieldValues = new ArrayList<>();
         }

+        @Override
+        public void endRow(int rowNum) {
+            if(skipRow) {
+                return;
+            }

-        public void startElement(String uri, String localName, String name,
-                                 Attributes attributes) throws SAXException {
+            if(firstRow){
+                readConfig.setLastColumn(currentCol);
+
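The `startRow()`/`endRow()` bookkeeping in the diff above is needed because POI's event model reports rows and cells as the parser reaches them, and cells that are absent from the sheet XML are never reported. Below is a rough, POI-free sketch of that bookkeeping. The class and method names are hypothetical, and two things are assumptions of this sketch rather than something the truncated diff shows: gaps between non-empty cells are filled with empty strings, and column indexes are zero-based. Rows at or below a threshold are dropped, as in `startRow()`, and listed columns are left out. Trailing absent cells are not padded here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, POI-free sketch of per-row bookkeeping for a streaming sheet handler.
// Column indexes are zero-based here; that is an assumption of this sketch.
final class RowAssembler {
    private final int lastSkippedRow;          // rows numbered <= this are dropped, like startRow() in the diff
    private final Set<Integer> columnsToSkip;  // zero-based column indexes to leave out
    private List<String> fieldValues;
    private int currentCol;
    private boolean skipRow;

    RowAssembler(int lastSkippedRow, Set<Integer> columnsToSkip) {
        this.lastSkippedRow = lastSkippedRow;
        this.columnsToSkip = new HashSet<>(columnsToSkip);
    }

    void startRow(int rowNum) {
        skipRow = rowNum <= lastSkippedRow;
        currentCol = -1;
        fieldValues = new ArrayList<>();
    }

    /** Receives only cells that are present in the sheet, in column order. */
    void cell(int col, String value) {
        if (skipRow) {
            return;
        }
        // Emit empty strings for absent cells between the previous cell and this one.
        for (int missing = currentCol + 1; missing < col; missing++) {
            add(missing, "");
        }
        add(col, value);
        currentCol = col;
    }

    /** Returns the assembled values, or null when the row was skipped. */
    List<String> endRow() {
        return skipRow ? null : Collections.unmodifiableList(fieldValues);
    }

    private void add(int col, String value) {
        if (!columnsToSkip.contains(col)) {
            fieldValues.add(value);
        }
    }
}
```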
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201610#comment-16201610 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r143921058
--- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/csv/CSVUtils.java ---
@@ -23,46 +23,47 @@
 import org.apache.nifi.components.AllowableValue;
 import org.apache.nifi.components.PropertyDescriptor;
 import org.apache.nifi.components.PropertyValue;
+import org.apache.nifi.context.PropertyContext;
 import org.apache.nifi.controller.ConfigurationContext;
--- End diff --
`ConfigurationContext` causes an unused-import Checkstyle violation.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199859#comment-16199859 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r143920555
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/test/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessorTest.java ---
@@ -42,16 +42,6 @@ public void init() {
     }

     @Test
-    public void testColToIndex() {
-        assertEquals(Integer.valueOf(0), ConvertExcelToCSVProcessor.columnToIndex("A"));
-        assertEquals(Integer.valueOf(1), ConvertExcelToCSVProcessor.columnToIndex("B"));
-        assertEquals(Integer.valueOf(25), ConvertExcelToCSVProcessor.columnToIndex("Z"));
-        assertEquals(Integer.valueOf(29), ConvertExcelToCSVProcessor.columnToIndex("AD"));
-        assertEquals(Integer.valueOf(239), ConvertExcelToCSVProcessor.columnToIndex("IF"));
-        assertEquals(Integer.valueOf(16383), ConvertExcelToCSVProcessor.columnToIndex("XFD"));
--- End diff --
Confirmed that Checkstyle reports no issues.
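The removed `testColToIndex()` pins down the letters-to-index mapping that the old `columnToIndex()` implemented: `A` is 0, `Z` is 25, `AD` is 29, and `XFD` (the last column in an .xlsx sheet) is 16383. Below is a minimal sketch of the same bijective base-26 conversion. It uses integer arithmetic instead of `Math.pow`, and the class name is hypothetical.

```java
// Hypothetical re-implementation of the letters-to-index conversion covered by testColToIndex().
// Column letters form a bijective base-26 number: A=1 ... Z=26, AA=27, and so on.
final class ColumnIndex {
    private ColumnIndex() {
    }

    static int columnToIndex(String column) {
        if (column == null || column.isEmpty()) {
            throw new IllegalArgumentException("Column letters are required");
        }
        int accumulator = 0;
        for (int i = 0; i < column.length(); i++) {
            char c = column.charAt(i);
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("Not an upper-case column letter: " + c);
            }
            accumulator = accumulator * 26 + (c - 'A' + 1);
        }
        return accumulator - 1; // shift so that column "A" maps to index 0
    }
}
```

For example, `AD` evaluates to 1 * 26 + 4 = 30, and subtracting one gives 29, matching the removed assertion.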
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198261#comment-16198261 ] ASF GitHub Bot commented on NIFI-4465: -- Github user patricker commented on the issue: https://github.com/apache/nifi/pull/2194
@ijokarumawak I made several changes, though not really the ones you asked for... I replied separately on data formatting. As for EL, see my comments below.
I wanted to move towards generating CSV files that can be easily parsed with the CSV Record Reader, so I exposed the CSVUtils properties and put them onto this processor. I was having problems with commas in my input file before, but after these changes that isn't an issue anymore.
I also wanted the code to be easier to maintain, so I tried to move as much of the CSV code as possible to the existing code in the record serializers module. That code, `CSVUtils.createCSVFormat`, does not use EL, which makes sense since it's all defined on a Controller Service.
I tried making an Excel Spreadsheet Record Reader, and I did make one, but it did not use streaming: it loaded the whole spreadsheet into memory. I couldn't find a way to use Apache POI to stream-read one row at a time (instead of all rows at once).
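The comment above mentions that commas inside input values caused problems before the switch to a `CSVUtils`-based format. The underlying rule, as in RFC 4180, is that a field containing the delimiter, a quote character, or a line break must be quoted, with embedded quotes doubled. Below is a hand-rolled sketch of that rule, for illustration only. The class name is hypothetical; in the processor, the commons-csv `CSVPrinter` seen in the diffs is what applies quoting.

```java
import java.util.List;

// Hypothetical illustration of RFC 4180-style quoting; not commons-csv's implementation.
final class CsvQuoting {
    private CsvQuoting() {
    }

    static String formatRecord(List<String> values, char delimiter) {
        StringBuilder record = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                record.append(delimiter);
            }
            record.append(quoteIfNeeded(values.get(i), delimiter));
        }
        return record.toString();
    }

    static String quoteIfNeeded(String value, char delimiter) {
        boolean needsQuotes = value.indexOf(delimiter) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        // Wrap in quotes and double any embedded quote characters.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}
```

With a comma delimiter, a formatted value such as `1,234.46` must be quoted. With a `;` delimiter, the same value can be written as-is.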
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198250#comment-16198250 ] ASF GitHub Bot commented on NIFI-4465: -- Github user patricker commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r143640150
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/test/resources/alternatedelimiters.csv ---
@@ -0,0 +1,9 @@
+NumbersTimestampsMoney
+1234.4561/1/17$ 123.45
+1234.4612:00:00 PM£ 123.45
+1234.5Sunday, January 01, 2017¥ 123.45
+1,234.461/1/17 12:00$ 1,023.45
+1,234.456012:00 PM£ 1,023.45
+9.88E+082017/01/01/ 12:00¥ 1,023.45
+9.877E+08
+9.8765E+08
--- End diff --
I looked around online, but the only way I could find to do this would be to create a custom copy of `XSSFSheetXMLHandler` and add this functionality to it.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192591#comment-16192591 ] ASF GitHub Bot commented on NIFI-4465: -- Github user ijokarumawak commented on a diff in the pull request: https://github.com/apache/nifi/pull/2194#discussion_r142872545
--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/test/resources/alternatedelimiters.csv ---
@@ -0,0 +1,9 @@
+NumbersTimestampsMoney
+1234.4561/1/17$ 123.45
+1234.4612:00:00 PM£ 123.45
+1234.5Sunday, January 01, 2017¥ 123.45
+1,234.461/1/17 12:00$ 1,023.45
+1,234.456012:00 PM£ 1,023.45
+9.88E+082017/01/01/ 12:00¥ 1,023.45
+9.877E+08
+9.8765E+08
--- End diff --
Without the style, the original value is '987654321'. I imagine there may be use cases in which users would like to get the original value representation. Providing one more processor property to toggle whether the style table is applied might be more user-friendly. What do you think?
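To make the formatted-versus-raw difference concrete: per the comment above, the last fixture row `9.8765E+08` is the styled display of the raw value `987654321`. The rows `9.88E+08` and `9.877E+08` are consistent with the same value shown with fewer decimal places. The sketch below reproduces those display strings with `String.format`. It only mimics the scientific formats seen here and is not POI's `DataFormatter`. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical illustration of raw versus display-formatted cell text; not POI's DataFormatter.
final class RawVersusFormatted {
    private RawVersusFormatted() {
    }

    /** Raw text for a numeric cell: whole numbers print without a trailing ".0". */
    static String raw(double value) {
        if (!Double.isInfinite(value) && value == Math.rint(value) && Math.abs(value) < 1e15) {
            return Long.toString((long) value);
        }
        return Double.toString(value);
    }

    /** Scientific display similar to an Excel "0.00E+00" style format with the given decimals. */
    static String scientific(double value, int decimals) {
        // Locale.ROOT keeps "." as the decimal separator; %E prints a signed exponent of at least two digits.
        return String.format(Locale.ROOT, "%." + decimals + "E", value);
    }
}
```

Under these assumptions, a `Format Cell Values` switch amounts to choosing, per numeric cell, between the raw text and the style-driven display string.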
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192589#comment-16192589 ]

ASF GitHub Bot commented on NIFI-4465:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2194#discussion_r142869799

--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java ---
@@ -101,6 +97,24 @@
             .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
             .build();
 
+    static final PropertyDescriptor COLUMN_DELIMITER = new PropertyDescriptor.Builder()
+            .name("excel-csv-column-delimiter")
+            .displayName("Column Delimiter")
+            .description("Character(s) used to separate columns of data in the CSV file. Special characters should use the '\\u' notation.")
+            .required(false)
+            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+            .defaultValue(",")
+            .build();
--- End diff --

Is there any reason that we don't support EL here?
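The `'\\u'` notation in the property description implies that what the user types has to be turned into actual characters before the CSV is written. A minimal sketch of such an unescaping step, assuming only `\n`, `\r`, `\t`, `\\` and backslash-u-plus-four-hex-digit escapes are supported; `DelimiterUnescaper` is a hypothetical name, not a NiFi class.

```java
// Hypothetical helper: converts a delimiter typed into a property field into real characters.
final class DelimiterUnescaper {

    private DelimiterUnescaper() {
    }

    // Supported escapes: \n, \r, \t, \\ and backslash-u followed by exactly four hex digits.
    // Anything else, including an unknown escape, is copied through unchanged.
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                if (next == 'u') {
                    int code = parseHex4(input, i + 2);
                    if (code >= 0) {
                        out.append((char) code);
                        i += 6; // backslash, 'u' and four hex digits
                        continue;
                    }
                } else {
                    String simple = simpleEscape(next);
                    if (simple != null) {
                        out.append(simple);
                        i += 2;
                        continue;
                    }
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    private static String simpleEscape(char c) {
        switch (c) {
            case 'n': return "\n";
            case 'r': return "\r";
            case 't': return "\t";
            case '\\': return "\\";
            default: return null;
        }
    }

    // Value of the four hex digits starting at 'start', or -1 if they are missing or invalid.
    private static int parseHex4(String s, int start) {
        if (start + 4 > s.length()) {
            return -1;
        }
        int value = 0;
        for (int k = start; k < start + 4; k++) {
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                return -1;
            }
            value = value * 16 + digit;
        }
        return value;
    }
}
```

Under these assumptions, entering `\t` yields a tab-delimited file and `\u0001` yields the SOH control character as the delimiter.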
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192590#comment-16192590 ]

ASF GitHub Bot commented on NIFI-4465:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2194#discussion_r142869816

--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java ---
@@ -101,6 +97,24 @@
             .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
             .build();
 
+    static final PropertyDescriptor COLUMN_DELIMITER = new PropertyDescriptor.Builder()
+            .name("excel-csv-column-delimiter")
+            .displayName("Column Delimiter")
+            .description("Character(s) used to separate columns of data in the CSV file. Special characters should use the '\\u' notation.")
+            .required(false)
+            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+            .defaultValue(",")
+            .build();
+
+    static final PropertyDescriptor RECORD_DELIMITER = new PropertyDescriptor.Builder()
+            .name("excel-csv-record-delimiter")
+            .displayName("Record Delimiter")
+            .description("Character(s) used to separate rows of data in the CSV file. For line return enter \\n in this field. Special characters should use the '\\u' notation.")
+            .required(false)
+            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+            .defaultValue("\\n")
+            .build();
--- End diff --

Is there any reason that we don't support EL here?
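How the two delimiter properties combine can be illustrated with a small hypothetical helper (not NiFi code) that joins cells with the column delimiter and ends each row with the record delimiter. It assumes the delimiters have already been unescaped, applies no quoting, and treats the record delimiter as a terminator after every row, including the last; emitting it only between rows would be an equally valid reading of "separate rows".

```java
import java.util.List;

// Hypothetical sketch of writing rows with configurable column and record delimiters.
final class DelimitedText {

    private DelimitedText() {
    }

    // Joins each row's cells with columnDelimiter and ends every row with recordDelimiter.
    // No quoting or escaping is applied to cell values.
    static String render(List<List<String>> rows, String columnDelimiter, String recordDelimiter) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(String.join(columnDelimiter, row)).append(recordDelimiter);
        }
        return sb.toString();
    }
}
```

With the defaults (`,` and a newline) the header row of the test file would come out as `Numbers,Timestamps,Money` followed by a line break.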
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192592#comment-16192592 ]

ASF GitHub Bot commented on NIFI-4465:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2194#discussion_r142864821

--- Diff: nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/test/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessorTest.java ---
@@ -42,16 +42,6 @@ public void init() {
     }
 
     @Test
-    public void testColToIndex() {
-        assertEquals(Integer.valueOf(0), ConvertExcelToCSVProcessor.columnToIndex("A"));
-        assertEquals(Integer.valueOf(1), ConvertExcelToCSVProcessor.columnToIndex("B"));
-        assertEquals(Integer.valueOf(25), ConvertExcelToCSVProcessor.columnToIndex("Z"));
-        assertEquals(Integer.valueOf(29), ConvertExcelToCSVProcessor.columnToIndex("AD"));
-        assertEquals(Integer.valueOf(239), ConvertExcelToCSVProcessor.columnToIndex("IF"));
-        assertEquals(Integer.valueOf(16383), ConvertExcelToCSVProcessor.columnToIndex("XFD"));
--- End diff --

Need to remove the static import of assertEquals: it is no longer used, and the unused import causes a Checkstyle violation.
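The deleted test pins down the mapping the old `columnToIndex` helper implemented: Excel column labels are bijective base-26 numbers (A=1 through Z=26, with no zero digit), converted to a zero-based index. The helper's body does not appear in this thread, so the following is a hypothetical reimplementation that reproduces the asserted values; `ExcelColumns` is an invented name, and it returns `int` rather than the `Integer` the test compared against.

```java
// Hypothetical reimplementation of the removed column-label-to-index conversion.
final class ExcelColumns {

    private ExcelColumns() {
    }

    // "A" -> 0, "Z" -> 25, "AD" -> 29, "XFD" -> 16383.
    // The running total is one-based (A=1 .. Z=26), so 1 is subtracted at the end.
    static int columnToIndex(String label) {
        if (label == null || label.isEmpty()) {
            throw new IllegalArgumentException("empty column label");
        }
        int oneBased = 0;
        for (int i = 0; i < label.length(); i++) {
            char c = Character.toUpperCase(label.charAt(i));
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("not a column label: " + label);
            }
            oneBased = oneBased * 26 + (c - 'A' + 1);
        }
        return oneBased - 1;
    }
}
```

For example, `XFD` is 24 * 26^2 + 6 * 26 + 4 = 16384 one-based, which is index 16383, the last column in current Excel versions.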
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192527#comment-16192527 ]

ASF GitHub Bot commented on NIFI-4465:
--

Github user patricker commented on the issue:

https://github.com/apache/nifi/pull/2194

@ijokarumawak After testing with a few other files, I found that I really need to add options to quote strings and to escape delimiter characters when they appear in values. The CSV Record Reader already supports reading both, so let me add them and update the PR.
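The comment doesn't say which quoting convention would be used. As one possibility, the sketch below assumes RFC 4180-style quoting: wrap a value in double quotes when it contains a delimiter, a double quote, or a line break, and double any embedded quotes. This matters once formatted values are written, since a value such as `1,234.46` from the alternatedelimiters.csv test data would otherwise split into two columns under the default `,` delimiter. `CsvQuoting` is a hypothetical name, not part of NiFi.

```java
// Hypothetical sketch of RFC 4180-style quoting for a single cell value.
final class CsvQuoting {

    private CsvQuoting() {
    }

    // A value containing the column delimiter, the record delimiter, a double quote, CR or LF
    // is wrapped in double quotes, and every embedded double quote is doubled.
    // Other values are returned unchanged. Delimiters are assumed to be non-empty.
    static String quoteIfNeeded(String value, String columnDelimiter, String recordDelimiter) {
        boolean needsQuotes = value.contains(columnDelimiter)
                || value.contains(recordDelimiter)
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}
```

With `,` as the column delimiter, `1,234.46` becomes `"1,234.46"`; with `|` it passes through unchanged. Backslash-escaping the delimiter instead of quoting is another common convention.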
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192525#comment-16192525 ]

ASF GitHub Bot commented on NIFI-4465:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2194

@patricker Thanks! Sure, I'll review this PR.
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192466#comment-16192466 ]

ASF GitHub Bot commented on NIFI-4465:
--

Github user patricker commented on the issue:

https://github.com/apache/nifi/pull/2194

@ijokarumawak I saw that you made some changes to this processor back in July. I ended up gutting parts of it and using other parts of the Apache POI library so I could read formatted data. The unit test you built for empty cells is passing :) If you want to review it... I wouldn't be unhappy
[jira] [Commented] (NIFI-4465) ConvertExcelToCSV Data Formatting and Delimiters
[ https://issues.apache.org/jira/browse/NIFI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192465#comment-16192465 ]

ASF GitHub Bot commented on NIFI-4465:
--

GitHub user patricker opened a pull request:

https://github.com/apache/nifi/pull/2194

NIFI-4465 ConvertExcelToCSV Data Formatting and Delimiters

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- [x] Does your PR title start with NIFI- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
- [x] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
- [x] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
- [x] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/patricker/nifi NIFI-4465

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2194.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2194

commit 2d1dc6a2276af214401d7e5164f6423ea184eb35
Author: patricker
Date: 2017-10-05T05:01:47Z

NIFI-4465